Hi Daniel,

> -----Original Message-----
> From: Daniel Florey [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, 24 February 2004 13:23
> To: Slide Developers Mailing List
> Subject: Re: Full Text Search for MS Word and Excel files?
>
>
> Hi Martin,
> my proposal would look like this:
>
> public interface Extractor {
>     /**
>      * Will be called from extractor framework before content and properties will
>      * be stored
>      */
>     public void extract(InputStream content) throws ExtractException;

agreed

>
>     /**
>      * gets extracted property value from the resource, for example "author"
>      * for a word doc, ...
>      *
>      */
>     public String getPropertyValue(String propertyName);
>
>     /**
>      * gets a description of all properties that are provided by this extractor.
>      * Can be used by indexing framework to e.g. generate columns in index table

Of course the store/indexer could do whatever it wants with the properties, but I think the normal case should be to write them into the DescriptorStore as NodeProperties, so that they can be exposed to DASL. So what about the following comment:

     * Can be used to store each property as a NodeProperty in the DescriptorStore

>      */
>     public PropertyDescriptor[] getPropertyDescriptors();
> }
>
> I prefer InputStream for content because the whole document doesn't have to be
> loaded into memory.

agreed.

Best regards,
Martin
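For illustration, a minimal sketch of how the proposed interface might compile and be used. The PropertyDescriptor and ExtractException classes are hypothetical placeholders, since neither is defined in this thread. PlainTextExtractor is likewise hypothetical: it only counts lines and words, reading the stream one line at a time. The toNodeProperties helper is also hypothetical, with a plain Map standing in for writing NodeProperties to the DescriptorStore. No real Slide API is used here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractorSketch {

    /** Hypothetical placeholder: describes one property an extractor provides. */
    public static class PropertyDescriptor {
        private final String name;

        public PropertyDescriptor(String name) { this.name = name; }

        public String getName() { return name; }
    }

    /** Hypothetical placeholder for the ExtractException named in the proposal. */
    public static class ExtractException extends Exception {
        public ExtractException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** The interface as proposed, with the amended getPropertyDescriptors() comment. */
    public interface Extractor {
        /** Called by the extractor framework before content and properties are stored. */
        void extract(InputStream content) throws ExtractException;

        /** Gets an extracted property value, e.g. "author" for a Word document. */
        String getPropertyValue(String propertyName);

        /**
         * Gets a description of all properties provided by this extractor.
         * Can be used to store each property as a NodeProperty in the DescriptorStore.
         */
        PropertyDescriptor[] getPropertyDescriptors();
    }

    /** Hypothetical example extractor: counts lines and words in UTF-8 plain text. */
    public static class PlainTextExtractor implements Extractor {
        private final Map<String, String> values = new HashMap<>();

        @Override
        public void extract(InputStream content) throws ExtractException {
            long lines = 0;
            long words = 0;
            // Read line by line, so only one line is held in memory at a time.
            // The caller owns the stream, so it is not closed here.
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(content, StandardCharsets.UTF_8));
            try {
                String line;
                while ((line = reader.readLine()) != null) {
                    lines++;
                    String trimmed = line.trim();
                    if (!trimmed.isEmpty()) {
                        words += trimmed.split("\\s+").length;
                    }
                }
            } catch (IOException e) {
                throw new ExtractException("could not read content", e);
            }
            values.put("linecount", Long.toString(lines));
            values.put("wordcount", Long.toString(words));
        }

        @Override
        public String getPropertyValue(String propertyName) {
            return values.get(propertyName); // null if unknown or not yet extracted
        }

        @Override
        public PropertyDescriptor[] getPropertyDescriptors() {
            return new PropertyDescriptor[] {
                    new PropertyDescriptor("linecount"), new PropertyDescriptor("wordcount")};
        }
    }

    /** Hypothetical indexer step: a map stands in for the DescriptorStore's NodeProperties. */
    public static Map<String, String> toNodeProperties(Extractor extractor) {
        Map<String, String> nodeProperties = new LinkedHashMap<>();
        for (PropertyDescriptor descriptor : extractor.getPropertyDescriptors()) {
            String value = extractor.getPropertyValue(descriptor.getName());
            if (value != null) {
                nodeProperties.put(descriptor.getName(), value);
            }
        }
        return nodeProperties;
    }

    public static void main(String[] args) throws ExtractException {
        Extractor extractor = new PlainTextExtractor();
        byte[] doc = "hello full text\nsearch in Slide\n".getBytes(StandardCharsets.UTF_8);
        extractor.extract(new ByteArrayInputStream(doc));
        System.out.println(toNodeProperties(extractor)); // prints {linecount=2, wordcount=6}
    }
}
```

Because extract() consumes the stream incrementally, a Word or Excel extractor could follow the same pattern without buffering the whole file, provided the parser it uses can also work from a stream.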
