Hi,

none of the mentioned interfaces have been implemented or checked in yet. I will implement these things tomorrow and check in a proposal; I'm too busy to work on it today. I will also check in a sample Domain.xml where you can see how the content-type or URL matching is configured.

Regards,
Daniel

On Tuesday, 24 February 2004 15:49, Ryan Rhodes wrote:
> Hi guys,
>
> This all sounds great. I think I understand the extractor interface, and
> I've worked with POI in the past so this doesn't sound too hard to
> implement. I'm still a little fuzzy on how this fits into the big picture.
>
> How is the association made between my extractor and my MIME type (.DOC)?
>
> When does the extractor get invoked... at the time the content is stored?
>
> How does this integrate with DASL... are these properties automatically a
> part of the content so that searches return a reference to the original
> content, or does it return a reference to the extracted content and then it's
> my job to map back to the original content? (sorry, I'm still learning
> DASL).
>
> By the way, once you submit your proposal, does that mean the code is in
> the CVS, or at what point is it likely to become a part of the release
> (2.x)?
>
> thanks,
>
> Ryan
>
> From: <[EMAIL PROTECTED]>
>
> >Reply-To: "Slide Developers Mailing List" <[EMAIL PROTECTED]>
> >To: <[EMAIL PROTECTED]>
> >Subject: RE: Full Text Search for MS Word and Excel files?
> >Date: Tue, 24 Feb 2004 13:43:42 +0100
> >
> >Hi Daniel,
> >
> > > -----Original Message-----
> > > From: Daniel Florey [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, 24 February 2004 13:23
> > > To: Slide Developers Mailing List
> > > Subject: Re: Full Text Search for MS Word and Excel files?
> > >
> > >
> > > Hi Martin,
> > > my proposal would look like this:
> > >
> > > public interface Extractor {
> > >     /**
> > >      * Will be called from extractor framework before
> > >      * content and properties will be stored
> > >      */
> > >     public void extract(InputStream content) throws ExtractException;
> >
> >agreed
> >
> > >     /**
> > >      * gets extracted property value from the resource, for
> > >      * example "author" for a word doc, ...
> > >      *
> > >      */
> > >     public String getPropertyValue(String propertyName);
> > >
> > >     /**
> > >      * gets a description of all properties that are
> > >      * provided by this extractor.
> > >      * Can be used by indexing framework to e.g. generate
> > >      * columns in index table
> >
> >Of course the store / indexer could do whatever it wants with the
> >properties, but I think the normal case should be to write the
> >properties into DescriptorStore as NodeProperties, so that these
> >properties can be exposed to DASL. So what about the following comment:
> >
> >* Can be used to be stored as NodeProperty in DescriptorStore
> >
> > >      */
> > >     public PropertyDescriptor[] getPropertyDescriptors();
> > > }
> > >
> > > I prefer InputStream for content because the whole document
> > > doesn't have to be loaded into memory.
> >
> >agreed.
> >
> >
> >Best regards,
> >Martin
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: [EMAIL PROTECTED]
> >For additional commands, e-mail: [EMAIL PROTECTED]
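
For readers following the thread, here is a minimal sketch of how the proposed Extractor interface might be implemented. Nothing like this was checked in at the time of writing, so everything beyond the three method signatures quoted above is an assumption: the ExtractException constructor, the stand-in PropertyDescriptor class (the thread does not say which type is meant), and the toy WordCountExtractor with its "wordCount" property are all hypothetical. A real extractor for .DOC files would parse the stream with a library such as POI instead of counting words.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical exception type; the thread only mentions its name.
class ExtractException extends Exception {
    ExtractException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical stand-in; the thread does not specify the real descriptor type.
class PropertyDescriptor {
    private final String name;

    PropertyDescriptor(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// The three methods as proposed in the thread.
interface Extractor {
    void extract(InputStream content) throws ExtractException;

    String getPropertyValue(String propertyName);

    PropertyDescriptor[] getPropertyDescriptors();
}

// Toy extractor (assumption): counts whitespace-separated words in UTF-8 text.
class WordCountExtractor implements Extractor {
    static final String WORD_COUNT = "wordCount";

    private final Map<String, String> values = new LinkedHashMap<>();

    @Override
    public void extract(InputStream content) throws ExtractException {
        long words = 0;
        boolean inWord = false;
        try {
            // Read character by character so the whole document never sits in memory.
            // The stream is not closed here; the caller keeps ownership of it.
            Reader reader = new BufferedReader(
                    new InputStreamReader(content, StandardCharsets.UTF_8));
            int c;
            while ((c = reader.read()) != -1) {
                boolean whitespace = Character.isWhitespace(c);
                if (!whitespace && !inWord) {
                    words++; // a new word starts here
                }
                inWord = !whitespace;
            }
        } catch (IOException e) {
            throw new ExtractException("could not read content", e);
        }
        values.put(WORD_COUNT, Long.toString(words));
    }

    @Override
    public String getPropertyValue(String propertyName) {
        return values.get(propertyName); // null for unknown properties
    }

    @Override
    public PropertyDescriptor[] getPropertyDescriptors() {
        return new PropertyDescriptor[] { new PropertyDescriptor(WORD_COUNT) };
    }
}

class ExtractorDemo {
    public static void main(String[] args) throws ExtractException {
        Extractor extractor = new WordCountExtractor();
        byte[] bytes = "Full text search for Word files".getBytes(StandardCharsets.UTF_8);
        extractor.extract(new ByteArrayInputStream(bytes));
        // A store or indexer could walk the descriptors and persist each value.
        for (PropertyDescriptor d : extractor.getPropertyDescriptors()) {
            System.out.println(d.getName() + " = " + extractor.getPropertyValue(d.getName()));
        }
        // prints: wordCount = 6
    }
}
```

The loop in main mirrors Martin's suggestion: a caller asks for the descriptors, then reads each value, and could write them as node properties so that DASL can see them. How an extractor gets associated with a content type or URL is not covered here, since that part depends on the Domain.xml configuration Daniel has yet to check in.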
