Re: Full Text Search for MS Word and Excel files?

Daniel Florey Tue, 24 Feb 2004 07:08:23 -0800

Hi,
none of the interfaces mentioned has been implemented or checked in yet. I
will implement these things tomorrow and check in a proposal; I'm too busy
to work on it today.
I will check in a sample Domain.xml that shows how the content-type or URL
matching is configured.
Regards, Daniel
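The extractor is picked for a resource by content type or by URL, and that matching is configured in Domain.xml. The Domain.xml syntax is not shown in this thread, so what follows is only a hypothetical Java sketch of such a lookup. ExtractorRegistry and its method names are illustrative, not Slide's API. The sketch assumes that a content-type match takes precedence over a URL-suffix match.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical registry that picks an extractor for a resource, first by its
// content type and then by the ending of its URI. E is the extractor type.
class ExtractorRegistry<E> {
    private final Map<String, Supplier<E>> byContentType = new LinkedHashMap<>();
    // LinkedHashMap keeps registration order, so earlier suffixes win ties.
    private final Map<String, Supplier<E>> byUriSuffix = new LinkedHashMap<>();

    void registerContentType(String contentType, Supplier<E> factory) {
        byContentType.put(normalize(contentType), factory);
    }

    void registerUriSuffix(String suffix, Supplier<E> factory) {
        byUriSuffix.put(suffix.toLowerCase(Locale.ROOT), factory);
    }

    // Returns a fresh extractor for the resource, or empty if nothing matches.
    Optional<E> lookup(String contentType, String uri) {
        if (contentType != null) {
            Supplier<E> factory = byContentType.get(normalize(contentType));
            if (factory != null) {
                return Optional.of(factory.get());
            }
        }
        if (uri != null) {
            String lowerUri = uri.toLowerCase(Locale.ROOT);
            for (Map.Entry<String, Supplier<E>> entry : byUriSuffix.entrySet()) {
                if (lowerUri.endsWith(entry.getKey())) {
                    return Optional.of(entry.getValue().get());
                }
            }
        }
        return Optional.empty();
    }

    // Drops parameters and case: "Application/MSWord; x=y" -> "application/msword"
    private static String normalize(String contentType) {
        int semicolon = contentType.indexOf(';');
        String base = semicolon >= 0 ? contentType.substring(0, semicolon) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }
}
```

Returning a new instance per lookup fits the proposed interface, since extract() stores its results in the extractor object itself.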


On Tuesday, 24 February 2004 15:49, Ryan Rhodes wrote:
> Hi guys,
>
> This all sounds great.  I think I understand the extractor interface, and
> I've worked with POI in the past so this doesn't sound too hard to
> implement.  I'm still a little fuzzy on how this fits into the big picture.
>
> How is the association made between my extractor and my MIME type (.DOC)?
>
> When does the extractor get invoked... at the time the content is stored?
>
> How does this integrate with DASL... are these properties automatically a
> part of the content so that searches return a reference to the original
> content or does it return a reference to the extracted content and then it's
> my job to map back to the original content?  (sorry, I'm still learning
> DASL).
>
> By the way, once you submit your proposal, does that mean the code is in
> CVS? At what point is it likely to become part of the release
> (2.x)?
>
> thanks,
>
> Ryan
>
> From: <[EMAIL PROTECTED]>
>
> >Reply-To: "Slide Developers Mailing List" <[EMAIL PROTECTED]>
> >To: <[EMAIL PROTECTED]>
> >Subject: RE: Full Text Search for MS Word and Excel files?
> >Date: Tue, 24 Feb 2004 13:43:42 +0100
> >
> >Hi Daniel,
> >
> > > -----Original Message-----
> > > From: Daniel Florey [mailto:[EMAIL PROTECTED]
> > > Sent: Tuesday, 24 February 2004 13:23
> > > To: Slide Developers Mailing List
> > > Subject: Re: Full Text Search for MS Word and Excel files?
> > >
> > >
> > > Hi Martin,
> > > my proposal would look like this:
> > >
> > > public interface Extractor {
> > >   /**
> > >    * Will be called by the extractor framework before content and
> > >    * properties are stored.
> > >    */
> > >   public void extract(InputStream content) throws ExtractException;
> >
> >agreed
> >
> > >   /**
> > >    * Gets the extracted value of a property of the resource, for
> > >    * example "author" for a Word doc, ...
> > >    */
> > >   public String getPropertyValue(String propertyName);
> > >
> > >   /**
> > >    * Gets a description of all properties that are provided by this
> > >    * extractor. Can be used by the indexing framework to e.g. generate
> > >    * columns in an index table.
> >
> >Of course the store / indexer could do whatever it wants with the
> >properties, but I think the normal case should be to write the
> >properties into the DescriptorStore as NodeProperties, so that these
> >properties can be exposed to DASL. So what about the following comment:
> >
> >* Can be used to store them as NodeProperties in the DescriptorStore
> >
> > >   */
> > >   public PropertyDescriptor[] getPropertyDescriptors();
> > > }
> > >
> > > I prefer InputStream for content because the whole document doesn't
> > > have to be loaded into memory.
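The proposed interface, together with the InputStream argument, can be illustrated with a minimal, compilable Java sketch. ExtractException and PropertyDescriptor are not defined in the thread, so simple hypothetical versions are declared here (PropertyDescriptor is reduced to a name and a namespace). PlainTextExtractor, its namespace URI, and the linecount/wordcount properties are illustrative only. The extractor reads the stream line by line, so only one line is held in memory at a time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical checked exception matching the name used in the proposal.
class ExtractException extends Exception {
    ExtractException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical minimal descriptor: just a property name and its namespace.
class PropertyDescriptor {
    final String name;
    final String namespace;

    PropertyDescriptor(String name, String namespace) {
        this.name = name;
        this.namespace = namespace;
    }
}

// The three methods as proposed in the thread.
interface Extractor {
    void extract(InputStream content) throws ExtractException;
    String getPropertyValue(String propertyName);
    PropertyDescriptor[] getPropertyDescriptors();
}

// Hypothetical example extractor for plain UTF-8 text.
class PlainTextExtractor implements Extractor {
    private static final String NAMESPACE = "http://example.org/extractor/"; // hypothetical
    private final Map<String, String> values = new HashMap<>();

    @Override
    public void extract(InputStream content) throws ExtractException {
        long lines = 0;
        long words = 0;
        // The stream is not closed here; the caller that passed it owns it.
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(content, StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines++;
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    words += trimmed.split("\\s+").length;
                }
            }
        } catch (IOException e) {
            throw new ExtractException("could not read content", e);
        }
        values.put("linecount", Long.toString(lines));
        values.put("wordcount", Long.toString(words));
    }

    @Override
    public String getPropertyValue(String propertyName) {
        return values.get(propertyName); // null if unknown or not yet extracted
    }

    @Override
    public PropertyDescriptor[] getPropertyDescriptors() {
        return new PropertyDescriptor[] {
            new PropertyDescriptor("linecount", NAMESPACE),
            new PropertyDescriptor("wordcount", NAMESPACE)
        };
    }
}
```

A Word or Excel extractor built on POI would have the same shape; only the body of extract() would change.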
> >
> >agreed.
> >
> >
> >Best regards,
> >Martin
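Martin suggests that, normally, extracted values are written into the DescriptorStore as NodeProperties so that DASL can query them. Slide's actual DescriptorStore and NodeProperty signatures are not given in this thread, so this hypothetical Java sketch uses in-memory stand-ins to show only the copy step. StoredProperty, PropertySet, ExtractedValues and PropertyWriter are illustrative names. ExtractedValues is a reduced view of the proposed getPropertyDescriptors()/getPropertyValue() pair. Each offered property name is looked up, and every non-null value is stored under a namespace.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a stored node property: namespace, name, value.
final class StoredProperty {
    final String namespace;
    final String name;
    final String value;

    StoredProperty(String namespace, String name, String value) {
        this.namespace = namespace;
        this.name = name;
        this.value = value;
    }
}

// Hypothetical reduced view of an extractor after extract() has run.
interface ExtractedValues {
    String[] propertyNames();
    String valueOf(String name);
}

// Hypothetical in-memory stand-in for the property set of one resource.
// Keyed by (namespace, name) so equal names in different namespaces coexist.
final class PropertySet {
    private final Map<List<String>, StoredProperty> props = new LinkedHashMap<>();

    void put(StoredProperty p) {
        props.put(Arrays.asList(p.namespace, p.name), p);
    }

    StoredProperty get(String namespace, String name) {
        return props.get(Arrays.asList(namespace, name));
    }
}

final class PropertyWriter {
    // Copies every extracted value into the target under the given namespace,
    // skipping properties the extractor could not fill (null values).
    // Returns the number of properties written.
    static int copy(ExtractedValues extracted, String namespace, PropertySet target) {
        int written = 0;
        for (String name : extracted.propertyNames()) {
            String value = extracted.valueOf(name);
            if (value != null) {
                target.put(new StoredProperty(namespace, name, value));
                written++;
            }
        }
        return written;
    }
}
```

Because the values end up on the resource's own property set in this sketch, a query matching them would naturally point at the original resource rather than at a separate extracted copy.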
> >
> >---------------------------------------------------------------------
> >To unsubscribe, e-mail: [EMAIL PROTECTED]
> >For additional commands, e-mail: [EMAIL PROTECTED]
>
