Hi Daniel,

I'd like to try to implement this, if that's OK. I'm looking at Oliver's version of the LuceneIndexer that is in CVS. It looks like you extend AbstractService to give it a context for transactions, and implement IndexStore to override the IExpressionFactory.

Do you know if everything works with IExpressionFactory? Do I just need to plug the Lucene code into it?

I'm not sure I understand the difference between the Indexer and the IndexStore. In the example in CVS, the LuceneIndex is separate from the actual IndexStore (SimpleTxtContainsIndexer), even though they both implement Indexer.

What is the separation between these two interfaces?

As a side point: is it envisioned that multiple extractors might operate on the same content, with the extracted content from each of them needing to go into the index?

Regards,

Ryan Rhodes


From: Daniel Florey <[EMAIL PROTECTED]>
Reply-To: "Slide Users Mailing List" <[EMAIL PROTECTED]>
To: Slide Users Mailing List <[EMAIL PROTECTED]>
Subject: Re: How to implement a ContentExtractor?
Date: Mon, 21 Jun 2004 09:44:14 +0200

Hi Ryan,
You are exactly right. I haven't implemented the ContentExtractor yet, because it makes no sense to do it the way the property extractors work.
As you stated, the content extractor only makes sense in combination with an indexer.
It was my plan to build an indexing framework, but I had no time to do it. The LuceneIndex by Christophe is not checked in yet, because it is not integrated into all of the DASL stuff. So it is not possible to search the content via WebDAV by using this index.
If you want to perform server-side queries only, it might be an option to use this indexer and integrate the ContentExtractor you are thinking of.
But in the long term we need the 'big' solution that integrates indexing, extracting and DASL.
Regards,


Daniel


ryan wrote:

I tried to build a content extractor to pull the text from MS Word docs.

It looks like the PropertyExtractorTrigger is fired by the event
framework when a node is created or stored, and then it calls the
ExtractorManager to get all the PropertyExtractors associated with the
node that changed and adds the extracted properties to the node.

event framework --> PropertyExtractorTrigger --> ExtractorManager -->
PropertyExtractor
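
To make that call chain concrete, here is a minimal Java sketch of how I read it. Every name in it (SketchPropertyExtractor, SketchExtractorManager, SketchPropertyTrigger, SketchSizeExtractor, the "sketch:size" property) is made up for illustration; these are not Slide's actual interfaces or signatures, and a plain Map stands in for the node's properties.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a property extractor (not Slide's real interface).
interface SketchPropertyExtractor {
    boolean handles(String uri);
    Map<String, Object> extract(byte[] content);
}

// Hypothetical manager: returns the extractors registered for a given node URI.
class SketchExtractorManager {
    private final List<SketchPropertyExtractor> extractors = new ArrayList<>();

    void register(SketchPropertyExtractor extractor) {
        extractors.add(extractor);
    }

    List<SketchPropertyExtractor> extractorsFor(String uri) {
        List<SketchPropertyExtractor> matching = new ArrayList<>();
        for (SketchPropertyExtractor e : extractors) {
            if (e.handles(uri)) {
                matching.add(e);
            }
        }
        return matching;
    }
}

// Hypothetical trigger: the event framework would call onStore() when a node
// is created or stored.
class SketchPropertyTrigger {
    private final SketchExtractorManager manager;

    SketchPropertyTrigger(SketchExtractorManager manager) {
        this.manager = manager;
    }

    // Runs every matching extractor and adds the results to the node's properties.
    void onStore(String uri, byte[] content, Map<String, Object> nodeProperties) {
        for (SketchPropertyExtractor e : manager.extractorsFor(uri)) {
            nodeProperties.putAll(e.extract(content));
        }
    }
}

// Toy extractor for the example: records the content size of .txt nodes.
class SketchSizeExtractor implements SketchPropertyExtractor {
    public boolean handles(String uri) {
        return uri.endsWith(".txt");
    }

    public Map<String, Object> extract(byte[] content) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("sketch:size", content.length);
        return props;
    }
}
```

The point of the sketch is that the extracted values have an obvious home: they are written back onto the node as properties.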


I don't think the ContentExtractor is getting called at all now. I was thinking it probably can't be a ContentExtractorTrigger, because there isn't anywhere to store the extracted content on the node. I think it will probably have to call ExtractorManager from LuceneIndex. Something like:

IndexTrigger --> LuceneIndex --> ExtractorManager --> ContentExtractor

Does this sound correct?
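
Roughly, the index-driven chain could look like the Java sketch below. It is only an illustration under my own assumptions: SketchContentExtractor, SketchTextIndex and SketchPlainTextExtractor are invented names, and a small in-memory map of terms to URIs stands in for Lucene so the example has no dependencies. None of this is Slide's or Lucene's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical content extractor: turns raw content into plain text.
interface SketchContentExtractor {
    boolean handles(String uri);
    String extractText(byte[] content);
}

// Stand-in for LuceneIndex: a tiny in-memory inverted index (term -> URIs).
// Only the terms are kept; the extracted text itself is never stored.
class SketchTextIndex {
    private final List<SketchContentExtractor> extractors = new ArrayList<>();
    private final Map<String, Set<String>> postings = new HashMap<>();

    void register(SketchContentExtractor extractor) {
        extractors.add(extractor);
    }

    // What an index trigger would call when a node's content is stored.
    void index(String uri, byte[] content) {
        for (SketchContentExtractor e : extractors) {
            if (!e.handles(uri)) {
                continue;
            }
            // Text from every matching extractor lands in the same index entry.
            String text = e.extractText(content).toLowerCase(Locale.ROOT);
            for (String term : text.split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, k -> new TreeSet<>()).add(uri);
                }
            }
        }
    }

    // Returns the URIs whose extracted text contained the term (case-insensitive).
    Set<String> search(String term) {
        Set<String> hits = postings.get(term.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<String>emptySet() : hits;
    }
}

// Toy extractor; a real one for MS Word would need a document parser.
class SketchPlainTextExtractor implements SketchContentExtractor {
    public boolean handles(String uri) {
        return uri.endsWith(".txt");
    }

    public String extractText(byte[] content) {
        return new String(content, StandardCharsets.UTF_8);
    }
}
```

Here the extracted text goes straight into the index and nowhere else, which is why the extractor call sits behind the index rather than behind a trigger that writes to the node.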


I found this LuceneIndex posted by Christophe, but I don't think it is checked into CVS. I believe you can index fields in Lucene that are not actually stored as content. I would like to try and add the content extractor code to the LuceneIndex. Does anyone know the status of the LuceneIndex?

http://www.mail-archive.com/[EMAIL PROTECTED]/msg09091.html

Thanks,

Ryan Rhodes






--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]



