Re: DASL lucene content indexing

Fabrice Dewasmes Wed, 07 Dec 2005 00:53:35 -0800

[EMAIL PROTECTED] wrote:

Hi, I'm trying to get the new Lucene indexer to work, but so far without
success. I'm working with the Slide head/trunk.
I followed the steps in the Wiki to configure DASL/Lucene in the Domain.xml,
and the indexes get created when the server starts up.
But the content index is never updated; it just contains a single segment file.

Is there something that I missed?

Daniel


BTW: Is there a way to search the mailing list archives? The layout changed
and there is no search anymore?



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Try this configuration in your Domain.xml.

Inside the store element:

<contentindexer classname="org.apache.slide.index.TextContentIndexer">
    <parameter name="indexpath">${filespath}index/content</parameter>
</contentindexer>

<propertiesindexer classname="org.apache.slide.index.lucene.LucenePropertiesIndexer">
    <parameter name="indexpath">${filespath}index/metadata</parameter>
    <configuration name="indexed-properties">
        <property name="author" namespace="DAV:">
            <text analyzer="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
            <is-defined/>
        </property>
    </configuration>
</propertiesindexer>

And use extractors:

<!-- Extractor configuration -->
<extractors>
    <extractor classname="org.apache.slide.extractor.SimpleXmlExtractor" uri="/files/articles/test.xml">
        <configuration>
            <instruction property="title" xpath="/article/title/text()" />
            <instruction property="summary" xpath="/article/summary/text()" />
        </configuration>
    </extractor>

    <extractor classname="org.apache.slide.extractor.OfficeExtractor" uri="/files/docs/">
        <configuration>
            <instruction property="author" id="SummaryInformation-0-4" />
            <instruction property="application" id="SummaryInformation-0-18" />
        </configuration>
    </extractor>

    <extractor classname="org.apache.slide.extractor.TextContentExtractor" uri="/files/spaces">
    </extractor>

    <extractor classname="org.apache.slide.extractor.XmlContentExtractor" uri="/files/spaces">
    </extractor>

    <extractor classname="org.apache.slide.extractor.MSWordExtractor" uri="/files/spaces">
    </extractor>

    <extractor classname="org.apache.slide.extractor.MSExcelExtractor" uri="/files/spaces">
    </extractor>

    <extractor classname="org.apache.slide.extractor.MSPowerPointExtractor" uri="/files/spaces">
    </extractor>

    <extractor classname="org.apache.slide.extractor.PDFExtractor" uri="/files/spaces">
    </extractor>
</extractors>

This should do the trick
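
One way to check that the content index is actually being populated is to upload a document under /files/spaces and then send a DASL SEARCH request that uses the contains operator. The body below is only a minimal sketch: the scope href and the search word "lucene" are placeholders I made up for illustration, and the href will normally need to include your servlet context path.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a DASL basicsearch body; href and search term are assumed placeholders -->
<D:searchrequest xmlns:D="DAV:">
  <D:basicsearch>
    <!-- Properties to return for each matching resource -->
    <D:select>
      <D:prop>
        <D:displayname/>
        <D:getcontenttype/>
      </D:prop>
    </D:select>
    <!-- Search everything below the collection covered by the content extractors -->
    <D:from>
      <D:scope>
        <D:href>/files/spaces</D:href>
        <D:depth>infinity</D:depth>
      </D:scope>
    </D:from>
    <!-- Full-text match against the indexed content -->
    <D:where>
      <D:contains>lucene</D:contains>
    </D:where>
  </D:basicsearch>
</D:searchrequest>
```

If this returns the uploaded document, the content indexer and extractors are wired up. If it returns nothing, the index directory under ${filespath}index/content is the first place to look.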

regards,

Fabrice
