Re: Slide, Dasl and pdf

Eirikur Hrafnsson Wed, 13 Apr 2005 10:12:17 -0700

Somewhere in the Slide HEAD there is a file called Extractor-Domain.xml which contains settings for the indexers (property and content) and the extractors (PDF, Office, ...). You could take a look at that. Here are my working settings (from my Domain.xml):

.... (at the end of my <store> definition)
    <contentindexer classname="org.apache.slide.index.lucene.LuceneContentIndexer">
        <parameter name="indexpath">store/index/content</parameter>
        <parameter name="asynchron">true</parameter>
    </contentindexer>
    <propertiesindexer classname="org.apache.slide.index.lucene.LucenePropertiesIndexer">
        <parameter name="indexpath">store/index/metadata</parameter>
        <parameter name="asynchron">true</parameter>
        <configuration name="indexed-properties">
            <property name="ContentType" namespace="IW:">
                <text/>
                <is-defined/>
            </property>
        </configuration>
    </propertiesindexer>
</store>
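To check that the properties index is actually being used, you could send a DASL query against the indexed property. The body below is only a sketch of a WebDAV SEARCH request, assuming the ContentType property in the IW: namespace from the settings above and a scope of /files; the iw prefix is arbitrary, and the href is an assumption that may need your servlet context in front of it.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch of a SEARCH request body; not taken from the original post. -->
<D:searchrequest xmlns:D="DAV:">
  <D:basicsearch>
    <D:select>
      <D:prop>
        <D:displayname/>
      </D:prop>
    </D:select>
    <D:from>
      <D:scope>
        <!-- Assumption: adjust the path to match your deployment. -->
        <D:href>/files</D:href>
        <D:depth>infinity</D:depth>
      </D:scope>
    </D:from>
    <D:where>
      <!-- Matches resources that have the indexed ContentType property set. -->
      <D:is-defined>
        <D:prop>
          <iw:ContentType xmlns:iw="IW:"/>
        </D:prop>
      </D:is-defined>
    </D:where>
  </D:basicsearch>
</D:searchrequest>
```

The is-defined operator is used here because the property is configured with <is-defined/> in the indexer settings.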

And then later in Domain.xml I add the extractors and set them to the paths we want to index, in our case just everything under /files:

....
<parameter name="versioncontrol-exclude"/>
<parameter name="checkout-fork">forbidden</parameter>
<parameter name="checkin-fork">forbidden</parameter>

<extractors>
    <extractor classname="org.apache.slide.extractor.SimpleXmlExtractor" uri="/files">
        <configuration>
            <instruction namespace="http://xmlns.idega.com/block/article/xml" property="headline" xpath="/article/headline/text()" />
            <instruction namespace="http://xmlns.idega.com/block/article/xml" property="teaser" xpath="/article/teaser/text()" />
            <instruction namespace="http://xmlns.idega.com/block/article/xml" property="body" xpath="/article/body/text()" />
            <instruction namespace="http://xmlns.idega.com/block/article/xml" property="author" xpath="/article/author/text()" />
            <instruction namespace="http://xmlns.idega.com/block/article/xml" property="source" xpath="/article/source/text()" />
            <instruction namespace="http://xmlns.idega.com/block/article/xml" property="comment" xpath="/article/comment/text()" />
        </configuration>
    </extractor>
    <extractor classname="org.apache.slide.extractor.XmlContentExtractor" uri="/files"/>
    <extractor classname="org.apache.slide.extractor.PDFExtractor" uri="/files" />
    <extractor classname="org.apache.slide.extractor.TextContentExtractor" uri="/files" />
    <extractor classname="org.apache.slide.extractor.OfficeExtractor" uri="/files">
        <configuration>
            <instruction property="author" id="SummaryInformation-0-4" />
            <instruction property="application" id="SummaryInformation-0-18" />
        </configuration>
    </extractor>
    <extractor classname="org.apache.slide.extractor.MSWordExtractor" uri="/files"/>
    <extractor classname="org.apache.slide.extractor.MSExcelExtractor" uri="/files"/>
    <extractor classname="org.apache.slide.extractor.MSPowerPointExtractor" uri="/files"/>
</extractors>
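For illustration, here is a hypothetical article document of the shape the SimpleXmlExtractor XPath instructions above would match. The element names come from the xpath attributes; the text content is invented. The elements are in no namespace because the XPath expressions use unprefixed names. I am assuming the namespace attribute on each instruction names the namespace of the resulting property, not of the document itself.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical sample stored somewhere under /files; all text is made up. -->
<article>
  <headline>Example headline</headline>
  <teaser>A one-line teaser.</teaser>
  <body>The body text of the article.</body>
  <author>Jane Doe</author>
  <source>Example source</source>
  <comment>Example comment</comment>
</article>
```

With a document like this, each instruction would presumably yield one property (headline, teaser, body, author, source, comment) on the resource.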

<events>
    <event classname="org.apache.slide.webdav.event.WebdavEvent" enable="true" />
    <event classname="org.apache.slide.event.ContentEvent" enable="true" />
    ...
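Once the content indexer and the extractors are in place, a full-text DASL query over PDF and other extracted content might look like the sketch below. It assumes that Slide routes the DAV:contains operator to the content indexer; the search term "invoice" is a placeholder, and the href is an assumption that may need your servlet context in front of it.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch of a SEARCH request body; not taken from the original post. -->
<D:searchrequest xmlns:D="DAV:">
  <D:basicsearch>
    <D:select>
      <D:prop>
        <D:displayname/>
        <D:getcontenttype/>
      </D:prop>
    </D:select>
    <D:from>
      <D:scope>
        <!-- Assumption: adjust the path to match your deployment. -->
        <D:href>/files</D:href>
        <D:depth>infinity</D:depth>
      </D:scope>
    </D:from>
    <D:where>
      <!-- Placeholder search term matched against the indexed content. -->
      <D:contains>invoice</D:contains>
    </D:where>
  </D:basicsearch>
</D:searchrequest>
```

Since the indexers above run with asynchron set to true, a freshly uploaded file may not show up in results immediately.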

And that's how we shave!

Best Regards

Eirikur S. Hrafnsson, [EMAIL PROTECTED]
Chief Software Engineer
Idega Software
http://www.idega.com

On 13.4.2005, at 09:19, Edmund Urbani wrote:

Eirikur Hrafnsson wrote:
On 12.4.2005, at 13:19, Edmund Urbani wrote:
Bertrand Tignon wrote:
Thank you for replying, Edmund. Well, I'm using Slide 2.1. I didn't manage to get Slide 2.2 via CVS. I read the "how-to" but I don't see the 2.2 version. Is it called "Slide_HEAD_PRE_MERGE", or "SLIDE_HEAD_AFTER_EVENTS" or something like that? About the wiki "DASL Configuration", I don't know how to get the Lucene library needed (package org.apache.slide.index.*). Thanks for your help, Bertrand.
There is no 2.2 release yet. The closest you get to 2.2 is the current CVS HEAD. That org.apache.slide.index package is in slide-stores-2.x.jar. It's there even in 2.1, even though it apparently does not work.
Maybe I should ask a different question on this list:
Does the LuceneIndexer that is currently in CVS HEAD work?
It does, the version in 2.1 does not.
-Eiki
Thanks. That's good to hear. I was about to give up.
Now I'd like to go back to the question I had earlier: Do I need to add anything to my Domain.xml other than the <contentindexer ..> element (as explained in the Wiki) to make the Lucene indexer work?
 Edmund
