Re: Problem getting full textual search to work with textextractors

Kurz Wolfgang Thu, 26 Mar 2009 09:33:19 -0700

Oh, I forgot to mention that I also included

lucene-core-2.3.2.jar


and this is in my workspace configuration:

<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
    <param name="path" value="${wsp.home}/index"/>
    <param name="textFilterClasses" value="org.apache.jackrabbit.extractor.PlainTextExtractor,org.apache.jackrabbit.extractor.MsWordTextExtractor,org.apache.jackrabbit.extractor.MsExcelTextExtractor,org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,org.apache.jackrabbit.extractor.PdfTextExtractor,org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,org.apache.jackrabbit.extractor.RTFTextExtractor,org.apache.jackrabbit.extractor.HTMLTextExtractor,org.apache.jackrabbit.extractor.XMLTextExtractor"/>
    <param name="extractorPoolSize" value="2"/>
    <param name="supportHighlighting" value="true"/>
</SearchIndex>
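
One thing that might be worth ruling out is whether every class named in textFilterClasses can actually be loaded. The following is only a rough sketch of such a check: the class name ExtractorClasspathCheck and the method missingClasses are made up for illustration, and it only says something about the class loader it runs in, so it would have to run inside the same web application (or with the same jars) to be meaningful.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jackrabbit: reports which of the
// comma-separated class names cannot be loaded by this class's class loader.
final class ExtractorClasspathCheck {

    private ExtractorClasspathCheck() {}

    static List<String> missingClasses(String commaSeparatedClassNames) {
        List<String> missing = new ArrayList<>();
        for (String name : commaSeparatedClassNames.split(",")) {
            String trimmed = name.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray commas or whitespace
            }
            try {
                // Load without initializing; we only care whether the class resolves.
                Class.forName(trimmed, false, ExtractorClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers a class that is present but whose
                // dependencies (for example a missing library jar) are not.
                missing.add(trimmed);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pass the value of the textFilterClasses parameter as the first argument.
        String configured = args.length > 0 ? args[0] : "";
        for (String name : missingClasses(configured)) {
            System.out.println("Not loadable: " + name);
        }
    }
}
```

An empty result only means the classes resolve in that class loader; it does not prove that the extractors are being invoked during indexing.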



-----Original Message-----
From: Kurz Wolfgang 
Sent: Thursday, 26 March 2009 16:20
To: '[email protected]'
Subject: Problem getting full textual search to work with textextractors

Hello everyone,

I am trying to get full-text search to work with text extractors.

I uploaded a PDF file as a resource into Jackrabbit. That part works: I can 
download it again and I get the file back.

Now I want to implement full-text search inside the documents I uploaded, but 
the search doesn't find them even though they contain the strings that I am 
searching for.

Here is what I did:

I added these JAR files to my Tomcat server's lib folder, since I am using 
JNDI to connect:

-jackrabbit-text-extractors-1.5.0.jar
-fontbox-0.1.0.jar
-junit-3.8.1.jar
-nekohtml-1.9.7.jar
-pdfbox-0.7.3.jar
-poi-3.0.2-FINAL.jar
-poi-scratchpad-3.0.2-FINAL.jar
-tm-extractors-0.4.jar

My XPath query looks like this:

//*[((jcr:contains(.,'consetetur')) or (jcr:contains(.,'sadipscing')))]

Both of those words are in the PDF, but the search result is empty.
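
For reference, a query of this shape can be assembled from a list of search terms roughly as sketched below. The class FullTextXPath and its methods are hypothetical names used only for illustration. The sketch doubles single quotes so that a term can sit inside an XPath string literal; it does not attempt to escape characters that have a special meaning inside the jcr:contains search expression itself.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds an XPath statement that matches any node
// whose indexed content contains at least one of the given terms.
final class FullTextXPath {

    private FullTextXPath() {}

    // Wraps a term in single quotes, doubling embedded single quotes
    // as required for an XPath string literal.
    static String quote(String term) {
        return "'" + term.replace("'", "''") + "'";
    }

    static String containsAny(List<String> terms) {
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("at least one search term is required");
        }
        String predicate = terms.stream()
                .map(term -> "jcr:contains(., " + quote(term) + ")")
                .collect(Collectors.joining(" or "));
        return "//*[" + predicate + "]";
    }

    public static void main(String[] args) {
        // prints: //*[jcr:contains(., 'consetetur') or jcr:contains(., 'sadipscing')]
        System.out.println(containsAny(List.of("consetetur", "sadipscing")));
    }
}
```

The resulting statement is logically the same as the query above, just without the redundant parentheses.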

Here is the code I use to run the search:

javax.jcr.query.Query jcrQuery;
try {
    jcrQuery = session.getWorkspace().getQueryManager().createQuery(query, language);
    QueryResult queryResult = jcrQuery.execute();
    NodeIterator nodeIterator = queryResult.getNodes();
    return nodeIterator;
}
catch (InvalidQueryException iqe) {
    throw new org.apache.jackrabbit.ocm.exception.InvalidQueryException(iqe);
}
catch (RepositoryException re) {
    throw new ObjectContentManagerException(re.getMessage(), re);
}


It would be really great if anyone had an idea why this doesn't work.

Thanks a lot in advance,
Wolfgang
