Oh, I forgot to mention that I also included
lucene-core-2.3.2.jar
and this is the search index configuration in my workspace:
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <param name="textFilterClasses"
         value="org.apache.jackrabbit.extractor.PlainTextExtractor,org.apache.jackrabbit.extractor.MsWordTextExtractor,org.apache.jackrabbit.extractor.MsExcelTextExtractor,org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,org.apache.jackrabbit.extractor.PdfTextExtractor,org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,org.apache.jackrabbit.extractor.RTFTextExtractor,org.apache.jackrabbit.extractor.HTMLTextExtractor,org.apache.jackrabbit.extractor.XMLTextExtractor"/>
  <param name="extractorPoolSize" value="2"/>
  <param name="supportHighlighting" value="true"/>
</SearchIndex>
-----Original Message-----
From: Kurz Wolfgang
Sent: Thursday, 26 March 2009 16:20
To: '[email protected]'
Subject: Problem getting full textual search to work with textextractors
Hello everyone,
I am trying to get full-text search to work with text extractors.
I uploaded a PDF file as a resource into Jackrabbit. That part works: I can
download it again and I get the file back.
But now I want to search the text inside the documents I uploaded, and
the search doesn't find them even though they contain the
strings that I am searching for.
This is what I did:
I added these JAR files to my Tomcat server's lib folder, since I am using JNDI to
connect:
-jackrabbit-text-extractors-1.5.0.jar
-fontbox-0.1.0.jar
-junit-3.8.1.jar
-nekohtml-1.9.7.jar
-pdfbox-0.7.3.jar
-poi-3.0.2-FINAL.jar
-poi-scratchpad-3.0.2-FINAL.jar
-tm-extractors-0.4.jar
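One way to confirm that the server can actually see these JARs is to try loading each class named in the textFilterClasses parameter. The following is only a minimal sketch: the class ExtractorClasspathCheck and its methods are made-up names for illustration, and the check is only meaningful if it runs with the same class loader that the repository uses (for example from inside the web application).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jackrabbit.
class ExtractorClasspathCheck {

    // Returns true if the class loader can find the named class.
    // The class is not initialized, so a class whose own dependencies
    // are missing may still be reported as loadable.
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Splits a comma-separated textFilterClasses value and returns
    // the class names that could not be loaded.
    static List<String> findMissing(String textFilterClasses, ClassLoader loader) {
        List<String> missing = new ArrayList<>();
        for (String name : textFilterClasses.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty() && !isLoadable(trimmed, loader)) {
                missing.add(trimmed);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Shortened example value; in practice use the full list from the config.
        String configured = "org.apache.jackrabbit.extractor.PlainTextExtractor,"
                + "org.apache.jackrabbit.extractor.PdfTextExtractor";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : findMissing(configured, loader)) {
            System.out.println("Not loadable: " + name);
        }
    }
}
```

If nothing is printed, every listed class was found by that class loader. That alone does not prove that extraction works, only that the classes are visible.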
Then my XPath query looks like this:
//*[((jcr:contains(.,'consetetur')) or (jcr:contains(.,'sadipscing')))]
Both of those words are in the PDF, but the search result is empty.
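For reference, a query of this shape can be assembled from a list of search terms. This is a minimal sketch under stated assumptions: ContainsQueryBuilder, quote and anyOf are hypothetical names, and only single quotes are escaped (by doubling them, as XPath string literals require). It does not handle the special characters of the jcr:contains search expression itself.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the JCR API.
class ContainsQueryBuilder {

    // Wraps a term in single quotes and doubles any embedded single
    // quote so the result is a valid XPath string literal.
    static String quote(String term) {
        return "'" + term.replace("'", "''") + "'";
    }

    // Builds //*[jcr:contains(., 't1') or jcr:contains(., 't2') ...]
    static String anyOf(List<String> terms) {
        if (terms.isEmpty()) {
            throw new IllegalArgumentException("at least one term is required");
        }
        StringBuilder sb = new StringBuilder("//*[");
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                sb.append(" or ");
            }
            sb.append("jcr:contains(., ").append(quote(terms.get(i))).append(")");
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        System.out.println(anyOf(Arrays.asList("consetetur", "sadipscing")));
    }
}
```

For the two words above this prints //*[jcr:contains(., 'consetetur') or jcr:contains(., 'sadipscing')], which is the same query without the redundant parentheses.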
Here is the code I use to run the search:
javax.jcr.query.Query jcrQuery;
try {
    // Create the query from the query string and language, run it,
    // and return the matching nodes.
    jcrQuery = session.getWorkspace().getQueryManager().createQuery(query, language);
    QueryResult queryResult = jcrQuery.execute();
    NodeIterator nodeIterator = queryResult.getNodes();
    return nodeIterator;
}
catch (InvalidQueryException iqe) {
    // Wrap the JCR exception in the OCM exception type.
    throw new org.apache.jackrabbit.ocm.exception.InvalidQueryException(iqe);
}
catch (RepositoryException re) {
    throw new ObjectContentManagerException(re.getMessage(), re);
}
It would be really awesome if anyone had an idea why this doesn't work.
Thanks a lot in advance,
Wolfgang