Hi Wolfgang,

pdfbox has an additional dependency on jempbox, which is not in your list
of jars:

[dependency:tree]
org.apache.jackrabbit:jackrabbit-text-extractors:jar:1.5.0
+- org.apache.poi:poi:jar:3.0.2-FINAL:compile
|  \- commons-logging:commons-logging:jar:1.1:compile
|     \- log4j:log4j:jar:1.2.14:compile
+- org.apache.poi:poi-scratchpad:jar:3.0.2-FINAL:compile
+- pdfbox:pdfbox:jar:0.7.3:compile
|  +- org.fontbox:fontbox:jar:0.1.0:compile
|  \- org.jempbox:jempbox:jar:0.2.0:compile   <=====
+- net.sourceforge.nekohtml:nekohtml:jar:1.9.7:compile
|  \- xerces:xercesImpl:jar:2.8.1:compile
|     \- xml-apis:xml-apis:jar:1.3.03:compile
+- org.slf4j:slf4j-api:jar:1.5.3:compile

Did you see any warnings or errors in the logs?

regards
 marcel

On Thu, Mar 26, 2009 at 16:20, Kurz Wolfgang <[email protected]> wrote:
> Hello everyone,
>
> I am trying to get full-text search to work with the text extractors.
>
> I uploaded a PDF file as a resource into Jackrabbit, which works fine: I
> can download it again and get the same file back.
>
> But now I want to implement full-text search inside the documents I
> uploaded, and somehow it doesn't find them even though the document
> contains the strings that I am searching for.
>
> What I did is this:
>
> I added these jar files to my Tomcat server lib folder, since I am using
> JNDI to connect:
>
> -jackrabbit-text-extractors-1.5.0.jar
> -fontbox-0.1.0.jar
> -junit-3.8.1.jar
> -nekohtml-1.9.7.jar
> -pdfbox-0.7.3.jar
> -poi-3.0.2-FINAL.jar
> -poi-scratchpad-3.0.2-FINAL.jar
> -tm-extractors-0.4.jar
>
> Then my XPath query looks like this:
>
> //*[((jcr:contains(.,'consetetur')) or (jcr:contains(.,'sadipscing')))]
>
> Both of those words are inside the PDF, but the search result is empty.
>
> Here is the code I use for the search:
>
> javax.jcr.query.Query jcrQuery;
> try {
>     jcrQuery = session.getWorkspace().getQueryManager()
>             .createQuery(query, language);
>     QueryResult queryResult = jcrQuery.execute();
>     NodeIterator nodeIterator = queryResult.getNodes();
>     return nodeIterator;
> }
> catch (InvalidQueryException iqe) {
>     throw new org.apache.jackrabbit.ocm.exception.InvalidQueryException(iqe);
> }
> catch (RepositoryException re) {
>     throw new ObjectContentManagerException(re.getMessage(), re);
> }
>
> It would be really awesome if anyone had an idea why this doesn't work.
>
> Thanks a lot in advance
> Wolfgang
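
PS: If the logs show nothing, one way to check whether the extractor
libraries are really visible is to try loading one class from each jar.
The following is only a minimal sketch: ClasspathCheck and findMissing are
made-up names, and the class names passed in main are my assumptions about
what these jar versions contain, so please verify them against the jars
themselves.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {

    /** Returns those of the given class names that the loader cannot load. */
    public static List<String> findMissing(ClassLoader loader, String... classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, do not run static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                // LinkageError covers a class that is found but whose own
                // dependencies cannot be resolved
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed class names, one per jar; adjust after inspecting the jars.
        List<String> missing = findMissing(
                ClasspathCheck.class.getClassLoader(),
                "org.apache.jackrabbit.extractor.PdfTextExtractor", // jackrabbit-text-extractors
                "org.pdfbox.pdmodel.PDDocument",                    // pdfbox
                "org.fontbox.cmap.CMap",                            // fontbox
                "org.jempbox.xmp.XMPMetadata");                     // jempbox
        if (missing.isEmpty()) {
            System.out.println("All checked classes are loadable.");
        } else {
            System.out.println("Not loadable: " + missing);
        }
    }
}
```

Run standalone, this only tests the classpath you start it with. To test
what Tomcat sees, call findMissing from code running inside the container
and pass it the class loader of a class that lives next to the Jackrabbit
jars.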
