Re: Problem getting full textual search to work with textextractors

Marcel Reutegger Fri, 27 Mar 2009 01:31:08 -0700

Hi Wolfgang,

pdfbox has an additional dependency on jempbox, which is not in your list of jars:


[dependency:tree]
org.apache.jackrabbit:jackrabbit-text-extractors:jar:1.5.0
+- org.apache.poi:poi:jar:3.0.2-FINAL:compile
|  \- commons-logging:commons-logging:jar:1.1:compile
|     \- log4j:log4j:jar:1.2.14:compile
+- org.apache.poi:poi-scratchpad:jar:3.0.2-FINAL:compile
+- pdfbox:pdfbox:jar:0.7.3:compile
|  +- org.fontbox:fontbox:jar:0.1.0:compile
|  \- org.jempbox:jempbox:jar:0.2.0:compile   <=====
+- net.sourceforge.nekohtml:nekohtml:jar:1.9.7:compile
|  \- xerces:xercesImpl:jar:2.8.1:compile
|     \- xml-apis:xml-apis:jar:1.3.03:compile
+- org.slf4j:slf4j-api:jar:1.5.3:compile

did you see any warnings or errors in the logs?
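
A quick way to check whether the jars from the tree above are really visible is to try loading one class from each of them. The following is only a minimal sketch: the class `ExtractorClasspathCheck` is hypothetical, and the class names in `EXPECTED` are my assumption about what these jar versions contain, so please verify them against the actual jar contents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reports which expected classes cannot be loaded. */
final class ExtractorClasspathCheck {

    // Assumed class names, one per jar (text extractors, pdfbox, fontbox, jempbox).
    // These are illustrative; check them against the jars you actually deployed.
    static final List<String> EXPECTED = Arrays.asList(
            "org.apache.jackrabbit.extractor.PdfTextExtractor",
            "org.pdfbox.pdmodel.PDDocument",
            "org.fontbox.cmap.CMap",
            "org.jempbox.xmp.XMPMetadata");

    /** Returns true if the class can be found by the given loader (without initializing it). */
    static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // e.g. NoClassDefFoundError when a class is found but one of its dependencies is not
            return false;
        }
    }

    /** Returns the subset of classNames that the loader cannot resolve. */
    static List<String> missing(List<String> classNames, ClassLoader loader) {
        List<String> result = new ArrayList<String>();
        for (String name : classNames) {
            if (!isPresent(name, loader)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ClassLoader loader = ExtractorClasspathCheck.class.getClassLoader();
        for (String name : missing(EXPECTED, loader)) {
            System.out.println("missing: " + name);
        }
    }
}
```

Note that the result depends on which class loader runs the check. With the jars in the Tomcat lib folder, it would probably have to be called from code running inside Tomcat (a servlet or JSP, for example) rather than from a standalone command line.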

regards
 marcel

On Thu, Mar 26, 2009 at 16:20, Kurz Wolfgang <[email protected]> wrote:
> Hello everyone,
>
> I am trying to get full-text search to work with text extractors.
>
>
> I uploaded a PDF file as a resource into Jackrabbit, which works fine: I can 
> download it and get the file back.
>
> But now I want to implement full-text search inside the documents I uploaded, and 
> somehow it doesn't find the documents even though they contain the 
> strings that I am searching for.
>
> What I did is this:
>
> I added these jar files to my Tomcat server lib folder, since I am using JNDI 
> to connect:
>
> -jackrabbit-text-extractors-1.5.0.jar
> -fontbox-0.1.0.jar
> -junit-3.8.1.jar
> -nekohtml-1.9.7.jar
> -pdfbox-0.7.3.jar
> -poi-3.0.2-FINAL.jar
> -poi-scratchpad-3.0.2-FINAL.jar
> -tm-extractors-0.4.jar
>
> My XPath query looks like this:
>
> //*[((jcr:contains(.,'consetetur')) or (jcr:contains(.,'sadipscing')))]
>
> Both of those words are inside the pdf but the search result is empty.
>
> Here is the code I use to run the search:
>
> javax.jcr.query.Query jcrQuery;
> try {
>     jcrQuery = session.getWorkspace().getQueryManager().createQuery(query, language);
>     QueryResult queryResult = jcrQuery.execute();
>     NodeIterator nodeIterator = queryResult.getNodes();
>     return nodeIterator;
> }
> catch (InvalidQueryException iqe) {
>     throw new org.apache.jackrabbit.ocm.exception.InvalidQueryException(iqe);
> }
> catch (RepositoryException re) {
>     throw new ObjectContentManagerException(re.getMessage(), re);
> }
>
>
> It would be really awesome if anyone had an idea why this doesn't work.
>
> Thx a lot in advance
> Wolfgang
>
