Re: My repository is not indexing PDFs, what am I missing?
Hi Chetan,

On Thu, May 22, 2014 at 6:52 AM, Chetan Mehrotra wrote:
> ...This might be due to OAK-1462. We had to disable the
> LuceneIndexProvider from getting registered as OSGi service...

Would that mean that the LuceneIndexEditor is still called, but the result isn't used?

I'm asking because when adding a PDF, LuceneIndexEditor.addOrUpdate does call context.getWriter().updateDocument with a Document that does contain the PDF's full text in a field named :fulltext, so the text extraction is working (thanks Alex for the tika-parsers hint).

But the query mentioned earlier in this thread still finds only .txt documents, not .pdf.

Adding a .txt also causes LuceneIndexEditor.addOrUpdate to call context.getWriter().updateDocument, but maybe the text is also indexed in another way?

-Bertrand
Re: My repository is not indexing PDFs, what am I missing?
Hi Bertrand,

Don't you also have to add the Tika dependencies (tika-core and tika-parsers) to the pom.xml?

best,
alex

On Wed, May 21, 2014 at 5:28 PM, Bertrand Delacretaz wrote:
> I'm upgrading the OakSlingRepositoryManager used for Sling tests to
> Oak 1.0, and it's not indexing PDFs anymore - it used to with oak 0.8.
> [...]
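
For reference, the Tika dependencies suggested in the reply above might be declared along these lines in the pom.xml. This is only a sketch: the groupId and artifactIds are the published Tika coordinates, but the ${tika.version} property is a placeholder that would have to be defined to match whatever Tika version the Oak release in use expects, and the right scope depends on how the test setup is assembled.

```xml
<!-- Sketch only: ${tika.version} is a hypothetical property, not taken from this thread -->
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-core</artifactId>
  <version>${tika.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <version>${tika.version}</version>
</dependency>
```

In an OSGi setup the parsers also have to be available to the running framework, not just on the build classpath.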
Re: My repository is not indexing PDFs, what am I missing?
Hi Bertrand,

This might be due to OAK-1462. We had to disable the LuceneIndexProvider from getting registered as an OSGi service, to handle the case where LuceneIndexProvider was getting registered twice (one default and another for the Aggregate case).

I will try to resolve this by next week, and then it should work fine.

Chetan Mehrotra

On Wed, May 21, 2014 at 8:58 PM, Bertrand Delacretaz wrote:
> I'm upgrading the OakSlingRepositoryManager used for Sling tests to
> Oak 1.0, and it's not indexing PDFs anymore - it used to with oak 0.8.
> [...]
My repository is not indexing PDFs, what am I missing?
Hi,

I'm upgrading the OakSlingRepositoryManager used for Sling tests to Oak 1.0, and it's not indexing PDFs anymore - it used to with oak 0.8.

After uploading a text file to /tmp, the /jcr:root/foo//*[jcr:contains(.,'some word')] query finds it, but the same doesn't work with a PDF.

My repository setup is in the OakSlingRepositoryManager [1] - am I missing something in there?

-Bertrand

[1] https://svn.apache.org/repos/asf/sling/trunk/bundles/jcr/oak-server/src/main/java/org/apache/sling/oak/server/OakSlingRepositoryManager.java