Re: My repository is not indexing PDFs, what am I missing?

2014-05-26 Thread Bertrand Delacretaz
Hi Chetan,

On Thu, May 22, 2014 at 6:52 AM, Chetan Mehrotra wrote:
> ...This might be due to OAK-1462. We had to disable the
> LuceneIndexProvider from getting registered as an OSGi service...

Would that mean that the LuceneIndexEditor is still called, but the
result isn't used?

I'm asking because when adding a PDF, LuceneIndexEditor.addOrUpdate
does call context.getWriter().updateDocument with a Document that does
contain the PDF's full text in a field named :fulltext, so the text
extraction is working (thanks Alex for the tika-parsers hint).

But the query mentioned earlier in this thread still finds only .txt
documents, not .pdf.

Adding a .txt also causes LuceneIndexEditor.addOrUpdate to call
context.getWriter().updateDocument, but maybe the text is also indexed
in another way?

-Bertrand


Re: My repository is not indexing PDFs, what am I missing?

2014-05-23 Thread Alex Parvulescu
Hi Bertrand,

Don't you also have to add the Tika dependencies (tika-core and
tika-parsers) to the pom.xml?
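
For reference, a minimal sketch of what those two entries might look like
in the pom.xml. The org.apache.tika groupId and the two artifactIds are the
standard coordinates for these artifacts; the ${tika.version} property is a
placeholder assumed here and would have to be defined to match the Tika
version your Oak release was built against. Whether a scope such as test is
appropriate depends on the module.

```xml
<!-- Sketch only: ${tika.version} is a placeholder property, not a recommended version -->
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-core</artifactId>
  <version>${tika.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.tika</groupId>
  <artifactId>tika-parsers</artifactId>
  <version>${tika.version}</version>
</dependency>
```

tika-core alone only provides the parsing framework; the format-specific
parsers, including the one for PDF, come in through tika-parsers and its
transitive dependencies.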

best,
alex



On Wed, May 21, 2014 at 5:28 PM, Bertrand Delacretaz wrote:

> Hi,
>
> I'm upgrading the OakSlingRepositoryManager used for Sling tests to
> Oak 1.0, and it's not indexing PDFs anymore - it used to with Oak 0.8.
>
> After uploading a text file to /tmp, the
> /jcr:root/foo//*[jcr:contains(.,'some word')] query finds it, but the
> same doesn't work with a PDF.
>
> My repository setup is in the OakSlingRepositoryManager [1] - am I
> missing something in there?
>
> -Bertrand
>
> [1]
> https://svn.apache.org/repos/asf/sling/trunk/bundles/jcr/oak-server/src/main/java/org/apache/sling/oak/server/OakSlingRepositoryManager.java
>


Re: My repository is not indexing PDFs, what am I missing?

2014-05-21 Thread Chetan Mehrotra
Hi Bertrand,

This might be due to OAK-1462. We had to stop the LuceneIndexProvider
from being registered as an OSGi service, to handle the case where it
was getting registered twice (once by default and once for the
Aggregate case). I will try to resolve this by next week, and then it
should work fine.

Chetan Mehrotra


On Wed, May 21, 2014 at 8:58 PM, Bertrand Delacretaz wrote:
> Hi,
>
> I'm upgrading the OakSlingRepositoryManager used for Sling tests to
> Oak 1.0, and it's not indexing PDFs anymore - it used to with Oak 0.8.
>
> After uploading a text file to /tmp, the
> /jcr:root/foo//*[jcr:contains(.,'some word')] query finds it, but the
> same doesn't work with a PDF.
>
> My repository setup is in the OakSlingRepositoryManager [1] - am I
> missing something in there?
>
> -Bertrand
>
> [1] 
> https://svn.apache.org/repos/asf/sling/trunk/bundles/jcr/oak-server/src/main/java/org/apache/sling/oak/server/OakSlingRepositoryManager.java


My repository is not indexing PDFs, what am I missing?

2014-05-21 Thread Bertrand Delacretaz
Hi,

I'm upgrading the OakSlingRepositoryManager used for Sling tests to
Oak 1.0, and it's not indexing PDFs anymore - it used to with Oak 0.8.

After uploading a text file to /tmp, the
/jcr:root/foo//*[jcr:contains(.,'some word')] query finds it, but the
same doesn't work with a PDF.

My repository setup is in the OakSlingRepositoryManager [1] - am I
missing something in there?

-Bertrand

[1] https://svn.apache.org/repos/asf/sling/trunk/bundles/jcr/oak-server/src/main/java/org/apache/sling/oak/server/OakSlingRepositoryManager.java