Re: Lucene search should create index of Jackrabbit repository

Alexander Klimetschek Wed, 12 May 2010 05:46:41 -0700

On Wed, May 12, 2010 at 14:12, Jenni Pothu <[email protected]> wrote:
> Hi Alex,
>        Thanks for the reply and information. It is very useful. Using 
> Jcr:contains I am able to search on the node content. But I need to search 
> the file content also. It's not working with Jcr:contains. Thanks again for 
> the needful.


Binary properties of nt:file nodes are full-text extracted with the
help of Apache Tika [1] since Jackrabbit 2.0; before that, Jackrabbit
shipped its own text extractors [2] [3]. Whether a file is supported
depends on its format and on whether an open source library is
available that can handle that format. Some formats, such as PDF, come
in so many varieties that extraction issues turn up every now and then.
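
For illustration, here is a minimal sketch of what a full-text query
against file content could look like in JCR 1.0 XPath. It only builds
the query string. The class and method names are made up for this
example, and it assumes the usual nt:file layout with a jcr:content
child node holding the binary. Only single quotes are escaped here;
other full-text special characters are not handled.

```java
// Sketch only: FileContentQuery, quote and forTerm are hypothetical names.
final class FileContentQuery {

    private FileContentQuery() {}

    /** Doubles single quotes so the term can sit inside an XPath string literal. */
    static String quote(String term) {
        return "'" + term.replace("'", "''") + "'";
    }

    /**
     * Builds an XPath statement that matches nt:file nodes whose
     * jcr:content child (assumed layout) contains the given term.
     */
    static String forTerm(String term) {
        return "//element(*, nt:file)[jcr:contains(jcr:content, " + quote(term) + ")]";
    }

    public static void main(String[] args) {
        System.out.println(forTerm("jackrabbit"));
    }
}
```

The resulting string would then be passed to
QueryManager.createQuery(statement, Query.XPATH), with the QueryManager
obtained from session.getWorkspace().getQueryManager().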

Also note that large text extractions are queued, so their results
might not be immediately visible in search results after the save.
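
If a test or client has to wait for such a queued extraction, one
option is to re-run the query a few times with a short pause. The
helper below is a generic sketch and not part of Jackrabbit. The
class name, the method name and the retry parameters are assumptions
for illustration, and the check passed in would be whatever runs your
query and tests for a hit.

```java
import java.util.function.BooleanSupplier;

// Sketch only: IndexPoller is a hypothetical helper, not a Jackrabbit class.
final class IndexPoller {

    private IndexPoller() {}

    /**
     * Calls check up to maxAttempts times, sleeping delayMillis between
     * attempts. Returns true as soon as check succeeds, false if all
     * attempts fail or the thread is interrupted while waiting.
     */
    static boolean waitUntil(BooleanSupplier check, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A caller would pass a lambda that executes the full-text query and
returns whether the expected node is among the results.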

[1] http://lucene.apache.org/tika/
[2] http://jackrabbit.apache.org/jackrabbit-text-extractors.html
[3] http://wiki.apache.org/jackrabbit/Search

Regards,
Alex

-- 
Alexander Klimetschek
[email protected]
