On Thu, Feb 18, 2010 at 5:30 PM,  <ross.dy...@ipaustralia.gov.au> wrote:
> My binary files are all PDFs, so the text is extracted with the PDFBox
> toolkit and the full text becomes keyword searchable.
> All done using the default configuration, except I extended nt:resource to
> add a few attributes.
>
> The mimeType attribute will be application/octet-stream.
> Perhaps there is no plug-in that knows how to extract text from your binary
> files?

I tried a PDF, a Word document, and a plain text file. How long does it
take for a document to be indexed?

>
> From: ChadDavis <chadmichaelda...@gmail.com>
> To: us...@jackrabbit.apache.org
> Date: 19/02/2010 11:13 AM
> Subject: Re: jackrabbit 2.0 binary search indexing
> ________________________________
>
>
> On Thu, Feb 18, 2010 at 2:39 PM, Alexander Klimetschek <aklim...@day.com>
> wrote:
>> On Thu, Feb 18, 2010 at 18:35, ChadDavis <chadmichaelda...@gmail.com>
>> wrote:
>>> I'm looking for information on how to enable binary search indexing.
>>> I found documentation for pre-2.0 jackrabbit, and reference to the
>>> fact that Tika is now used internally for the binary indexing.
>>> However, I can't find any documentation of how to enable the binary
>>> indexing.
>>
>> It is enabled for all nt:file binaries, i.e. the jcr:content/jcr:data
>> property. The MIME type for text extraction is taken from the
>> jcr:content/jcr:mimeType property. I don't know if you can enable it
>> for other binary properties.
>>
>
> Just to clarify, you are saying that the binary indexing, as long as
> I'm using the JCR built-in node types for my binary file storage, e.g.
> nt:file --> jcr:content <nt:resource> --> jcr:data (binary property
> with my file), occurs automatically?
>
> If so, then something's not working for me.  Can you recommend some
> troubleshooting tips?  How can I determine whether the binaries are
> being indexed?  Note, I'm doing a full text search and it DOES hit
> other node properties, etc.
>

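For anyone following the thread, the two points above (the extractor is chosen from jcr:content/jcr:mimeType, and application/octet-stream probably has no matching extractor) can be sketched in plain Java. This is only an illustrative sketch, not Jackrabbit code: the class name, both helper methods, and the extension table are hypothetical. The XPath string assumes the files are stored as nt:file nodes, as described above.

```java
import java.util.Locale;

public class BinaryIndexingSketch {

    // Hypothetical helper: pick a value for jcr:mimeType from the file name,
    // so the property is not left as the generic application/octet-stream.
    static String mimeTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "application/octet-stream";
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "pdf": return "application/pdf";
            case "doc": return "application/msword";
            case "txt": return "text/plain";
            default:    return "application/octet-stream";
        }
    }

    // Hypothetical helper: build an XPath full-text query over the
    // jcr:content child of nt:file nodes. Only the XPath string literal is
    // escaped (single quotes are doubled); the search term is otherwise
    // passed through unchanged.
    static String fullTextQuery(String term) {
        String escaped = term.replace("'", "''");
        return "//element(*, nt:file)[jcr:contains(jcr:content, '" + escaped + "')]";
    }

    public static void main(String[] args) {
        System.out.println(mimeTypeFor("report.PDF"));
        System.out.println(fullTextQuery("jackrabbit"));
    }
}
```

The query string would then be handed to the JCR QueryManager (createQuery with the XPath language constant). If such a query matches other properties but never the file contents, checking the stored jcr:mimeType value is a reasonable first troubleshooting step.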