On Thu, Feb 18, 2010 at 5:30 PM, <ross.dy...@ipaustralia.gov.au> wrote:
> My binary files are all PDFs, so the text is extracted with PdfBox toolkit
> and the full text becomes keyword searchable.
> All done using the default configuration, except I extended nt:resource to
> add a few attributes.
>
> The mimeType attribute will be application/octet-stream.
> Perhaps there is no plug-in that knows how to extract text from your binary
> files?

I tried pdf, word, and a plain text file . . . how long does it take
for a doc to be indexed?

> From: ChadDavis <chadmichaelda...@gmail.com>
> To: us...@jackrabbit.apache.org
> Date: 19/02/2010 11:13 AM
> Subject: Re: jackrabbit 2.0 binary search indexing
> ________________________________
>
> On Thu, Feb 18, 2010 at 2:39 PM, Alexander Klimetschek <aklim...@day.com>
> wrote:
>> On Thu, Feb 18, 2010 at 18:35, ChadDavis <chadmichaelda...@gmail.com>
>> wrote:
>>> I'm looking for information on how to enable binary search indexing.
>>> I found documentation for pre-2.0 jackrabbit, and reference to the
>>> fact that Tika is now used internally for the binary indexing.
>>> However, I can't find any documentation of how to enable the binary
>>> indexing . . ..
>>
>> It is enabled for all nt:file binaries, ie. the jcr:content/jcr:data
>> property. The mimetype for text extraction is taken from the
>> jcr:content/jcr:mimeType property. I don't know if you can enable it
>> for other binary properties.
>>
>
> Just to clarify, you are saying that the binary indexing, as long as
> I'm using the JCR built-in node types for my binary file storage, e.g.
> nt:file --> jcr:content <nt:resource> --> jcr:data ( binary property
> with my file ), occurs automatically?
>
> If so, then something's not working for me. Can you recommend some
> troubleshooting tips? How can I determine whether the binaries are
> being indexed? Note, I'm doing a full text search and it DOES hit
> other node properties, etc.
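
Since the thread says the type used for text extraction is read from jcr:content/jcr:mimeType, one thing worth checking is what value actually gets stored there. Below is a minimal sketch of a helper that picks that value from the file name. MimeTypes, forFileName and isLikelyExtractable are hypothetical names, not part of Jackrabbit or the JCR API, and the extension table only covers the three formats mentioned above. It assumes, as suggested earlier in the thread, that a resource left as application/octet-stream will probably not be handed to any text extractor.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Jackrabbit or the JCR API): chooses the
// string to store in jcr:content/jcr:mimeType based on the file extension.
final class MimeTypes {
    // Generic fallback type. Assumption from the thread: content stored with
    // this type is unlikely to be picked up by a text extractor.
    static final String FALLBACK = "application/octet-stream";

    // Illustrative table covering only the formats mentioned in the thread.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("pdf", "application/pdf");
        BY_EXTENSION.put("doc", "application/msword");
        BY_EXTENSION.put("txt", "text/plain");
    }

    private MimeTypes() {}

    // Returns the MIME type for a file name, or FALLBACK when the name has
    // no extension or the extension is not in the table.
    static String forFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return FALLBACK;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.getOrDefault(ext, FALLBACK);
    }

    // True when the type is something other than the generic fallback.
    static boolean isLikelyExtractable(String mimeType) {
        return !FALLBACK.equals(mimeType);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"report.pdf", "notes.TXT", "archive.bin"}) {
            String type = forFileName(name);
            String note = isLikelyExtractable(type) ? "" : " (probably not full-text indexed)";
            System.out.println(name + " -> " + type + note);
        }
    }
}
```

The returned string would then be written to the jcr:mimeType property of the nt:resource node before saving the session. If a stored document turns out to have application/octet-stream there, that may explain why full text search hits other properties but not the binary.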