Searching inside binary contents and other queries

Sergio Tue, 10 Jun 2008 04:50:36 -0700

Hi,

I am new to JCR technology and I have a couple of questions I would like to
ask.
I am working on a project where I need to store some document files (PDFs,
DOCs, XMLs and text files). The complete data model will be stored in an
RDBMS (Oracle). However, we also need to search inside those documents
efficiently. That's where I think Jackrabbit will come to the rescue.


I have been reading the docs and wikis on Jackrabbit's site for about a week
now. I understand some of the basics, but I feel lost most of the time. For
instance:

1) As our database will be holding most of the data, I thought about the
following scheme: storing the documents in BLOBs in the database (in
case we need to access them using some other criteria) AND in Jackrabbit's
repository. When storing those documents in Jackrabbit, I plan to keep
a pointer back to the RDBMS (probably the document record's primary key) in
a property. The question is: does this make sense? Is it a common practice?
And if not, what is the standard approach?

2) Do I need to define node types for representing my documents? If not, is
there some standard type I can use?

3) I have read that Jackrabbit is able to read inside some document types.
How do you accomplish that? Using TextExtractors? If so, how? Could you point
me to some examples? I failed to find any. Does it depend on the way I store
those documents? If so, how should I store them?

I know that's a lot of questions. If someone could point me in the right
direction (maybe by pointing me to some code samples), I would be very
thankful.

Best regards.

-- 
Sergio Tridente
