Sergio wrote:
1) As our database will be holding most of the data, I thought about the
following schema: storing the documents inside BLOBs in the database (in
case we need to access them using some other criteria) AND in Jackrabbit's
repository. While storing those documents using Jackrabbit, I plan to keep
the RDBMS' pointers (probably the document's record primary key) using
properties. The question is: does this make sense? Is it a common practice?
And if not, what is the standard approach?

well, the recommended approach is to replace your RDBMS with Jackrabbit.

2) Do I need to define node types for representing my documents? If not, is
there some standard type I can use?

for files and folders there's nt:file and nt:folder. See: http://wiki.apache.org/jackrabbit/NodeTypeRegistry and of course the JSR 170 specification.

3) I have read that Jackrabbit is able to read inside some document types,
how do you accomplish that? Using TextExtractors?

correct. see: http://jackrabbit.apache.org/jackrabbit-text-extractors.html

How? Could you point me
to some examples? I failed to find any. Does it depend on the way I store
those documents? If so, how do you do it?

the text extractors only work with nt:resource nodes. this means your content structure would look like this:

+ my.pdf (nt:file)
  - jcr:created=20080101 (DATE)
  + jcr:content (nt:resource)
    - jcr:mimeType=application/pdf (STRING)
    - jcr:lastModified=20080101 (DATE)
    - jcr:data=<pdf-binary> (BINARY)
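
in case it helps, here is a rough sketch of building that structure in code. to keep it runnable without a repository, it uses a small in-memory stand-in (SketchNode, a made-up class, not part of Jackrabbit) whose addNode and setProperty mirror the javax.jcr.Node methods of the same names. the helper storePdf and the folder name "docs" are also just assumptions for illustration. with a real session you would make the same calls on a javax.jcr.Node and finish with session.save().

```java
import java.util.Calendar;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for javax.jcr.Node, so the sketch runs
// without a repository. Only the node layout is the point here.
class SketchNode {
    final String name;
    final String type;
    final Map<String, Object> properties = new LinkedHashMap<>();
    final Map<String, SketchNode> children = new LinkedHashMap<>();

    SketchNode(String name, String type) {
        this.name = name;
        this.type = type;
    }

    // Mirrors Node.addNode(relPath, primaryNodeTypeName).
    SketchNode addNode(String childName, String childType) {
        SketchNode child = new SketchNode(childName, childType);
        children.put(childName, child);
        return child;
    }

    // Mirrors the Node.setProperty(name, value) family.
    void setProperty(String key, Object value) {
        properties.put(key, value);
    }

    // Renders the subtree in the same "+ node / - property" notation as above.
    String dump(String indent) {
        StringBuilder sb = new StringBuilder();
        sb.append(indent).append("+ ").append(name)
          .append(" (").append(type).append(")\n");
        for (String key : properties.keySet()) {
            sb.append(indent).append("  - ").append(key).append('\n');
        }
        for (SketchNode child : children.values()) {
            sb.append(child.dump(indent + "  "));
        }
        return sb.toString();
    }
}

class NtFileSketch {
    // Adds an nt:file with a jcr:content child of type nt:resource.
    // jcr:created on the nt:file is normally set by the repository itself,
    // so it is not written here.
    static SketchNode storePdf(SketchNode folder, String fileName, byte[] pdfBytes) {
        SketchNode file = folder.addNode(fileName, "nt:file");
        SketchNode content = file.addNode("jcr:content", "nt:resource");
        content.setProperty("jcr:mimeType", "application/pdf");
        content.setProperty("jcr:lastModified", Calendar.getInstance());
        content.setProperty("jcr:data", pdfBytes);
        return file;
    }

    public static void main(String[] args) {
        SketchNode root = new SketchNode("docs", "nt:folder");
        storePdf(root, "my.pdf", new byte[] {0x25, 0x50, 0x44, 0x46}); // "%PDF"
        System.out.print(root.dump(""));
    }
}
```

the part that matters for the text extractors is that the binary sits in jcr:data on an nt:resource node and that jcr:mimeType is set, since the mime type is how the matching extractor gets picked.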

regards
 marcel
