Hi, I am new to JCR technology and have a couple of questions. I am working on a project where I need to store document files (PDFs, DOCs, XMLs and text files). The complete data model will be stored in an RDBMS (Oracle). However, we also need to search inside those documents efficiently, and that is where I think Jackrabbit could come to the rescue.
I have been reading the docs and wikis on Jackrabbit's site for about a week now. I understand some of the basics, but I feel lost most of the time. For instance:

1) As our database will hold most of the data, I thought about the following scheme: storing the documents as BLOBs in the database (in case we need to access them using some other criteria) AND in Jackrabbit's repository. When storing the documents in Jackrabbit, I plan to keep a pointer back to the RDBMS (probably the primary key of the document's record) in properties. Does this make sense? Is it common practice? If not, what is the standard approach?

2) Do I need to define node types to represent my documents? If not, is there a standard type I can use?

3) I have read that Jackrabbit is able to read inside some document types. How do you accomplish that? Using TextExtractors? How? Could you point me to some examples? I failed to find any. Does it depend on the way I store those documents? If so, how should I store them?

I know that's a lot of questions. If someone could point me in the right direction (maybe with a code sample), I would be very thankful.

Best regards.
-- Sergio Tridente
