I have an archive of PDF documents, with metadata. Mostly I will find the documents by traversing the tree, but sometimes I will need to get them by their unique identifier (from the source system) which I use for the file's node name and as a property on the jcr_content subnode.
Here is my data structure: <folder>/<folder>/<nt:file name = 282675>/<ed:docreftype name=jcr:content><property name=jcr:data value=binary> <property name=ed:document_id value=282675> <..other properties..> about 100K records, no folder contains > 99 nodes select * from [nt:file] as doc where doc.name = '282675' takes 30 seconds finds nothing select * from [nt:file] as doc inner join [ed:docreftype] as content on ischildnode(content, doc) where content.[ed:document_id] = '282675' takes 50 seconds finds the right record select * from [nt:file] as doc inner join [ed:docreftype] as content on ischildnode(content, doc) where contains(content.[ed:document_id], '282675') and content.[ed:document_id] = '282675'> takes 2 seconds finds the correct record. So getting the Lucene index to do a first cut of the results helps the whole process. Prophecy: He who pulls the mighty sword ExQueryString from the stone JSR-283 shall be the rightful king of all Content.
-- This message contains privileged and confidential information only for use by the intended recipient. If you are not the intended recipient of this message, you must not disseminate, copy or use it in any manner. If you have received this message in error, please advise the sender by reply e-mail. Please ensure all e-mail attachments are scanned for viruses prior to opening or using.
