I have an archive of PDF documents with metadata. Mostly I will find the 
documents by traversing the tree, but sometimes I will need to fetch them by 
their unique identifier (from the source system), which I use both as the 
file's node name and as a property on the jcr:content subnode.

Here is my data structure:

<folder>/<folder>/<nt:file name=282675>
  <ed:docreftype name=jcr:content>
    <property name=jcr:data value=binary>
    <property name=ed:document_id value=282675>
    <..other properties..>

About 100K records; no folder contains more than 99 nodes.

select * from [nt:file] as doc where doc.name = '282675'
takes 30 seconds and finds nothing.
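
A possible explanation for the empty result, offered as a guess rather than something verified against this repository: in JCR-SQL2, doc.name refers to a property called "name" on the node, not to the node's own name, and nt:file has no such property. The node name is normally reached through the name() (or localname()) function. The sketch below only builds the query string; the class and method names are made up for illustration, and whether this form is any faster than 30 seconds here is untested.

```java
public final class NameQuery {

    // Wraps a value as a JCR-SQL2 string literal, doubling any single quotes.
    static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds a query that constrains the node name via name(),
    // instead of a (non-existent) property called "name".
    static String byNodeName(String documentId) {
        return "select * from [nt:file] as doc where name(doc) = "
                + literal(documentId);
    }

    public static void main(String[] args) {
        System.out.println(byNodeName("282675"));
    }
}
```

The resulting string can be handed to QueryManager.createQuery(statement, Query.JCR_SQL2) in the usual way.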

select * from [nt:file] as doc inner join [ed:docreftype] as content on 
ischildnode(content, doc)
where content.[ed:document_id] = '282675'
takes 50 seconds and finds the right record.

select * from [nt:file] as doc inner join [ed:docreftype] as content on 
ischildnode(content, doc) 
where contains(content.[ed:document_id], '282675') and 
content.[ed:document_id] = '282675'
takes 2 seconds and finds the correct record.

So letting the Lucene full-text index do a first cut with contains(), and 
only then applying the exact match, cuts the query from 50 seconds to 2.
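
For reuse with other identifiers, the third query can be generated from the id. This is a minimal sketch that only assembles the statement text: the class and method names are hypothetical, and it assumes the id is a plain token like 282675 with no characters that carry special meaning in a full-text search expression.

```java
public final class PrefilteredQuery {

    // Wraps a value as a JCR-SQL2 string literal, doubling any single quotes.
    static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Builds the join query with a contains() pre-filter followed by the
    // exact equality test on ed:document_id.
    static String byDocumentId(String documentId) {
        String lit = literal(documentId);
        return "select * from [nt:file] as doc "
                + "inner join [ed:docreftype] as content "
                + "on ischildnode(content, doc) "
                + "where contains(content.[ed:document_id], " + lit + ") "
                + "and content.[ed:document_id] = " + lit;
    }

    public static void main(String[] args) {
        System.out.println(byDocumentId("282675"));
    }
}
```

The equality test stays in the statement because contains() alone is a full-text match and could, in principle, return more than the one exact record.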

Prophecy:
He who pulls the mighty sword ExQueryString from the stone JSR-283 shall 
be the rightful king of all Content.