On 16 Dec 2009, at 10:25, Jukka Zitting wrote:

> Hi,
>
> On Tue, Dec 15, 2009 at 6:11 PM, Ian Boston <[email protected]> wrote:
>> Is there any other way of getting to the SearchIndex, so that I can get
>> to the Lucene Document and the TermVector (other than AspectJ or cglib)?
>
> Instead of reaching down to the underlying Lucene index, I would
> recommend reading the original document data stored in the JCR node
> and passing it through the Jackrabbit text extractors and the
> configured Lucene Analyzer to get the terms stored in the index.

That can be quite expensive, especially for poor-quality PDFs and some docx Word documents. I expect to need to do this for between 25 and 100 nodes at a time, aggregating the results.

Ian

>
> BR,
>
> Jukka Zitting
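
For illustration only, the re-analysis approach suggested in the quoted reply, applied to a batch of nodes with the results aggregated, might look roughly like the sketch below. It assumes the text has already been extracted from each node; the class `TermAggregationSketch` and the methods `analyze` and `aggregate` are hypothetical names, and `analyze` is a plain-Java stand-in for the configured Lucene Analyzer, not Jackrabbit or Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TermAggregationSketch {

    // Hypothetical stand-in for the configured Lucene Analyzer: lowercases
    // the text and splits on anything that is not a letter or digit. A real
    // analyzer may also remove stop words, apply stemming, and so on.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Aggregates term counts over a batch of already-extracted node texts
    // (for example, the 25 to 100 nodes mentioned above).
    static Map<String, Integer> aggregate(List<String> extractedTexts) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String text : extractedTexts) {
            for (String term : analyze(text)) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Placeholder strings standing in for text extracted from JCR nodes.
        List<String> batch = Arrays.asList(
                "Lucene index terms",
                "terms stored in the index");
        System.out.println(aggregate(batch));
    }
}
```

The sketch only covers analysis and aggregation. The text extraction step, which is the expensive part for PDFs and docx files, still has to run once per node before `aggregate` is called.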
