On 16 Dec 2009, at 10:25, Jukka Zitting wrote:

> Hi,
> 
> On Tue, Dec 15, 2009 at 6:11 PM, Ian Boston <[email protected]> wrote:
>> Is there any other way of getting to the SearchIndex, so that I can get
>> to the Lucene Document and the TermVector (other than AspectJ or cglib)?
> 
> Instead of reaching down to the underlying Lucene index, I would
> recommend reading the original document data stored in the JCR node
> and passing it through the Jackrabbit text extractors and the
> configured Lucene Analyzer to get the terms stored in the index.
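Here is a minimal sketch of the analysis step, in Java since Jackrabbit is a Java library. It assumes the text has already been extracted from the node's binary (for example its jcr:data property) by the Jackrabbit text extractors. It deliberately avoids the Lucene API: the `analyze` method is a hypothetical stand-in for the configured Analyzer. It only lowercases the text and splits on non-alphanumeric characters, so its terms may differ from those of a real Analyzer that removes stop words or stems. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TermSketch {

    // Hypothetical stand-in for the configured Lucene Analyzer:
    // lowercases the text and splits on anything that is not a letter or digit.
    // A real Analyzer may also drop stop words, stem, etc.
    static List<String> analyze(String extractedText) {
        List<String> terms = new ArrayList<>();
        for (String token : extractedText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) { // split() can yield a leading empty string
                terms.add(token);
            }
        }
        return terms;
    }

    // Builds a term -> frequency map for one document, roughly the kind of
    // information a term frequency vector for the indexed field would hold.
    static Map<String, Integer> termFrequencies(String extractedText) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (String term : analyze(extractedText)) {
            freqs.merge(term, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        // In practice this string would come from running the node's binary
        // property through the text extractors.
        String text = "The quick brown fox; the lazy dog.";
        System.out.println(termFrequencies(text));
        // prints {brown=1, dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
```

The costly part in practice is the extraction, not this tokenizing step, which is why re-running it for many nodes can add up.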


That can be quite expensive, especially for poor-quality PDFs and some .docx
Word documents.
I expect to need to do this for between 25 and 100 nodes at a time,
aggregating the results.
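For the batch case, here is a minimal sketch of the aggregation step. It assumes each node's terms have already been reduced to a term-to-count map, whether by re-analysis or from stored term vectors. The node paths in `main` are made-up examples, and `aggregate` / `nodeCounts` are hypothetical helper names, not Jackrabbit or Lucene APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class BatchTermAggregator {

    // Merges per-node term frequency maps (node path -> (term -> count))
    // into a single term -> total count map for the whole batch.
    static Map<String, Integer> aggregate(Map<String, Map<String, Integer>> perNode) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> freqs : perNode.values()) {
            for (Map.Entry<String, Integer> e : freqs.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    // Counts how many nodes in the batch contain each term at least once.
    static Map<String, Integer> nodeCounts(Map<String, Map<String, Integer>> perNode) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, Integer> freqs : perNode.values()) {
            for (String term : freqs.keySet()) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical node paths and counts, for illustration only.
        Map<String, Map<String, Integer>> perNode = new HashMap<>();
        perNode.put("/content/a.pdf", Map.of("jackrabbit", 3, "lucene", 1));
        perNode.put("/content/b.docx", Map.of("lucene", 2, "index", 4));
        System.out.println(aggregate(perNode));  // prints {index=4, jackrabbit=3, lucene=3}
        System.out.println(nodeCounts(perNode)); // prints {index=1, jackrabbit=1, lucene=2}
    }
}
```

Keeping the per-node maps separate until this final merge means the expensive per-node work can be cached or redone independently for each node.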

Ian

> 
> BR,
> 
> Jukka Zitting
