On 1/11/2013 1:33 PM, Achim Domma wrote:
"At the base, Solr indexes are Lucene indexes, so one can always
  drop down to that level."

That's what I'm looking for. I understand that, in the end, there has to be an inverted index (or rather 
multiple of them) holding all "words" which occur in my documents, each "word" having 
a list of the documents it appears in. I would like to do some statistics based on this 
information, and to analyze how it changes if I change my text processing settings, ...

If you would give me a starting point like "Data is stored in Lucene indexes, which 
are documented at XXX. In a request handler you can access the indexes via YYY.", I 
would be perfectly happy figuring out the rest on my own. Documentation about 4.0 is a 
bit limited, so it's hard to find an entry point.

There is the TermsComponent, which can be utilized in a terms requestHandler. The example solrconfig.xml found in all downloaded copies of Solr has a /terms request handler.

http://wiki.apache.org/solr/TermsComponent
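
As a rough illustration of querying that handler from a script, here is a minimal Python sketch. The base URL, core name and field name are assumptions that depend on your setup, and the helper names are made up for this example. It also assumes the JSON response lists terms and counts as one flat alternating array, which is what I would expect from the default output, so check it against your own response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_terms_url(base_url, field, limit=10, sort="count"):
    """Build a request URL for the /terms handler (hypothetical helper)."""
    params = {
        "terms": "true",       # enable the TermsComponent
        "terms.fl": field,     # field whose indexed terms we want
        "terms.limit": limit,  # maximum number of terms to return
        "terms.sort": sort,    # "count" or "index" ordering
        "wt": "json",          # ask for a JSON response
    }
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


def parse_terms(payload, field):
    """Turn the assumed flat [term, count, term, count, ...] list into pairs."""
    flat = payload["terms"][field]
    return list(zip(flat[0::2], flat[1::2]))


def fetch_terms(base_url, field, limit=10):
    """Query a running Solr instance and return (term, doc count) pairs."""
    with urlopen(build_terms_url(base_url, field, limit)) as resp:
        return parse_terms(json.load(resp), field)
```

Something like fetch_terms("http://localhost:8983/solr/collection1", "text", 20) would then return the terms with their document counts, so you can run it before and after changing your analysis settings and compare the two lists.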

As you've already been told, there is a tool called Luke, but a version that works with Solr 4.0.0 is hard to find. The official download location only has a 4.0.0-ALPHA version, and there have been reported problems using it with indexes from the final Solr 4.0.0.

Thanks,
Shawn
