Re: Accessing raw index data

Shawn Heisey Fri, 11 Jan 2013 14:02:33 -0800

On 1/11/2013 1:33 PM, Achim Domma wrote:

"At the base, Solr indexes are Lucene indexes, so one can always
  drop down to that level."


That's what I'm looking for. I understand that, in the end, there has to be an inverted index (or rather 
multiple of them), holding all "words" which occur in my documents, each "word" having 
a list of documents the "word" was part of. I would like to do some statistics based on this 
information, and would like to analyze how it changes if I change my text processing settings, ...

If you would give me a starting point like "Data is stored in Lucene indexes, which 
are documented at XXX. In a request handler you can access the indexes via YYY.", I 
would be perfectly happy figuring out the rest on my own. Documentation about 4.0 is a 
bit limited, so it's hard to find an entry point.

There is the TermsComponent, which can be utilized in a terms requestHandler. The example solrconfig.xml found in all downloaded copies of Solr has a /terms request handler.


http://wiki.apache.org/solr/TermsComponent
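
As a rough starting point, the /terms handler can be queried over HTTP and the result turned into a term-to-document-count map for statistics. The sketch below is in Python and rests on assumptions that are not from this thread: the base URL (host, port, core name) and the field name are placeholders, and the parser assumes the JSON response lists terms and counts as one flat alternating list under "terms", so check it against what your Solr version actually returns.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_terms_url(base_url, field, limit=10):
    """Build a request URL for the /terms handler.

    base_url is a placeholder such as "http://localhost:8983/solr"
    (an assumption; adjust host, port and core name to your setup).
    """
    params = {
        "terms": "true",       # enable the TermsComponent
        "terms.fl": field,     # field whose indexed terms are listed
        "terms.limit": limit,  # maximum number of terms to return
        "wt": "json",          # ask for a JSON response
    }
    return base_url.rstrip("/") + "/terms?" + urlencode(params)


def parse_terms(response, field):
    """Turn the assumed flat [term, count, term, count, ...] list into a dict."""
    flat = response["terms"][field]
    return dict(zip(flat[0::2], flat[1::2]))


def fetch_terms(base_url, field, limit=10):
    """Query a running Solr instance and return {term: document count}."""
    with urlopen(build_terms_url(base_url, field, limit)) as resp:
        return parse_terms(json.load(resp), field)
```

Running fetch_terms before and after changing the analysis settings (and reindexing) gives two dictionaries that can be compared directly, for example to see which terms appeared or disappeared.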

As you've already been told, there is a tool called Luke, but a version that works with Solr 4.0.0 is hard to find. The official download location only has a 4.0.0-ALPHA version, and there have been reported problems using it with indexes from the final Solr 4.0.0.


Thanks,
Shawn
