Have you looked at the Solr admin interface in detail? Specifically, the
analysis section under each core. It provides some of the statistics you
seem to want, and its source code shows how to build your own version of
it. In particular, the "Luke" package is what you might be looking for.
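
As a rough starting point, here is a minimal Python sketch of asking the Luke
request handler for the top terms of one field. It assumes the handler is
reachable at /admin/luke under the core, that the JSON response lists
"topTerms" as a flat list of alternating terms and counts, and that the host,
the core name "collection1" and the field name "text" match your setup. All
of these are assumptions to check against your own instance. The sample
response is hand-written for illustration, not real output.

```python
import json
from urllib.parse import urlencode


def luke_url(base_url, core, field, num_terms=10):
    """Build a URL for the Luke request handler of one core (path assumed)."""
    params = urlencode({"fl": field, "numTerms": num_terms, "wt": "json"})
    return f"{base_url.rstrip('/')}/{core}/admin/luke?{params}"


def top_terms(luke_response, field):
    """Return [(term, count), ...] from a parsed Luke response.

    Assumes "topTerms" is a flat list: [term1, count1, term2, count2, ...].
    """
    flat = luke_response["fields"][field].get("topTerms", [])
    # Pair every even-indexed entry (term) with the entry after it (count).
    return list(zip(flat[0::2], flat[1::2]))


# Hand-written sample in the assumed response layout (not real Solr output).
sample = json.loads(
    '{"fields": {"text": {"distinct": 3, "topTerms": ["solr", 42, "lucene", 17]}}}'
)

print(luke_url("http://localhost:8983/solr", "collection1", "text"))
print(top_terms(sample, "text"))
```

To run it against a live instance, fetch the URL with urllib.request.urlopen
and pass the parsed JSON to top_terms instead of the sample.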

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Fri, Jan 11, 2013 at 3:33 PM, Achim Domma <do...@procoders.net> wrote:

> "At the base, Solr indexes are Lucene indexes, so one can always
>  drop down to that level."
>
> That's what I'm looking for. I understand that, in the end, there has to
> be an inverted index (or rather several of them) holding all the "words"
> that occur in my documents, each "word" having a list of the documents it
> appears in. I would like to compute some statistics from this information
> and analyze how they change when I change my text processing
> settings, ...
>
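
To make the idea above concrete, here is a toy Python sketch of an inverted
index (term -> list of document ids) built with two different text
processing steps. It is only an illustration of the concept, not how Lucene
stores its data; the documents and the two analyzer functions are made up
for the example.

```python
from collections import defaultdict


def build_inverted_index(docs, analyze):
    """Map each term to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def whitespace(text):
    """Hypothetical analyzer: split on whitespace, keep case."""
    return text.split()


def lowercase(text):
    """Hypothetical analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


# Made-up documents, keyed by document id.
docs = {
    1: "Solr wraps Lucene",
    2: "lucene stores an inverted index",
    3: "Solr index",
}

# Compare simple statistics under the two text processing settings.
for name, analyze in [("whitespace", whitespace), ("lowercase", lowercase)]:
    index = build_inverted_index(docs, analyze)
    print(name, "distinct terms:", len(index), "lucene ->", index.get("lucene"))
```

Changing the analyzer changes both the number of distinct terms and the
posting list of a given term, which is the kind of comparison described
above.
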
> If you would give me a starting point like "Data is stored in Lucene
> indexes, which are documented at XXX. In a request handler you can access
> the indexes via YYY.", I would be perfectly happy figuring out the rest on
> my own. Documentation about 4.0 is a bit limited, so it's hard to find an
> entry point.
>
> cheers,
> Achim
>
> On 11.01.2013 at 20:54, Gora Mohanty wrote:
>
> > On 12 January 2013 01:06, Achim Domma <do...@procoders.net> wrote:
> >>
> >> Hi,
> >>
> >> I have just set up my first Solr 4.0 instance and have added about one
> >> million documents. I would like to access the raw data stored in the
> >> index. Can somebody give me a starting point for how to do that?
> >>
> >> As a first step, a simple dump would be absolutely OK. I just want to
> >> play around and do some static offline analysis. In the long term, I
> >> would probably like to implement custom search components to enrich my
> >> search results. So if there's no export for raw data, I would be happy
> >> to learn how to implement custom handlers and/or search components. Some
> >> guidance on where to start would be much appreciated.
> >
> > It is not clear what you mean by "raw data", and what level of
> > customisation you are after. Here are two possibilities:
> > * At the base, Solr indexes are Lucene indexes, so one can always
> >  drop down to that level.
> > * Also, Solr allows plugins for various components. This link might
> >  be of help, depending on the extent of customisation you are after:
> >  http://wiki.apache.org/solr/SolrPlugins
> >
> > Maybe you should approach this from the other end: If you could
> > describe what you are trying to achieve, people might be able to
> > offer possibilities.
> >
> > Regards,
> > Gora
>
>
