Hugo Pinto wrote:
Hello,

I am using Nutch for mirroring, rather than crawling and indexing.
I need to directly access the cached data in my Nutch index, but I am
unable to find an easy way to do so.
I browsed the documentation (wiki, javadocs) and skimmed the code, but
found no straightforward way to do it.
Could anyone suggest a place to look for more information, or, if you
have done this before, share a few tips?

Most likely what you need is not the Lucene index but the segments (shards), right? There's a utility called SegmentReader (available from the command line as "bin/nutch readseg"), and you can use its API to retrieve either all records or individual records from a segment, using the URL as the key.
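For illustration, here is a minimal Java sketch that drives the readseg tool from your own code instead of calling SegmentReader's Java API directly (whose method signatures have changed between Nutch versions). It assumes a Nutch 1.x-style install where "bin/nutch readseg" accepts "-get <segment_dir> <url>" to print the records stored for one URL and "-dump <segment_dir> <output_dir>" to write every record as text. Check the usage message of your version before relying on it. The class name ReadSegSketch, the script location and the segment path are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper that shells out to Nutch's "readseg" command
 * (backed by SegmentReader). Assumes a Nutch 1.x-style "bin/nutch"
 * script that accepts the -get and -dump options.
 */
public class ReadSegSketch {

    private final String nutchScript; // assumed path, e.g. "bin/nutch"

    public ReadSegSketch(String nutchScript) {
        this.nutchScript = nutchScript;
    }

    /** Command that prints the records stored for one URL (the key) in a segment. */
    public List<String> getCommand(String segmentDir, String url) {
        return new ArrayList<>(Arrays.asList(nutchScript, "readseg", "-get", segmentDir, url));
    }

    /** Command that dumps every record of a segment as text into outputDir. */
    public List<String> dumpCommand(String segmentDir, String outputDir) {
        return new ArrayList<>(Arrays.asList(nutchScript, "readseg", "-dump", segmentDir, outputDir));
    }

    /** Runs a command, forwarding its output to this process's stdout/stderr. */
    public static int run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        ReadSegSketch rs = new ReadSegSketch("bin/nutch");
        // Hypothetical segment directory and URL, for illustration only.
        List<String> get = rs.getCommand("crawl/segments/20070101000000", "http://example.com/");
        System.out.println(String.join(" ", get));
        // run(get); // uncomment on a machine with Nutch installed
    }
}
```

For mirroring, -dump over each segment gives you everything at once, while -get is the cheaper choice when you already know which URL you want.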


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
