Hi Carl,
See http://wiki.apache.org/nutch/nutch-0.8-dev/bin/nutch%20readdb
- Renaud
Carl Cerecke wrote:
Hi,
How do I get the page information for a given URL from whichever
segment it is in?
I'm basically looking for a class I can run from the command line that,
given an index and a URL, returns the information for that URL from
whichever segment it is in. It would be similar to SegmentReader -get,
but without having to specify the segment.
This seems like it should be relatively simple to do, but it has
evaded me thus far...
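A brute-force alternative to merging is to query each segment in turn. The following is a minimal Python sketch that shells out to `bin/nutch readseg -get <segment_dir> <url>` (the command-line front end to SegmentReader -get) once per segment. It assumes Nutch is invoked as `bin/nutch` from the Nutch home directory and that all segments sit under one parent directory such as `crawl/segments`. The helper names `run_readseg_get` and `find_in_segments`, and the rule that non-empty output counts as a hit, are hypothetical. How readseg reports a miss may differ between versions, so check its output on your install.

```python
import os
import subprocess
import sys
from typing import Callable, List, Tuple

# Assumption (hypothetical): Nutch is run as "bin/nutch" from its home directory.
NUTCH = "bin/nutch"


def run_readseg_get(segment: str, url: str) -> str:
    """Run 'bin/nutch readseg -get <segment> <url>' and return its stdout."""
    result = subprocess.run(
        [NUTCH, "readseg", "-get", segment, url],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def find_in_segments(
    url: str,
    segments_dir: str,
    run: Callable[[str, str], str] = run_readseg_get,
) -> List[Tuple[str, str]]:
    """Ask every segment under segments_dir for the URL.

    Returns (segment_path, output) pairs for segments whose output is
    non-empty. Treating non-empty output as a hit is an assumption.
    """
    hits = []
    for name in sorted(os.listdir(segments_dir)):
        segment = os.path.join(segments_dir, name)
        if not os.path.isdir(segment):
            continue  # skip stray files; each segment is a directory
        output = run(segment, url)
        if output.strip():
            hits.append((segment, output))
    return hits


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: find_url.py <segments_dir> <url>")
    for segment, output in find_in_segments(sys.argv[2], sys.argv[1]):
        print("=== " + segment)
        print(output)
```

Usage: `python find_url.py crawl/segments http://example.com/`. This starts one Nutch process (and JVM) per segment, so with hundreds of segments it will be slow; the `run` parameter is there only so the loop can be exercised without Nutch installed.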
Is the best approach to merge all the segments (hundreds of them) into
one big segment? Would this work? How would that approach perform?
Cheers,
Carl.