You can calculate these statistics from the segment data, e.g. the
parsed text.
Reading the Nutch file format is easy using the Nutch readers,
e.g. the SequenceFile Reader.
Just take a look at the io package.
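As a rough sketch of the counting step only: once the parsed text of each
page has been pulled out of the segment, the word list and the occurrence
counts per link are a simple tally. The class WordStats and its methods
are made-up names for illustration and are not part of Nutch. The sketch
assumes you already have a map from URL to parsed text, it does not read
segment files itself, and its tokenization (lowercase, split on anything
that is not a letter or digit) is an assumption that will not match the
Nutch analyzer exactly.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Nutch: tallies words per URL
// from parsed text that has already been read from a segment.
class WordStats {

    // Returns word -> number of occurrences for one page's parsed text.
    static Map<String, Integer> countWords(String parsedText) {
        Map<String, Integer> counts = new TreeMap<>();
        // Assumed tokenization: lowercase, split on non-letter/non-digit runs.
        String[] tokens = parsedText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns URL -> (word -> number of occurrences), keeping input order of URLs.
    static Map<String, Map<String, Integer>> countWordsPerUrl(Map<String, String> textByUrl) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : textByUrl.entrySet()) {
            result.put(entry.getKey(), countWords(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data standing in for parsed text read from a segment.
        Map<String, String> textByUrl = new LinkedHashMap<>();
        textByUrl.put("http://example.com/a", "Nutch crawls the web. Nutch indexes the web.");
        textByUrl.put("http://example.com/b", "Segments hold parsed text.");

        // Print one line per URL with its word counts.
        for (Map.Entry<String, Map<String, Integer>> entry : countWordsPerUrl(textByUrl).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

The keys of each inner map are the list of words for that link, and the
values are the occurrence counts.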
HTH
Stefan
On 24.01.2006 at 08:18, Wong Ting Kiong wrote:
Hi all,
I'm now using Nutch 0.7.1, and I wish to retrieve content from
the index file.
How can I retrieve it? The information that I want to retrieve is:
- the list of words from each link
- the occurrences of words in each link
Can I retrieve this information in raw data format?
Thanks for your attention,
Kiong