You can calculate these statistics from the segment data, e.g. the parsed text. Reading the Nutch file format is easy with the Nutch readers, e.g. the SequenceFile Reader.
Just take a look at the io package.
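
Once the parsed text is out of the segment, the counting itself could look roughly like the sketch below. This is only an illustration under assumptions: WordStats, countWords and countPerUrl are hypothetical names and not part of Nutch, the example URL and text are made up, and reading the segment with the Nutch readers is not shown. The sketch simply assumes you already hold a map from URL to parsed text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, NOT part of Nutch: counts word occurrences per URL.
class WordStats {

    // Count how often each word occurs in the parsed text of one page.
    static Map<String, Integer> countWords(String parsedText) {
        Map<String, Integer> counts = new TreeMap<>();
        // Lower-case the text, then split on anything that is not a letter or digit.
        for (String token : parsedText.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue; // a leading separator produces one empty token
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    // Assumption: the caller has already read (URL -> parsed text) pairs
    // from the segment data; that step is not shown here.
    static Map<String, Map<String, Integer>> countPerUrl(Map<String, String> textByUrl) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : textByUrl.entrySet()) {
            result.put(entry.getKey(), countWords(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> textByUrl = new LinkedHashMap<>();
        // Made-up sample data standing in for parsed text from a segment.
        textByUrl.put("http://example.com/a", "Nutch crawls the web. Nutch indexes the web.");
        countPerUrl(textByUrl).forEach((url, counts) -> System.out.println(url + " " + counts));
    }
}
```

The keys of each inner map give the list of words for a link and the values give the occurrences, which you can then write out in whatever raw format you need.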

HTH
Stefan


On 24.01.2006 at 08:18, Wong Ting Kiong wrote:

Hi all,

I'm using Nutch 0.7.1 and would like to retrieve content from the index file.
How can I do that? The information I want to retrieve is:
- the list of words for each link
- the number of occurrences of each word for each link
Can I retrieve this information in raw data format?

Thanks for your attention.

Kiong
