A simple solution would be to dump the segments using the following command and
then write a script that extracts the Outlinks present in the documents of
the segment.
$NUTCH_home/bin/nutch readseg -dump -dir segDirsPath -nocontent -nofetch
-nogenerate -noparse -noparsetext
This will give you a text dump of the segments.
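The script mentioned above could look roughly like the following Python sketch. It assumes (this is not confirmed in the thread) that the dump marks each record with a line starting with `URL::` and lists each outlink on a line of the form `outlink: toUrl: <url> anchor: <text>`. The function name `extract_outlinks` is made up for illustration. Check a real dump and adjust the patterns if its layout differs.

```python
import re
from collections import defaultdict

# Assumed line shapes in the readseg dump (verify against your own dump):
#   URL:: http://example.com/
#     outlink: toUrl: http://example.com/a anchor: Some text
URL_LINE = re.compile(r"^URL::\s*(\S+)")
OUTLINK_LINE = re.compile(r"^\s*outlink:\s*toUrl:\s*(\S+)(?:\s+anchor:\s*(.*))?$")


def extract_outlinks(lines):
    """Map each source URL in the dump to a list of (target, anchor) pairs."""
    outlinks = defaultdict(list)
    current_url = None
    for line in lines:
        line = line.rstrip("\n")
        url_match = URL_LINE.match(line)
        if url_match:
            # A new record starts; later outlinks belong to this URL.
            current_url = url_match.group(1)
            continue
        link_match = OUTLINK_LINE.match(line)
        if link_match and current_url is not None:
            target = link_match.group(1)
            anchor = (link_match.group(2) or "").strip()
            outlinks[current_url].append((target, anchor))
    return dict(outlinks)
```

To use it, open the dump file written by readseg and pass the file object to the function, for example `extract_outlinks(open("output/dump", encoding="utf-8", errors="replace"))`, where `output/dump` is an assumed path.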
Thank you for the hint. How can this be done with the Segment Reader (Nutch
0.9 API)? Thanks in advance.
Cheers,
MyD
Hi,
try this command:
bin/nutch readseg -dump segment_dir output
(e.g. bin/nutch readseg -dump ./crawldir/segments/* output.log)
Regards
sanjshra
Hi,
Yes, you can index using the index command. Try the following commands:
bin/nutch invertlinks crawl/linkdb crawl/segments/*
then
bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
Regards
sanjshra
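If you want to script the two steps above, a minimal Python sketch could look like this. It only assembles and runs the same two command lines. The function names and the default `bin/nutch` path are made up for illustration, and the `crawl/...` layout is taken from the commands above. The shell's `crawl/segments/*` is expanded here with `glob`.

```python
import glob
import os
import subprocess


def build_index_commands(crawl_dir, nutch="bin/nutch"):
    """Return the invertlinks and index command lines, in the order to run them."""
    # Expand crawl_dir/segments/* the way the shell would in the commands above.
    segments = sorted(glob.glob(os.path.join(crawl_dir, "segments", "*")))
    linkdb = os.path.join(crawl_dir, "linkdb")
    invertlinks = [nutch, "invertlinks", linkdb] + segments
    index = [
        nutch,
        "index",
        os.path.join(crawl_dir, "indexes"),
        os.path.join(crawl_dir, "crawldb"),
        linkdb,
    ] + segments
    return [invertlinks, index]


def run_index_pipeline(crawl_dir, nutch="bin/nutch"):
    """Run invertlinks, then index; check=True stops at the first failure."""
    for command in build_index_commands(crawl_dir, nutch):
        subprocess.run(command, check=True)
```

Calling `run_index_pipeline("crawl")` from the Nutch home directory would then run both steps and raise an error if the first one fails, so the index step never runs against a missing linkdb.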
陈琛 wrote:
Thanks very much. I am testing it.
By the way,