Merge Question

Emmanuel JOKE Thu, 05 Jul 2007 07:56:57 -0700

I just had a look at the script for merging two different crawls, and I'm
confused by some of the steps.
It says:


...

$nutch_dir/nutch mergelinkdb $linkdb_dir $crawl_1/linkdb $crawl_2/linkdb
==> So far it's OK: it merges both linkdbs into a new linkdb.

$nutch_dir/nutch mergedb $webdb_dir $crawl_1/crawldb $crawl_2/crawldb
==> Still OK: it merges both crawldbs into a new crawldb.

$nutch_dir/nutch mergesegs $segments_dir $segments_1 $segments_2
==> Still OK: it merges all segments from both crawls into a new segment.

$nutch_dir/nutch invertlinks $linkdb_dir -dir $segments_dir
==> This is where it starts to get confusing: why do we have to run
invertlinks when we just merged the linkdbs in the first step above?

$nutch_dir/nutch index $new_indexes $webdb_dir $linkdb_dir $segment
==> So I guess we create a new index based on the single merged segment.

$nutch_dir/nutch dedup $new_indexes
==> Still OK: it eliminates all duplicates.

$nutch_dir/nutch merge $index_dir $new_indexes
==> This is confusing again: we just created a new index above based on
all the merged segments, so why do we have to merge this index?

Could you please help me understand?

Thanks
