Fwd: Merge Question

Emmanuel Mon, 16 Jul 2007 07:30:09 -0700

Hi guys,

I would really appreciate it if you could help me get an answer to my
question below.


Besides, I understand that we have the following working directories:
crawl/crawldb
crawl/linkdb
crawl/segments
crawl/index
crawl/indexes

Do we really need crawl/index? It was used to merge the per-segment indexes,
but as far as I understand that is no longer the case, is it?


---------- Forwarded message ----------
From: Emmanuel JOKE <[EMAIL PROTECTED]>
Date: 5 Jul 2007 22:56
Subject: Merge Question
To: nutch-user <[email protected]>

I just had a look at the script to merge two different crawls, and I'm
confused by some of the steps.
It says:

...

$nutch_dir/nutch mergelinkdb $linkdb_dir $crawl_1/linkdb $crawl_2/linkdb
==> So far it's OK: it merges both linkdbs into a new linkdb


$nutch_dir/nutch mergedb $webdb_dir $crawl_1/crawldb $crawl_2/crawldb
==> Still OK: it merges both crawldbs into a new crawldb

$nutch_dir/nutch mergesegs $segments_dir $segments_1 $segments_2

==> Still OK: it merges all segments from both crawls into a new segment

$nutch_dir/nutch invertlinks $linkdb_dir -dir $segments_dir
==> Here it starts to get confusing: why do we have to run invertlinks when
we just merged the linkdbs in the first step above?


$nutch_dir/nutch index $new_indexes $webdb_dir $linkdb_dir $segment
==> So I guess we create a new index based on the single merged segment

$nutch_dir/nutch dedup $new_indexes

==> Still OK: it eliminates all duplicates

$nutch_dir/nutch merge $index_dir $new_indexes
==> This is confusing again: we just created a new index above based on all
the merged segments, so why do we have to merge this index?
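
To show the whole sequence in one place, here is a dry-run sketch of the
steps quoted above. It only prints each command instead of running it. The
directory values (bin, crawl1, crawl2, merged) and the segment names
SEGMENT_A, SEGMENT_B and MERGED_SEGMENT are hypothetical placeholders of
mine, not taken from the script; only the nutch commands and their argument
order come from the lines above.

```shell
#!/bin/sh
# Dry-run sketch of the merge sequence quoted above.
# All values below are hypothetical placeholders, not from the real script.
nutch_dir=bin
crawl_1=crawl1
crawl_2=crawl2
merged=merged

webdb_dir=$merged/crawldb       # merged crawldb
linkdb_dir=$merged/linkdb       # merged linkdb
segments_dir=$merged/segments   # parent directory of the merged segment
new_indexes=$merged/indexes     # index built from the merged segment
index_dir=$merged/index         # final merged index

# Hypothetical segment names; real segments have other names.
segments_1=$crawl_1/segments/SEGMENT_A
segments_2=$crawl_2/segments/SEGMENT_B
segment=$segments_dir/MERGED_SEGMENT

# Print the command only. To really run it, the body would be: "$@"
run() { echo "$@"; }

merge_steps() {
    run "$nutch_dir/nutch" mergelinkdb "$linkdb_dir" "$crawl_1/linkdb" "$crawl_2/linkdb"
    run "$nutch_dir/nutch" mergedb "$webdb_dir" "$crawl_1/crawldb" "$crawl_2/crawldb"
    run "$nutch_dir/nutch" mergesegs "$segments_dir" "$segments_1" "$segments_2"
    run "$nutch_dir/nutch" invertlinks "$linkdb_dir" -dir "$segments_dir"
    run "$nutch_dir/nutch" index "$new_indexes" "$webdb_dir" "$linkdb_dir" "$segment"
    run "$nutch_dir/nutch" dedup "$new_indexes"
    run "$nutch_dir/nutch" merge "$index_dir" "$new_indexes"
}

merge_steps
```

Seen this way, my two questions are about step 4 (invertlinks writes to the
same $linkdb_dir that step 1 already filled) and step 7 (merge reads the
$new_indexes that step 5 just built from a single segment).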


Could you please help me understand?

Thanks
