Gal Nitzan wrote:
1. fetch/parse - ndfs
2. decide how many segments (data size?) you want on each searcher machine
3. invert, index, dedup, merge selected indexes to some ndfs folder
4. copy the ndfs folder to the searcher machine
Sounds OK to me. When you update the linkdb you should also include the previously fetched segments, otherwise the results will not be correct, and you should run dedup across all generated indexes as well.
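For what it's worth, one such cycle could look roughly like the sketch below. The command names are the ones I remember from the current mapred branch, and all paths, dates and index names are made up for the example, so double-check everything against the usage output of bin/nutch on your build:

  # generate and fetch one new segment (all paths are examples only)
  bin/nutch generate crawl/crawldb crawl/segments
  SEG=crawl/segments/20060216101112        # whatever directory generate just created
  bin/nutch fetch $SEG
  bin/nutch updatedb crawl/crawldb $SEG

  # rebuild the linkdb from ALL fetched segments, not only the new one
  bin/nutch invertlinks crawl/linkdb -dir crawl/segments

  # index the new segment, then dedup across every index generated so far
  bin/nutch index crawl/indexes/idx-20060216 crawl/crawldb crawl/linkdb $SEG
  bin/nutch dedup crawl/indexes/idx-20060210 crawl/indexes/idx-20060216

  # merge the indexes one searcher machine should serve, then copy the merged
  # index (and the segments it refers to) out of ndfs to that machine
  bin/nutch merge crawl/index-searcher1 crawl/indexes/idx-20060210 crawl/indexes/idx-20060216

Your step 2 then just comes down to how many of those per-segment indexes you merge into the index for each searcher machine.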
And follow this procedure after every fetch?
Yep, and since there is no segment merger for the mapred version yet, you have to decide for yourself when to delete old segments.
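In practice that just means removing a segment's directory from ndfs once no index you still serve refers to it any more. Something along these lines should do it, although the exact name and options of the dfs shell here are an assumption on my part, so check its usage output first:

  # drop a segment that no deployed index references any more (example path)
  bin/nutch ndfs -rm crawl/segments/20060110090000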
In the searcher log it clearly says that it opens the linkdb for some reason:
Well, I haven't looked at the searcher code yet, so I might be wrong and the linkdb may indeed be used at search time. Maybe somebody else can explain which data the searcher actually uses.

best regards,
Dominik


