1. I am currently NOT using NDFS or MapReduce because the number of sites I am looking to fetch and index is relatively small (currently fewer than 1 million). Accordingly, I am using the 0.7.1 version of Nutch available from
the Nutch website. Does this seem like the correct choice?

The next version will be MapReduce-based in any case, so 0.7 is already the 'old' one and people will not continue to develop it (maybe just some maintenance releases). MapReduce doesn't mean that you need more than one computer or that you need NDFS. It is just a technology for processing large data sets.
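
To illustrate that the map/reduce pattern itself does not require a cluster or a distributed file system, here is a minimal single-process sketch in Python. It is not Nutch's implementation and uses none of its APIs; the function names (`map_phase`, `shuffle`, `reduce_phase`, `pages_per_host`) and the example URLs are hypothetical, chosen only to show the three steps running on one machine.

```python
from collections import defaultdict
from urllib.parse import urlparse


def map_phase(records, mapper):
    """Run the mapper on every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Run the reducer once per key over all values grouped under that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def host_mapper(url):
    """Emit (host, 1) for each URL (hypothetical example mapper)."""
    yield (urlparse(url).netloc, 1)


def count_reducer(host, counts):
    """Sum the per-URL counts for one host (hypothetical example reducer)."""
    return sum(counts)


def pages_per_host(urls):
    """Count URLs per host with map, shuffle, and reduce in a single process."""
    return reduce_phase(shuffle(map_phase(urls, host_mapper)), count_reducer)


if __name__ == "__main__":
    example_urls = [
        "http://a.example/1",
        "http://a.example/2",
        "http://b.example/",
    ]
    print(pages_per_host(example_urls))
```

A real framework distributes the same three steps across machines when the data calls for it, but nothing in the pattern itself depends on that.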

3. It is my understanding that Nutch 0.7.1 (no NDFS or mapred) only has a
webdb and not the linkdb/crawldb structure.
Right.
If that is correct, then if I'm
trying to add two merged segments (and the index of each) to my "live"
folder, do I also need the webdb of each (and if so, do I need to merge
them)?
No! 0.7 does not need the webdb for searching. At least I think so, but it has been some time since I used 0.7.
Just merge the segments; that's it.

Stefan



Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
