1. I am currently NOT using NDFS or map reduce because the number of sites I
am looking to fetch and index is relatively small (currently less than 1
million). Accordingly, I am using the 0.7.1 version of Nutch available from
the Nutch website. Does this seem like the correct choice?
The next version will be map reduce based in any case. So 0.7 is
already the 'old' one and people will not continue to develop it (maybe
just some maintenance releases).
Map reduce doesn't mean that you need more than one computer or that you
need NDFS. It is just a technique for processing large data sets.
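To illustrate that map reduce is a processing model and not a cluster
requirement, here is a minimal single-process sketch in Python. This is not
Nutch code: the function names (map_phase, shuffle, reduce_phase, word_count)
and the sample pages are made up for illustration only.

```python
from collections import defaultdict


def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    """Run map, shuffle and reduce in one process on one machine."""
    return reduce_phase(shuffle(map_phase(documents)))


if __name__ == "__main__":
    pages = ["nutch fetches pages", "nutch indexes pages"]
    print(word_count(pages))
```

The same three steps can be spread over many machines when the data set
grows, but nothing in the model itself requires that.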
3. It is my understanding that Nutch 0.7.1 (no NDFS or mapred) only has a
webdb and not the linkdb/crawldb structure.
Right.
If that is correct, then if I'm
trying to add two merged segments (and the index of each) to my "live"
folder, do I also need the webdb of each (and if so, do I need to merge
them)?
No! As far as I remember, 0.7 does not need the webdb for searching, but
it has been some time since I last used 0.7.
Just merge the segments, that's it.
Stefan
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general