Can Nutch/Lucene handle getting content from 100s-1000s of pages simultaneously?
If you mean "Can Nutch handle fetching 1000s of pages at a time", the answer is yes.
If you mean "Can Lucene, when used as the IR engine for Nutch, handle searching 1000s of pages at a time", then the answer is also yes.
If it can, how does it write the content to the resulting/output db? Does it actually open 100s-1000s of simultaneous connections to a backend db? Or does it write the output files to a filesystem, from which they are then somehow inserted into a db?
The fetched pages get written to a sequential file. After a fetch cycle, additional data about the page state gets processed, and the results are used to update the crawldb, which is a kind of specialized database for web crawling.
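As a rough illustration of that pattern (many parallel fetches, one sequential output file, then a batch update afterwards), here is a minimal Python sketch. It is not Nutch's actual code: `fake_fetch`, `fetch_cycle`, `update_crawldb`, the JSON-lines record format, and the status strings are all hypothetical names and choices made up for this example, and a plain dict stands in for the crawldb.

```python
import json
import os
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP fetch; returns (status, content)."""
    return 200, "content of " + url


def fetch_cycle(urls, segment_path, num_threads=8):
    """Fetch URLs in parallel, appending every result to one sequential file."""
    lock = threading.Lock()
    with open(segment_path, "a", encoding="utf-8") as segment:

        def fetch_and_append(url):
            status, content = fake_fetch(url)
            record = json.dumps(
                {"url": url, "status": status, "content": content}
            )
            # Only the append is serialized; the fetches run concurrently.
            with lock:
                segment.write(record + "\n")

        with ThreadPoolExecutor(max_workers=num_threads) as pool:
            # Consume the iterator so any worker exception is raised here.
            list(pool.map(fetch_and_append, urls))


def update_crawldb(crawldb, segment_path):
    """After the fetch cycle, read the file once and update page state."""
    with open(segment_path, encoding="utf-8") as segment:
        for line in segment:
            record = json.loads(line)
            fetched_ok = record["status"] == 200
            crawldb[record["url"]] = "fetched" if fetched_ok else "failed"
    return crawldb


def run_demo(num_pages=1000):
    """Run one fetch cycle followed by one batch update of the toy crawldb."""
    urls = ["http://example.com/page%d" % i for i in range(num_pages)]
    crawldb = {url: "unfetched" for url in urls}
    with tempfile.TemporaryDirectory() as tmp:
        segment_path = os.path.join(tmp, "segment.jsonl")
        fetch_cycle(urls, segment_path)
        update_crawldb(crawldb, segment_path)
    return crawldb
```

Calling `run_demo(1000)` returns the toy crawldb with each page's state updated. The point of the sketch is that no database connection is opened per page: the fetchers only append to a file, and the database-like structure is updated in a single pass once the cycle is done.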
-- Ken

--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"
