Can Nutch/Lucene handle getting content from 100s-1000s of pages
simultaneously?

If you mean "Can Nutch handle fetching 1000s of pages at a time", the answer is yes.
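
Conceptually, fetching is I/O-bound, so a pool of worker threads can keep many requests in flight at once. Below is a minimal Python sketch of that idea, not Nutch's code (Nutch's fetcher is written in Java). `fetch_page` is a hypothetical stand-in that sleeps instead of doing real HTTP, and the pool size of 100 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch_page(url: str) -> tuple[str, str]:
    """Hypothetical stand-in for an HTTP fetch; sleeps to mimic network latency."""
    time.sleep(0.01)
    return url, f"<html>content of {url}</html>"


def fetch_all(urls: list[str], threads: int = 100) -> dict[str, str]:
    """Fetch many URLs concurrently using a bounded pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() runs fetch_page across the pool and returns results in input order
        return dict(pool.map(fetch_page, urls))


if __name__ == "__main__":
    urls = [f"http://example.com/page{i}" for i in range(1000)]
    start = time.perf_counter()
    pages = fetch_all(urls)
    elapsed = time.perf_counter() - start
    print(f"fetched {len(pages)} pages in {elapsed:.2f}s")
```

With 100 threads, 1000 simulated 10 ms fetches finish in roughly a tenth of the time a single thread would need; the thread count is the knob that trades throughput against load on the target servers.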

If you mean "Can Lucene, when used as the IR engine for Nutch, handle searching 1000s of pages at a time", then the answer is also yes.

If it can, how does it write the content to the resulting/output DB?

Does it actually open 100s-1000s of simultaneous connections to a backend
DB?

Or does it write the output files to a filesystem, which are then
somehow inserted into a DB?

The fetched pages get written to a sequential file. After a fetch cycle, additional data about the page state gets processed, and the results are used to update the crawldb, which is a kind of specialized database for web crawling.
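
A rough sketch of why this avoids 100s-1000s of simultaneous database connections: many fetcher threads hand their results to a single writer that appends records one after another to a file, and only after the cycle does a separate batch step fold page state into the crawl database. This is an illustration under stated assumptions, not Nutch's actual on-disk format or API: the JSON-lines segment file, the record fields (`url`, `status`, `content`), and the plain dict standing in for the crawldb are all hypothetical.

```python
import json
import queue
import tempfile
import threading
from pathlib import Path


def writer(out_path: Path, records: queue.Queue) -> None:
    """Single writer thread: appends each fetched record to one sequential file."""
    with out_path.open("a", encoding="utf-8") as f:
        while (rec := records.get()) is not None:  # None is the stop signal
            f.write(json.dumps(rec) + "\n")


def run_fetch_cycle(urls: list[str], segment: Path, n_fetchers: int = 8) -> None:
    """Many fetcher threads feed one writer through a queue (hypothetical layout)."""
    records: queue.Queue = queue.Queue()
    w = threading.Thread(target=writer, args=(segment, records))
    w.start()

    def fetcher(chunk: list[str]) -> None:
        for url in chunk:
            # Hypothetical fetch result; a real fetcher would do network I/O here
            records.put({"url": url, "status": "fetched", "content": f"<html>{url}</html>"})

    chunks = [urls[i::n_fetchers] for i in range(n_fetchers)]
    threads = [threading.Thread(target=fetcher, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    records.put(None)  # all fetchers done; tell the writer to finish
    w.join()


def update_crawldb(crawldb: dict, segment: Path) -> dict:
    """After the fetch cycle, fold each page's state into the (hypothetical) crawldb."""
    with segment.open(encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            entry = crawldb.setdefault(rec["url"], {"fetch_count": 0})
            entry["status"] = rec["status"]
            entry["fetch_count"] += 1
    return crawldb


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        segment = Path(d) / "segment.jsonl"
        urls = [f"http://example.com/page{i}" for i in range(1000)]
        run_fetch_cycle(urls, segment)
        db = update_crawldb({}, segment)
        print(len(db))  # 1000
```

Because only one thread touches the output file, the fetchers never contend for database connections; the crawldb update is a separate pass over the finished file.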

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"
