I would like to build a search engine based on a handful of hand-picked sites from a specific domain. I have a few questions.
How many documents can I fit on a single-server setup (2-CPU Xeon)? Disk space aside, approximately how many documents can a single node hold while keeping respectable search performance?

My idea is to pick a handful of sites that I judge for quality and re-index them on a regular basis, maybe once a month, adding new sites over time. Does this sound feasible with Nutch, and which method would be best suited to this type of application?

I set up Nutch and crawled a very small sample using method 1 in the tutorial ("Intranet crawl"), but I was unable to get the whole-web crawl to work. What is the -dmozfile flag? I don't want to base this on DMOZ. If anyone could point me to documentation or a tutorial that better explains whole-web crawling, I would appreciate it. Thanks a lot.
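For reference, here is my rough understanding of the whole-web crawl steps from the tutorial, seeded from my own URL list instead of DMOZ. The command names are from the Nutch tutorial, but the directory layout (urls/, crawl/) is just my guess, so please correct me if this is the wrong approach:

```shell
# Sketch of a whole-web style crawl seeded from a hand-picked URL list
# rather than the DMOZ file. Directory names are placeholders.

# 1. Put the hand-picked seed URLs in a flat file, one per line.
mkdir -p urls
echo "http://example.com/" > urls/seed.txt

# (I assume the url filter config, e.g. regex-urlfilter.txt, is what
# restricts the crawl to only the chosen sites.)

# 2. Inject the seeds into a fresh crawl database.
bin/nutch inject crawl/crawldb urls

# 3. One fetch cycle: generate a segment, fetch it, update the db.
#    Re-running this cycle monthly would be my re-index step.
bin/nutch generate crawl/crawldb crawl/segments
segment=$(ls -d crawl/segments/* | tail -1)
bin/nutch fetch $segment
bin/nutch updatedb crawl/crawldb $segment

# 4. Build the link database and index the fetched segments.
bin/nutch invertlinks crawl/linkdb -dir crawl/segments
bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*
```

Adding a new site over time would then, I assume, just mean appending its URL to the seed file and the url filter, then injecting again.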
