Hi,

Does anyone have stats on crawl scalability: roughly how many URLs you crawled and how long it took? These numbers obviously depend on the environment and the site(s) being crawled, but it would still be nice to see some figures here.

I am using Nutch with HBase and Solr and have a nice working environment; so far I have been able to crawl a limited (in fact, very limited) set of URLs satisfactorily. Now that I have a proof of concept, I want to run it at full scale, but before I do that, I want to see whether my setup can even handle it. If not, I want to know how I can throttle my runs (a sketch of the settings I have in mind follows below). So some stats/test results would be nice to have.
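By "throttle" I mean the standard Nutch politeness/concurrency settings, overridden in conf/nutch-site.xml. A minimal sketch of what I would tune is below; the property names come from nutch-default.xml, but the values are just placeholders, not recommendations:

    <!-- conf/nutch-site.xml: overrides of nutch-default.xml -->
    <configuration>
      <!-- total fetcher threads across all queues -->
      <property>
        <name>fetcher.threads.fetch</name>
        <value>10</value>
      </property>
      <!-- max concurrent fetch threads per host queue -->
      <property>
        <name>fetcher.threads.per.queue</name>
        <value>1</value>
      </property>
      <!-- seconds to wait between successive requests to the same server -->
      <property>
        <name>fetcher.server.delay</name>
        <value>5.0</value>
      </property>
      <!-- cap on URLs generated per host in a single batch -->
      <property>
        <name>generate.max.count</name>
        <value>100</value>
      </property>
    </configuration>

Combined with a small -topN on the generate step, I assume this would keep each crawl round bounded, but I would like to sanity-check that against real numbers.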


Regards
Hemant
