Hello, I've set up a small sample Hadoop cluster of 6 servers running HDFS, ZooKeeper, Solr, and Accumulo.
I am running Nutch on top of the Hadoop cluster and injecting 10,000 URLs from the seed.txt file. Everything works as it should: nothing breaks, everything indexes, and the crawl job finishes OK. However, the inject stage for those 10,000 URLs takes up to 50 minutes. I wonder whether that is a normal time for an inject, whether I should be looking for a problem (maybe in the Gora Accumulo module?), or whether I am simply being naive and my seed.txt should not be so large to begin with.

A bit more information about my setup:

Hadoop 2.7.2
Accumulo 1.5.1
Solr 4.10.3

Accumulo currently has about 500 tables with some 200 million entries (not sure if that affects things). The Accumulo logs show no major errors, warnings, or Java exceptions, and neither do the MapReduce logs in Hadoop.

Thank you very much for your help and your excellent crawler.

--
Luis Magaña
www.euphorica.com
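P.S. Since I wondered whether seed.txt is too large: one thing I considered trying (assuming the inject job parallelizes per input file, which I have not verified) is splitting the seed list into several smaller files before injecting. Something along these lines, using a small sample list as a stand-in for my real seed.txt:

```shell
# stand-in for my real 10,000-URL seed list
printf 'http://example.com/page%d\n' $(seq 1 100) > seed.txt

# split into files of 25 URLs each, so the inject job could
# (in theory) get one map task per file
mkdir -p urls
split -l 25 seed.txt urls/seed_

ls urls/
```

I have not measured whether this actually changes inject time on my cluster; it is just the workaround I would test first.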

