Hi,

Please find my replies below:

1) Which version of Nutch are you using? Are you using the 2.X source code from here [0] e.g. 2.4-SNAPSHOT?

=> I am using apache-nutch-2.3-src.tar.gz, downloaded from http://apache.cs.utah.edu/nutch/2.3/

===========================================================

2) Which version of Cassandra are you using? The recommended version of this Nutch codebase is currently 2.0.2 and Gora 0.5 dependencies.

=> I am using dsc-cassandra-2.1.2 to load data. Should I use version 2.0.2 instead?

===========================================================

3) The way you are invoking the crawl script is pretty strange. Please read the input parameters

=> I am running the crawl command from the directory ~/Documents/Softwares/apache-nutch-2.3/runtime/local

I tried the options below to run the command:

1] bin/nutch crawl urls/ 10
Output: Command crawl is deprecated, please use bin/crawl instead

2] Then I tried: bin/nutch bin/crawl urls/ 10
Output: Error: Could not find or load main class bin.crawl

3] So I tried: bin/crawl urls/ crawlDir/ http://localhost:8983/solr/ 10
Output: It ran fine and data was loaded into Cassandra, but it did not crawl beyond the initial seed I provided in seed.txt

===========================================================

4) The solr_url parameter is optional. Meaning that if you enter it, and it is incorrect, then it will undoubtedly throw an error/exception.

=> I think I did not paste the correct link in my first message. I gave solr_url as http://localhost:8983/solr/
I hope this URL is correct. Please correct me if I am wrong.

===========================================================

5) Please provide a paste of your logs for the crawl task somewhere once you've addressed the above.

=> I will provide the logs shortly.

===========================================================

Please let me know if I am doing something wrong. Do you want me to send you the nutch-site.xml? I think certain parameters are preventing Nutch from crawling beyond the initial seed.
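
Regarding the invocation in 3), this is how I currently understand the argument order of bin/crawl. It is only a sketch, assuming the 2.x script expects `<seedDir> <crawlID> [<solrUrl>] <numberOfRounds>`. The helper `build_crawl_cmd` and the crawl ID `testCrawl` are hypothetical names used for illustration; the function only prints the command line and does not run Nutch.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): prints the bin/crawl command line
# for the ASSUMED usage: crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>
build_crawl_cmd() {
    case "$#" in
        # Three arguments: no Solr URL, so no indexing step is requested
        3) printf 'bin/crawl %s %s %s\n' "$1" "$2" "$3" ;;
        # Four arguments: the Solr URL sits between the crawl ID and the rounds
        4) printf 'bin/crawl %s %s %s %s\n' "$1" "$2" "$3" "$4" ;;
        *) echo "usage: build_crawl_cmd <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>" >&2
           return 1 ;;
    esac
}

# Without Solr: seed directory, crawl ID, number of rounds
build_crawl_cmd urls/ testCrawl 10

# With Solr: same as above, with the Solr URL in third position
build_crawl_cmd urls/ testCrawl http://localhost:8983/solr/ 10
```

If this reading is right, the second argument is an identifier rather than a directory, so the `crawlDir/` I passed in option 3] may not be what the script expects.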

