Hi Sumant,

Please see my replies below.
On Mon, Feb 23, 2015 at 10:11 PM, <[email protected]> wrote:

> I am using Nutch 2.x with Cassandra as storage. Currently I am crawling
> only one website, and data is getting loaded into Cassandra in byte code
> format. When I use the readdb command in Nutch, I do not get any useful
> crawl data.
>
> Below are the details of the different files and the output I am getting:
>
> *========== command to run crawler =====================*
>
> bin/crawl urls/ crawlDir/ solr_link 3

1) Which version of Nutch are you using? Are you using the 2.x source code from here [0], e.g. 2.4-SNAPSHOT?

2) Which version of Cassandra are you using? The Cassandra version currently recommended for this Nutch codebase is 2.0.2, together with the Gora 0.5 dependencies.

3) The way you are invoking the crawl script is pretty strange. Please read the description of its input parameters: https://github.com/apache/nutch/blob/2.x/src/bin/crawl#L33

4) The solr_url parameter is optional. This means that if you do pass it and it is incorrect, the script will certainly throw an error/exception.

5) Once you've addressed the above, please paste the logs for your crawl task somewhere we can see them.

Thank you,
Lewis

[0] http://svn.apache.org/repos/asf/nutch/branches/2.x/
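A quick pre-flight check can make a parameter mismatch like the one in point 3 visible before Nutch even starts. The sketch below is a hypothetical helper (`check_crawl_args` is not part of Nutch), written against an assumed usage line of `crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>`; the real parameters should be confirmed against the usage text in the crawl script linked above, since they can differ between releases.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, NOT part of Nutch. It mirrors the usage this
# sketch ASSUMES for the 2.x crawl script:
#   crawl <seedDir> <crawlID> [<solrUrl>] <numberOfRounds>
# Returns 0 if the arguments look sane, 1 otherwise (reasons go to stderr).
check_crawl_args() {
  local seed_dir crawl_id solr_url="" rounds
  case "$#" in
    3) seed_dir="$1"; crawl_id="$2"; rounds="$3" ;;
    4) seed_dir="$1"; crawl_id="$2"; solr_url="$3"; rounds="$4" ;;
    *) echo "expected 3 or 4 arguments, got $#" >&2; return 1 ;;
  esac

  local status=0

  # The seed directory must exist (it should hold the seed URL list).
  if [ ! -d "$seed_dir" ]; then
    echo "seedDir '$seed_dir' is not a directory" >&2
    status=1
  fi

  # This sketch treats crawlID as a plain identifier (assumed to prefix the
  # storage schema name), so a path-like value containing '/' is flagged.
  case "$crawl_id" in
    */*) echo "crawlID '$crawl_id' looks like a path, not an ID" >&2; status=1 ;;
  esac

  # The Solr URL is optional, but if it is given it must be an http(s) URL.
  if [ -n "$solr_url" ]; then
    case "$solr_url" in
      http://*|https://*) ;;
      *) echo "solrUrl '$solr_url' is not an http(s) URL" >&2; status=1 ;;
    esac
  fi

  # numberOfRounds must be a positive integer.
  case "$rounds" in
    ''|*[!0-9]*)
      echo "numberOfRounds '$rounds' is not a positive integer" >&2; status=1 ;;
    *)
      if [ "$rounds" -le 0 ]; then
        echo "numberOfRounds must be greater than 0" >&2; status=1
      fi ;;
  esac

  return "$status"
}
```

Run against the original invocation, `check_crawl_args urls/ crawlDir/ solr_link 3` would, under these assumptions, flag `crawlDir/` (a path rather than an ID) and `solr_link` (not a URL), plus `urls/` if that directory does not exist in the current working directory.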

