Hi Sumant, I've pasted your Hadoop counters below. It appears that for the ParserJob, no records are being passed as input to the MapReduce framework ("Map input records=0"). That is the issue: something is going wrong between the FetcherJob and the ParserJob. Can you run readdb between fetching and parsing? If records come out, then the fetch is fine; if not, you can debug the issue further from there. Please write back and let us know how you get on. Thanks, Lewis
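For reference, the check could look something like the following. This is only a sketch assuming Nutch 2.x with a seed directory named "urls" and a dump directory of /tmp/webdb_dump; adjust paths, -topN, and any -crawlId to match your setup.

```shell
# Inject seeds and generate a fetch batch
bin/nutch inject urls
bin/nutch generate -topN 50

# Fetch the generated batch
bin/nutch fetch -all

# Inspect the web table BEFORE parsing: if this dump contains no
# records (or no content), the problem is in or before the fetch
# step, not in the parser
bin/nutch readdb -dump /tmp/webdb_dump -content
cat /tmp/webdb_dump/part-r-00000 | head

# Only once records show up should the parse step be run
bin/nutch parse -all
```

If the dump is empty, the Cassandra-backed web table is not receiving fetched content, which would explain the zero counters in the ParserJob.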
On Wed, Feb 25, 2015 at 3:06 PM, <[email protected]> wrote:
> user Digest 25 Feb 2015 23:06:30 -0000 Issue 2365
>
> Subject: Re: Nutch 2 with Cassandra as a storage is not crawling data properly
>
> Hi,
>
> Please find the logs pasted at below link:
>
> http://pastebin.com/JvFimRy0
>
> 2015-02-24 14:47:43,462 INFO mapred.JobClient - Counters: 9
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -   File Output Format Counters
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     Bytes Written=0
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -   File Input Format Counters
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     Bytes Read=0
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -   FileSystemCounters
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     FILE_BYTES_READ=608455
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     FILE_BYTES_WRITTEN=695392
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -   Map-Reduce Framework
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     Map input records=0
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     Spilled Records=0
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     Total committed heap usage (bytes)=257425408
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     SPLIT_RAW_BYTES=862
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     Map output records=0

