Hi Sumant, I've pasted your Hadoop counters below. It appears that for the ParserJob, no records are being passed as input to the MapReduce framework ("Map input records=0"). That is the issue: something is going wrong between the FetcherJob and the ParserJob. Can you run readdb between fetching and parsing? If records come out, then the fetch is fine; if not, you can debug the issue further from there. Please write back and let us know how you get on. Thanks, Lewis
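For reference, the check could look something like the following. This is only a sketch assuming Nutch 2.x with a seed directory named "urls" and a dump directory of /tmp/webdb_dump; adjust paths, -topN, and any -crawlId to match your setup.

```shell
# Inject seeds and generate a fetch batch
bin/nutch inject urls
bin/nutch generate -topN 50

# Fetch the generated batch
bin/nutch fetch -all

# Inspect the web table BEFORE parsing: if this dump contains no
# records (or no content), the problem is in or before the fetch
# step, not in the parser
bin/nutch readdb -dump /tmp/webdb_dump -content
cat /tmp/webdb_dump/part-r-00000 | head

# Only once records show up should the parse step be run
bin/nutch parse -all
```

If the dump is empty, the Cassandra-backed web table is not receiving fetched content, which would explain the zero counters in the ParserJob.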
On Wed, Feb 25, 2015 at 3:06 PM, <[email protected]> wrote:
> user Digest 25 Feb 2015 23:06:30 -0000 Issue 2365
>
> Subject: Re: Nutch 2 with Cassandra as a storage is not crawling data properly
>
> Hi,
>
> Please find the logs pasted at below link:
>
> http://pastebin.com/JvFimRy0
>
> 2015-02-24 14:47:43,462 INFO mapred.JobClient - Counters: 9
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -   File Output Format Counters
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     Bytes Written=0
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -   File Input Format Counters
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     Bytes Read=0
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -   FileSystemCounters
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     FILE_BYTES_READ=608455
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     FILE_BYTES_WRITTEN=695392
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -   Map-Reduce Framework
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     Map input records=0
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     Spilled Records=0
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     Total committed heap usage (bytes)=257425408
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     SPLIT_RAW_BYTES=862
> 2015-02-24 14:47:43,462 INFO mapred.JobClient -     Map output records=0

