Hi Manikandan,

Did you check your datastore after the injector job ran? Does it have any
rows? By default, Gora does not support Hadoop 2.x, so you need to change
Gora's dependencies. I will send my patch to GORA-144.
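
If it helps, here is a rough sketch for confirming from the stats output alone that the job read no rows. It assumes the log has been saved as text; the function names are made up for illustration and are not part of Nutch or Gora. It only looks for the `TOTAL urls` line and the `MAP_INPUT_RECORDS` counter that appear in the output quoted below.

```python
import re


def summarize_stats(log_text):
    """Extract the two numbers that show whether the WebTable has any rows.

    Hypothetical helper: returns None for a value that is not in the text.
    """
    total = re.search(r"TOTAL urls:\s*(\d+)", log_text)
    map_in = re.search(r"MAP_INPUT_RECORDS=(\d+)", log_text)
    return {
        "total_urls": int(total.group(1)) if total else None,
        "map_input_records": int(map_in.group(1)) if map_in else None,
    }


def webtable_is_empty(log_text):
    """True when either number says the stats job saw zero rows."""
    stats = summarize_stats(log_text)
    return stats["total_urls"] == 0 or stats["map_input_records"] == 0


# Two lines copied from the stats output in the original message.
sample = (
    "Framework={MAP_OUTPUT_MATERIALIZED_BYTES=6, MAP_INPUT_RECORDS=0,\n"
    "14/05/26 23:12:34 INFO crawl.WebTableReader: TOTAL urls:        0\n"
)

print(summarize_stats(sample))
print(webtable_is_empty(sample))
```

If both values are 0 right after injection, the seed URLs most likely never reached the datastore, so the problem would be in the inject or storage layer rather than in the URL filters.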

Talat
On 27 May 2014 at 06:19, "Manikandan Saravanan" <
[email protected]> wrote:

> Hi,
>
> I’m running Nutch 2 on a 2-node Hadoop cluster. I’m also running Solr 4 on
> a separate machine accessible by private IP. I run the crawl command by
> doing the following.
>
> bin/crawl urls/seed.txt TestCrawl <solrUrl> 2
>
> My problem is that no URLs are fetched. And thus, nothing is indexed. When
> I run stats, this is what I get
>
> {db_stats-job_201405261214_0043=
>         {
>                 jobID=job_201405261214_0043,
>                 jobName=db_stats,
>                 counters=
>                         {File Input Format Counters ={BYTES_READ=0},
>                         Job Counters ={TOTAL_LAUNCHED_REDUCES=1,
> SLOTS_MILLIS_MAPS=7990, FALLOW_SLOTS_MILLIS_REDUCES=0,
> FALLOW_SLOTS_MILLIS_MAPS=0, TOTAL_LAUNCHED_MAPS=1,
> SLOTS_MILLIS_REDUCES=9980},
>                         Map-Reduce
> Framework={MAP_OUTPUT_MATERIALIZED_BYTES=6, MAP_INPUT_RECORDS=0,
> REDUCE_SHUFFLE_BYTES=6, SPILLED_RECORDS=0, MAP_OUTPUT_BYTES=0,
> COMMITTED_HEAP_BYTES=218103808, CPU_MILLISECONDS=1950,
> SPLIT_RAW_BYTES=1017, COMBINE_INPUT_RECORDS=0, REDUCE_INPUT_RECORDS=0,
> REDUCE_INPUT_GROUPS=0, COMBINE_OUTPUT_RECORDS=0,
> PHYSICAL_MEMORY_BYTES=296411136, REDUCE_OUTPUT_RECORDS=0,
> VIRTUAL_MEMORY_BYTES=2251104256, MAP_OUTPUT_RECORDS=0},
> FileSystemCounters={FILE_BYTES_READ=6, HDFS_BYTES_READ=1017,
> FILE_BYTES_WRITTEN=156962, HDFS_BYTES_WRITTEN=86}, File Output Format
> Counters ={BYTES_WRITTEN=86}}}}
> 14/05/26 23:12:34 INFO crawl.WebTableReader: TOTAL urls:        0
> 14/05/26 23:12:34 INFO crawl.WebTableReader: WebTable statistics: done
>
> What am I missing? My regex and normalise filters are allowing all URL
> patterns. I’m trying to do a whole web crawl.
>
> --
> Manikandan Saravanan
> Architect - Technology
> TheSocialPeople
