Re: Injector works. But generator and fetcher don't work.

2014-06-07 Thread Manikandan Saravanan
Hey, I finally solved it! It had to do with my Cassandra cluster. My Hadoop and Cassandra clusters were in two different datacenters, which caused Cassandra requests to time out. That meant the generate phase didn’t have any input! Works like a charm now :) Regards --  Manikandan Saravanan
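
For anyone hitting the same symptom, a rough way to check whether the Hadoop nodes can reach the Cassandra nodes quickly enough is to time a plain TCP connect from a Hadoop worker. This is only a sketch and not part of Nutch or Gora: the function name `connect_latency_ms` and the host name in the usage comment are hypothetical, and the port is an assumption (9160 is Cassandra's default Thrift RPC port; use whatever your cluster is configured with).

```python
import socket
import time


def connect_latency_ms(host, port, timeout=2.0):
    """Return the TCP connect time in milliseconds, or None on failure/timeout."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) if unreachable
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None


# Example usage (hypothetical host name, assumed Thrift port):
#   ms = connect_latency_ms("cassandra-node-1.example.com", 9160)
#   print("unreachable or timed out" if ms is None else f"{ms:.1f} ms")
```

A result of None, or connect times that are large compared with nodes in the same datacenter, would be consistent with the cross-datacenter timeouts described above, though it does not prove them.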

Injector works. But generator and fetcher don't work.

2014-06-05 Thread Manikandan Saravanan
}, FileSystemCounters={FILE_BYTES_READ=6, HDFS_BYTES_READ=1135, FILE_BYTES_WRITTEN=157112, HDFS_BYTES_WRITTEN=86}, File Output Format Counters ={BYTES_WRITTEN=86 14/06/05 15:14:19 INFO crawl.WebTableReader: TOTAL urls:0 --  Manikandan Saravanan Architect - Technology TheSocialPeople

Re: Injector works. But generator and fetcher don't work.

2014-06-05 Thread Manikandan Saravanan
I built it from Nutch 2.2.1 (src-tar.gz). --  Manikandan Saravanan Architect - Technology TheSocialPeople On 6 June 2014 at 1:03:18 am, Lewis John Mcgibbney (lewis.mcgibb...@gmail.com) wrote: which version of Nutch are you using? Nutch 2 what? On Thu, Jun 5, 2014 at 12:14 PM, Manikandan

Re: Injector works. But generator and fetcher don't work.

2014-06-05 Thread Manikandan Saravanan
<!-- removes duplicate slashes --> <regex> <pattern>(?&lt;!:)/{2,}</pattern> <substitution>/</substitution> </regex> </regex-normalize> --  Manikandan Saravanan Architect - Technology TheSocialPeople On 6 June 2014 at 1:54:02 am, Lewis John Mcgibbney (lewis.mcgibb...@gmail.com) wrote: I suspect that your generator
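
The duplicate-slash rule quoted in this message can be tried outside Nutch to confirm it does not mangle URLs. The sketch below assumes the rule's pattern is `(?<!:)/{2,}` (written with `&lt;` in the XML config) and its substitution is a single `/`; it uses Python's `re` module rather than Nutch's own normalizer, and the helper name and sample URLs are made up for illustration.

```python
import re

# Two or more slashes that are not immediately preceded by a colon,
# so the "//" in "http://" is left alone.
DUPLICATE_SLASHES = re.compile(r"(?<!:)/{2,}")


def collapse_slashes(url):
    """Collapse runs of slashes in a URL into one slash, keeping the scheme's '//'."""
    return DUPLICATE_SLASHES.sub("/", url)


print(collapse_slashes("http://example.com//a///b"))  # http://example.com/a/b
print(collapse_slashes("http://example.com/a/b"))     # unchanged
```

If URLs pass through this rule unchanged or only lose redundant slashes, the normalizer is unlikely to be the reason no URLs are generated.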

Nutch not generating any URLs

2014-05-28 Thread Manikandan Saravanan
) snapshot=1992785920 14/05/28 07:19:33 INFO mapred.JobClient:     Map output records=0 14/05/28 07:19:33 INFO mapred.JobClient:     SPLIT_RAW_BYTES=877 14/05/28 07:19:33 INFO solr.SolrIndexerJob: SolrIndexerJob: done.  Am I missing anything? --  Manikandan Saravanan Architect - Technology

Solr Deduplicate - Class Not Found Exception

2014-05-26 Thread Manikandan Saravanan
Hi, I’m running Nutch 2 on a Hadoop 1.2.1 cluster with 2 nodes. I’m running Solr 4 separately on a box and I replaced Solr’s schema with Nutch’s Solr-4 schema. When I run a crawl, I get the following error at the end of the job 14/05/26 14:08:32 INFO solr.SolrDeleteDuplicates:

Total fetched URLs is 0.

2014-05-26 Thread Manikandan Saravanan
statistics: done What am I missing? My regex and normalise filters are allowing all URL patterns. I’m trying to do a whole web crawl. --  Manikandan Saravanan Architect - Technology TheSocialPeople

Re: Nutch - Hadoop Help

2014-02-05 Thread Manikandan Saravanan
I’m using the crawl script you had given before [0]. What might be wrong? [0] https://svn.apache.org/repos/asf/nutch/branches/2.x/src/bin/crawl --  Manikandan Saravanan Architect - Technology TheSocialPeople On 5 February 2014 at 3:21:25 pm, Lewis John Mcgibbney (lewis.mcgibb...@gmail.com

Re: Nutch - Hadoop Help

2014-02-04 Thread Manikandan Saravanan
? --  Manikandan Saravanan Architect - Technology TheSocialPeople On 4 February 2014 at 3:11:36 pm, Lewis John Mcgibbney (lewis.mcgibb...@gmail.com) wrote: https://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script On Tue, Feb 4, 2014 at 7:04 AM, Manikandan Saravanan manikan

Re: Nutch - Hadoop Help

2014-02-04 Thread Manikandan Saravanan
I’m using the crawl script that you had linked earlier. --  Manikandan Saravanan Architect - Technology TheSocialPeople On 4 February 2014 at 7:43:49 pm, Manikandan Saravanan (manikan...@thesocialpeople.net) wrote: Okay, the crawl runs well for the most part: I’m running the crawl script

Re: Nutch - Hadoop Help

2014-02-04 Thread Manikandan Saravanan
Can you help me out? I think there’s something wrong with what we’re passing to bin/nutch updatedb in the crawl script. --  Manikandan Saravanan Architect - Technology TheSocialPeople On 4 February 2014 at 8:00:24 pm, Manikandan Saravanan (manikan...@thesocialpeople.net) wrote: I’m using

Re: Nutch - Hadoop Help

2014-02-04 Thread Manikandan Saravanan
I’m getting this when running the crawl script right after the parse phase: Exception in thread "main" java.lang.IllegalArgumentException: usage: (-crawlId id)  Something wrong with updatedb? --  Manikandan Saravanan Architect - Technology TheSocialPeople On 5 February 2014 at 1:20:31 am

Nutch - Hadoop Help

2014-02-03 Thread Manikandan Saravanan
] https://svn.apache.org/repos/asf/nutch/branches/2.x/src/bin/crawl --  Manikandan Saravanan Architect - Technology TheSocialPeople

Re: Nutch - Hadoop Help

2014-02-03 Thread Manikandan Saravanan
How do I run the crawl script on hadoop? --  Manikandan Saravanan Architect - Technology TheSocialPeople On 4 February 2014 at 1:28:39 am, Lewis John Mcgibbney (lewis.mcgibb...@gmail.com) wrote: Hi Manikandan, On Mon, Feb 3, 2014 at 3:45 PM, user-digest-h...@nutch.apache.org wrote

Using Gora SNAPSHOT with Nutch

2014-01-07 Thread Manikandan Saravanan
snapshot bundled but I’m not getting other dependencies like thrift etc. Kindly help me with a permanent solution to this problem. --  Manikandan Saravanan Architect - Technology TheSocialPeople