You've not mentioned which version of Nutch you are using. Please use debug logging or look more deeply into your Nutch logs for answers; you may even be able to solve the problem before you post to the list.
We also have list archives available through the site. The overwhelming likelihood is that your question has been asked and probably answered over on the archives.

http://www.mail-archive.com/user%40nutch.apache.org/

On Wed, May 9, 2012 at 2:02 PM, Tolga <[email protected]> wrote:
> Hi again Lewis,
>
> I finally managed to index my site with your help. But when it was nearing
> the end, I got this error:
>
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:373)
>         at org.apache.nutch.indexer.solr.SolrDeleteDuplicates.dedup(SolrDeleteDuplicates.java:353)
>         at org.apache.nutch.crawl.Crawl.run(Crawl.java:153)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>
> What should I do about it?
>
> Regards,
>
> On 5/9/12 3:45 PM, Lewis John Mcgibbney wrote:
>>
>> add an agent name to the http.agent.name property in nutch-site.xml
>>
>> If this is the only warning that you receive then it should solve it.
>>
>> hth
>>
>> Lewis
>>
>> On Wed, May 9, 2012 at 12:35 PM, Tolga<[email protected]> wrote:
>>>
>>> I've read that and done accordingly, I still get that error.
>>>
>>> On 5/9/12 2:31 PM, Lewis John Mcgibbney wrote:
>>>>
>>>> good to hear.
>>>>
>>>> please see the tutorial for all required configuration
>>>>
>>>> http://wiki.apache.org/nutch/NutchTutorial
>>>>
>>>> On Wed, May 9, 2012 at 11:51 AM, Tolga<[email protected]> wrote:
>>>>>
>>>>> Dear Lewis,
>>>>>
>>>>> I've done as you said, and it's beginning to work. Except that it's
>>>>> complaining about http.agent.name not having been fed. The tut I have read
>>>>> states I don't need to fill it, but apparently I do. What should this be?
>>>>>
>>>>> On 5/9/12 1:25 PM, Lewis John Mcgibbney wrote:
>>>>>>
>>>>>> Hi Tolga,
>>>>>>
>>>>>> If you were to use Nutch in local mode, you could navigate to
>>>>>> nutch/runtime/local and set this environment variable to NUTCH_HOME.
>>>>>> If you are then to use either individual commands via bin/nutch or
>>>>>> alternatively the crawl command within the same script, you would not
>>>>>> need to worry about your class path.
>>>>>>
>>>>>> Does this make sense?
>>>>>>
>>>>>> On Wed, May 9, 2012 at 8:03 AM, Tolga<[email protected]> wrote:
>>>>>>>
>>>>>>> Sorry, there are actually .jar files under the directory, but I still
>>>>>>> can't figure out what path to export to CLASSPATH
>>>>>>>
>>>>>>>
>>>>>>> -------- Original Message --------
>>>>>>> Subject: CLASSPATH
>>>>>>> Date: Wed, 09 May 2012 10:00:53 +0300
>>>>>>> From: Tolga<[email protected]>
>>>>>>> To: [email protected]
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> This is my very first post to the list. In fact, I heard of nutch only
>>>>>>> yesterday.
>>>>>>>
>>>>>>> Anyway, I'm trying to figure out what path to export CLASSPATH to.
>>>>>>> Tutorials tell me it needs to be where my .jar files are. However, there
>>>>>>> are no .jar files under apache-nutch directory. So, please help me
>>>>>>> figure this out.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>
>>
>>
>

--
Lewis
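
For anyone finding this thread in the archives: the http.agent.name advice quoted above comes down to one property in nutch-site.xml. What follows is only a minimal sketch, assuming a local-mode install where the file sits in the conf directory under nutch/runtime/local. The value MyNutchCrawler is a made-up placeholder, not something from the thread; put in whatever name identifies your own crawler.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: site-specific overrides of the Nutch defaults -->
<configuration>
  <property>
    <name>http.agent.name</name>
    <!-- Placeholder value (an assumption for illustration); use your own crawler name -->
    <value>MyNutchCrawler</value>
  </property>
</configuration>
```

If nutch-site.xml already has a configuration element, only the property block needs to go inside it. As noted in the thread, this addresses the agent-name complaint only; the later "Job failed!" exception is a separate problem, and the logs are the place to look for its cause.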

