Re: Nutch 1.1 Architecture
Hi Israel,

I hope some of the documents attached to the wiki can resolve any doubts you may have: http://wiki.apache.org/nutch/Presentations

2010/8/16 Israel wego...@gmail.com:
Hello, does anyone have a document that explains the architecture of Nutch? If it covers version 1.1, even better... thanks
Nutch w Eclipse
Hi,

I have been trying all day to get Nutch going in Eclipse. I am now getting this error:

crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
indexer=lucene
topN = 50
Injector: starting at 2010-08-16 21:38:36
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)

This error happens when I follow this guide: http://wiki.apache.org/nutch/RunNutchInEclipse1.0 but with 1.2 from the tag in SVN.

I have tried these versions: trunk, 1.1 from the site (gz), and 1.2 from the tags in SVN. All have their own errors. I will try 1.0, but I am hoping to use the latest version of Nutch.

Regards,
J
Re: Nutch w Eclipse
After doing all the steps again, I am now getting this with Nutch 1.2. Getting closer! (I think)

crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
indexer=lucene
topN = 50
Injector: starting at 2010-08-16 23:19:52
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
*Skipping http://lucene.apache.org/nutch/:java.lang.NullPointerException*
Injector: Merging injected urls into crawl db.
Injector: finished at 2010-08-16 23:19:55, elapsed: 00:00:02
Generator: starting at 2010-08-16 23:19:55
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl

I will continue to investigate, but would really appreciate some help ;)

J
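The last message in the output above suggests checking the seed list and URL filters. As a rough aid for that check, here is a minimal standalone sketch (not Nutch code) that applies +/- regex rules to a seed URL, with the first matching rule deciding. The class name `SeedFilterCheck`, the rule syntax, and the sample rules are assumptions made for illustration, so compare them against the filter file your own checkout actually uses. It says nothing about the NullPointerException itself; it only tests whether a rule set would let a seed through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Nutch: tests a URL against +/- regex rules. */
public class SeedFilterCheck {

    /** One rule: accept (+) or reject (-) any URL in which the pattern is found. */
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    /** Parses rule lines; blank lines and lines starting with '#' are ignored. */
    static List<Rule> parseRules(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    /** The first rule whose pattern is found in the URL decides; no match means reject. */
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample rules (assumed): allow apache.org hosts, reject everything else.
        List<Rule> rules = parseRules(Arrays.asList(
                "# accept hosts under apache.org",
                "+^http://([a-z0-9]*\\.)*apache\\.org/",
                "-."));
        String seed = "http://lucene.apache.org/nutch/";
        System.out.println(seed + " accepted: " + accepts(rules, seed));
    }
}
```

If a seed is rejected by the rules you paste in, the filter configuration is a likely reason for "0 records selected for fetching"; if it is accepted, the cause probably lies elsewhere.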
Re: Nutch w Eclipse
Hi J,

You should check logs/hadoop.log for further error messages!

Bests,
Hannes

On Mon, Aug 16, 2010 at 6:37 PM, Jay sa...@blastsms.com wrote:

After doing all the steps again, I am now getting this. Nutch 1.2 Getting closer! (I think)

crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
indexer=lucene
topN = 50
Injector: starting at 2010-08-16 23:19:52
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
*Skipping http://lucene.apache.org/nutch/:java.lang.NullPointerException*
Injector: Merging injected urls into crawl db.
Injector: finished at 2010-08-16 23:19:55, elapsed: 00:00:02
Generator: starting at 2010-08-16 23:19:55
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl

I will continue to investigate, but would really appreciate some help ;)

J
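To make the log check suggested above less tedious, here is a minimal sketch that prints only the lines of a log file that look like errors, together with their stack frames. The class name `HadoopLogScan` is hypothetical, and the markers it looks for (`ERROR`, `WARN`, `Exception`, `Caused by:`, and frames starting with `at `) are assumptions about the log layout, so adjust them to what your logs/hadoop.log actually contains.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Nutch or Hadoop: filters a log for error-like lines. */
public class HadoopLogScan {

    /** True if the line looks like an error, a warning, or part of a stack trace. */
    static boolean isProblemLine(String line) {
        String trimmed = line.trim();
        return line.contains(" ERROR ")          // assumed log-level token
                || line.contains(" WARN ")       // assumed log-level token
                || line.contains("Exception")    // exception class names and messages
                || trimmed.startsWith("Caused by:")
                || trimmed.startsWith("at ");    // stack frames
    }

    /** Returns the matching lines in their original order. */
    static List<String> problemLines(List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (isProblemLine(line)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the path mentioned in the thread; pass another path as the first argument.
        Path log = Paths.get(args.length > 0 ? args[0] : "logs/hadoop.log");
        // Assumes the log is UTF-8 text; readAllLines throws if it is not.
        List<String> lines = Files.readAllLines(log, StandardCharsets.UTF_8);
        for (String hit : problemLines(lines)) {
            System.out.println(hit);
        }
    }
}
```

Run it from the Nutch working directory so the relative path resolves; the exception printed there is usually more specific than the generic "Job failed!" shown on the console.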