Re: Nutch 1.1 Architecture

2010-08-16 Thread CatOs Mandros
Hi Israel,

I hope some of the documents attached to the wiki can clear up any
doubts you have:

http://wiki.apache.org/nutch/Presentations

2010/8/16 Israel wego...@gmail.com:
 Hello, does anyone have a document that explains the architecture of Nutch?
 If it covers version 1.1, so much the better... thanks



Nutch w Eclipse

2010-08-16 Thread Jay
Hi,

I have been trying all day to get Nutch going in Eclipse.

I am now getting this error:

crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
indexer=lucene
topN = 50
Injector: starting at 2010-08-16 21:38:36
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)

This error happens when I follow this guide:
http://wiki.apache.org/nutch/RunNutchInEclipse1.0
but with Nutch 1.2 checked out from the SVN tag.

I have tried trunk, the 1.1 .gz download from the site, and the 1.2 tag
from SVN. Each has its own errors.

I will try 1.0, but I am hoping to use the latest version of Nutch.

Regards,
J


Re: Nutch w Eclipse

2010-08-16 Thread Jay
After doing all the steps again with Nutch 1.2, I am now getting the
output below.

Getting closer! (I think)

crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
indexer=lucene
topN = 50
Injector: starting at 2010-08-16 23:19:52
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
*Skipping http://lucene.apache.org/nutch/:java.lang.NullPointerException*
Injector: Merging injected urls into crawl db.
Injector: finished at 2010-08-16 23:19:55, elapsed: 00:00:02
Generator: starting at 2010-08-16 23:19:55
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=0 - no more URLs to fetch.
No URLs to fetch - check your seed list and URL filters.
crawl finished: crawl
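
Reading the output above, the only seed URL appears to have been skipped during injection with a NullPointerException, which would explain why the Generator then selects 0 records. As a rough aid, here is a minimal Python sketch that pulls such "Skipping" lines out of captured console output. The function name skipped_seeds and the regular expression are illustrative assumptions based only on the line format shown above, not part of Nutch.

```python
import re

# Matches lines shaped like the one in the output above:
#   Skipping <url>:<exception class>
# Optional surrounding '*' characters (mail emphasis) are tolerated.
# Assumption: the reason is a bare class name with no message text.
SKIP_LINE = re.compile(r"^\*?Skipping (?P<url>\S+?):(?P<reason>[\w.$]+)\*?$")


def skipped_seeds(console_output):
    """Return (url, reason) pairs for every seed reported as skipped."""
    found = []
    for line in console_output.splitlines():
        match = SKIP_LINE.match(line.strip())
        if match:
            found.append((match.group("url"), match.group("reason")))
    return found
```

If every seed in the urls directory shows up in this list, the "No URLs to fetch" message is a consequence of the injection failure rather than of the URL filters.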


I will continue to investigate, but would really appreciate some help ;)

J


Re: Nutch w Eclipse

2010-08-16 Thread Hannes Carl Meyer
Hi J,

You should check logs/hadoop.log for further error messages!
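
One possible way to do that is sketched below in Python: it groups each line mentioning an exception or ERROR with the stack-trace lines that follow it. The helper name error_blocks and the matching rules are assumptions for illustration; the only detail taken from this thread is the logs/hadoop.log path, resolved relative to the directory the crawl was started from.

```python
from pathlib import Path


def error_blocks(log_text, max_trace_lines=20):
    """Group each error line with the stack-trace lines that follow it."""
    lines = log_text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if "Exception" in line or " ERROR " in line:
            block = [line]
            j = i + 1
            # Absorb continuation lines of a Java stack trace.
            while (j < len(lines) and len(block) <= max_trace_lines
                   and lines[j].lstrip().startswith(("at ", "Caused by:"))):
                block.append(lines[j])
                j += 1
            blocks.append("\n".join(block))
            i = j
        else:
            i += 1
    return blocks


if __name__ == "__main__":
    log_file = Path("logs/hadoop.log")  # path as mentioned in this thread
    if log_file.exists():
        for block in error_blocks(log_file.read_text(errors="replace")):
            print(block)
            print("-" * 40)
```

The console output often shows only "Job failed!", so the underlying cause is more likely to be visible in the blocks this prints.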

Bests

Hannes
