Hey Daniel,

You can find more detailed log output in the Hadoop log files under logs/.
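A minimal sketch of how one might pull the relevant part out of that log, assuming the local runtime writes to logs/hadoop.log (the usual default, but check your own conf/log4j.properties). The function name show_errors is hypothetical and not part of Nutch:

```shell
# Hypothetical helper, not part of Nutch: print error/exception lines
# (with some following context) from a log file.
# Assumption: the default log location is logs/hadoop.log, relative to
# the directory where bin/nutch was run.
show_errors() {
  log_file="${1:-logs/hadoop.log}"
  if [ ! -f "$log_file" ]; then
    echo "log file not found: $log_file" >&2
    return 1
  fi
  # -n: show line numbers, -i: case-insensitive, -A 10: ten lines of context
  # after each match; tail keeps only the most recent 60 lines of the result.
  grep -n -i -A 10 -E 'exception|error' "$log_file" | tail -n 60
}
```

Run from runtime/local, `show_errors` with no argument reads logs/hadoop.log; pass another path to inspect a different file. The underlying cause of the "Job failed!" message is usually logged there just before the stack trace.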
Remi

On Wednesday, February 22, 2012, Daniel Bourrion < [email protected]> wrote:
> Hi.
> I'm a French librarian (that explains the bad English coming now... :) )
>
> I'm a newbie on Nutch, which looks like exactly what I'm searching for (an open-source solution that should crawl our specific domain and have its crawl results pushed into Solr).
>
> I've installed a test Nutch using http://wiki.apache.org/nutch/NutchTutorial
>
> I got an error, but I don't recognize it, nor do I understand where to start correcting whatever causes it.
>
> Here's a copy of the error messages - any help welcome.
> Best
>
> --------------------------------------------------
> daniel@daniel-linux:~/Bureau/apache-nutch-1.4-bin/runtime/local$ bin/nutch crawl urls -dir crawl -depth 3 -topN 5
> solrUrl is not set, indexing will be skipped...
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> solrUrl=null
> topN = 5
> Injector: starting at 2012-02-22 16:06:04
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2012-02-22 16:06:06, elapsed: 00:00:02
> Generator: starting at 2012-02-22 16:06:06
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 5
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: Partitioning selected urls for politeness.
> Generator: segment: crawl/segments/20120222160609
> Generator: finished at 2012-02-22 16:06:10, elapsed: 00:00:03
> Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
> Fetcher: starting at 2012-02-22 16:06:10
> Fetcher: segment: crawl/segments/20120222160609
> Using queue mode : byHost
> Fetcher: threads: 10
> Fetcher: time-out divisor: 2
> QueueFeeder finished: total 2 records + hit by time limit :0
> Using queue mode : byHost
> Using queue mode : byHost
> Using queue mode : byHost
> fetching http://bu.univ-angers.fr/
> Using queue mode : byHost
> Using queue mode : byHost
> fetching http://www.face-ecran.fr/
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=2
> Using queue mode : byHost
> -finishing thread FetcherThread, activeThreads=2
> Using queue mode : byHost
> -finishing thread FetcherThread, activeThreads=2
> Using queue mode : byHost
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=2
> Using queue mode : byHost
> Using queue mode : byHost
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=2
> Fetcher: throughput threshold: -1
> Fetcher: throughput threshold retries: 5
> -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=0
> -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> -activeThreads=0
> Fetcher: finished at 2012-02-22 16:06:13, elapsed: 00:00:03
> ParseSegment: starting at 2012-02-22 16:06:13
> ParseSegment: segment: crawl/segments/20120222160609
> Parsing: http://bu.univ-angers.fr/
> Parsing: http://www.face-ecran.fr/
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>     at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:157)
>     at org.apache.nutch.crawl.Crawl.run(Crawl.java:138)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>
> --------------------------------------------------
>
> --
> With my most cordial regards.
> __
>
> Daniel Bourrion, library curator
> Head of the digital library
> Direct line: 02.44.68.80.50
> SCD Université d'Angers - http://bu.univ-angers.fr
> Bu Saint Serge - 57 Quai Félix Faure - 49100 Angers cedex
>
> ***********************************
> " And by the power of a word
> I begin my life again "
> Paul Eluard
> ***********************************
> personal blog: http://archives.face-ecran.fr/

