Hey Daniel,

You can find more detailed output in the Hadoop log files under logs/ (typically logs/hadoop.log).
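
As a rough sketch of how to dig the real cause out of that log: the console only reports "Job failed!", while the underlying exception is usually written to the log file. The helper name show_log_errors and the default path logs/hadoop.log are assumptions for illustration, so adjust them to your install.

```shell
# Hypothetical helper (not part of Nutch): print lines that look like errors,
# plus 5 lines of trailing context, from a log file.
# The default path logs/hadoop.log is an assumption about the local runtime.
show_log_errors() {
    log_file="${1:-logs/hadoop.log}"
    if [ ! -f "$log_file" ]; then
        echo "log file not found: $log_file" >&2
        return 1
    fi
    # -n prefixes each line with its line number; -A 5 keeps the stack trace
    # lines that follow a match.
    grep -n -A 5 -E 'ERROR|Exception|Caused by' "$log_file"
}
```

Run it from runtime/local as show_log_errors, or pass another log path as the first argument.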

Remi

On Wednesday, February 22, 2012, Daniel Bourrion <[email protected]> wrote:
> Hi.
> I'm a French librarian (that explains the bad English coming now... :) )
>
> I'm new to Nutch, which looks like exactly what I'm searching for (an open-source solution that should crawl our specific domain and have its crawl results pushed into Solr).
>
> I've installed a test Nutch using http://wiki.apache.org/nutch/NutchTutorial
>
>
> I got an error, but I don't really understand it or know where to start correcting what causes it.
>
> Here's a copy of the error messages - any help welcome.
> Best
>
> --------------------------------------------------
> daniel@daniel-linux:~/Bureau/apache-nutch-1.4-bin/runtime/local$ bin/nutch crawl urls -dir crawl -depth 3 -topN 5
> solrUrl is not set, indexing will be skipped...
> crawl started in: crawl
> rootUrlDir = urls
> threads = 10
> depth = 3
> solrUrl=null
> topN = 5
> Injector: starting at 2012-02-22 16:06:04
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: finished at 2012-02-22 16:06:06, elapsed: 00:00:02
> Generator: starting at 2012-02-22 16:06:06
> Generator: Selecting best-scoring urls due for fetch.
> Generator: filtering: true
> Generator: normalizing: true
> Generator: topN: 5
> Generator: jobtracker is 'local', generating exactly one partition.
> Generator: Partitioning selected urls for politeness.
> Generator: segment: crawl/segments/20120222160609
> Generator: finished at 2012-02-22 16:06:10, elapsed: 00:00:03
> Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
> Fetcher: starting at 2012-02-22 16:06:10
> Fetcher: segment: crawl/segments/20120222160609
> Using queue mode : byHost
> Fetcher: threads: 10
> Fetcher: time-out divisor: 2
> QueueFeeder finished: total 2 records + hit by time limit :0
> Using queue mode : byHost
> Using queue mode : byHost
> Using queue mode : byHost
> fetching http://bu.univ-angers.fr/
> Using queue mode : byHost
> Using queue mode : byHost
> fetching http://www.face-ecran.fr/
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=2
> Using queue mode : byHost
> -finishing thread FetcherThread, activeThreads=2
> Using queue mode : byHost
> -finishing thread FetcherThread, activeThreads=2
> Using queue mode : byHost
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=2
> Using queue mode : byHost
> Using queue mode : byHost
> -finishing thread FetcherThread, activeThreads=2
> -finishing thread FetcherThread, activeThreads=2
> Fetcher: throughput threshold: -1
> Fetcher: throughput threshold retries: 5
> -activeThreads=2, spinWaiting=0, fetchQueues.totalSize=0
> -finishing thread FetcherThread, activeThreads=1
> -finishing thread FetcherThread, activeThreads=0
> -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
> -activeThreads=0
> Fetcher: finished at 2012-02-22 16:06:13, elapsed: 00:00:03
> ParseSegment: starting at 2012-02-22 16:06:13
> ParseSegment: segment: crawl/segments/20120222160609
> Parsing: http://bu.univ-angers.fr/
> Parsing: http://www.face-ecran.fr/
> Exception in thread "main" java.io.IOException: Job failed!
>    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>    at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:157)
>    at org.apache.nutch.crawl.Crawl.run(Crawl.java:138)
>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>    at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
>
>
> --------------------------------------------------
>
> --
> With my most cordial regards.
> __
>
> Daniel Bourrion, library curator
> Head of the digital library
> Direct line: 02.44.68.80.50
> SCD Université d'Angers - http://bu.univ-angers.fr
> Bu Saint Serge - 57 Quai Félix Faure - 49100 Angers cedex
>
> ***********************************
> " And by the power of a word
> I begin my life again "
>                       Paul Eluard
> ***********************************
> personal blog: http://archives.face-ecran.fr/
>
>
