Hello.
I'm trying to run Nutch 1.3 on my LAN, following the NutchTutorial from
the wiki. When I run it with the command-line options "nutch crawl
urls -dir crawl -depth 3", I get the following output:
solrUrl is not set, indexing will be skipped...
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=null
Injector: starting at 2011-07-11 09:35:37
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-07-11 09:35:40, elapsed: 00:00:03
Generator: starting at 2011-07-11 09:35:40
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl/segments/20110711093542
Generator: finished at 2011-07-11 09:35:43, elapsed: 00:00:03
Fetcher: starting at 2011-07-11 09:35:43
Fetcher: segment: crawl/segments/20110711093542
Fetcher: threads: 10
QueueFeeder finished: total 2 records + hit by time limit :0
fetching http://FIRST SITE/
fetching http://SECOND SITE/
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=3
fetch of http://FIRST SITE/ failed with: java.net.ConnectException:
Network is unreachable
-finishing thread FetcherThread, activeThreads=1
fetch of http://SECOND SITE/ failed with: java.net.ConnectException:
Network is unreachable
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-07-11 09:35:45, elapsed: 00:00:02
ParseSegment: starting at 2011-07-11 09:35:45
ParseSegment: segment: crawl/segments/20110711093542
ParseSegment: finished at 2011-07-11 09:35:47, elapsed: 00:00:01
CrawlDb update: starting at 2011-07-11 09:35:47
CrawlDb update: db: crawl/crawldb
CrawlDb update: segments: [crawl/segments/20110711093542]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-07-11 09:35:48, elapsed: 00:00:01
Generator: starting at 2011-07-11 09:35:48
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=1 - no more URLs to fetch.
LinkDb: starting at 2011-07-11 09:35:49
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment:
file:/home/yusniel/Programas/nutch-1.3/runtime/local/bin/crawl/segments/20110711093542
LinkDb: finished at 2011-07-11 09:35:50, elapsed: 00:00:01
crawl finished: crawl
According to this output, the problem is related to network access;
however, I can reach those web sites from Firefox on the same machine.
I'm running Debian testing.
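One idea I have not verified: since the browser works but the JVM reports "Network is unreachable", perhaps Java is preferring IPv6 on this box while the LAN only routes IPv4. If I understand the bin/nutch script correctly, it passes NUTCH_OPTS through to the JVM, so something like this might be worth trying (just a sketch, not a confirmed fix):

```shell
# Untested idea: force the JVM onto the IPv4 stack, in case Java is
# attempting IPv6 connections that the network does not route.
export NUTCH_OPTS="-Djava.net.preferIPv4Stack=true"

# Then re-run the same crawl as before:
bin/nutch crawl urls -dir crawl -depth 3
```

Does that sound plausible, or is there a better way to see which address the fetcher is actually trying to connect to?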
Greetings.