Thank you very much Lewis. Greetings.
On 11/07/11 09:55, lewis john mcgibbney wrote:
Hi,
Please see this new tutorial [1] for configuring Nutch 1.3. If you are
familiar/comfortable working with Solr for improvements to indexing, then
you will find it no problem.
If you need to stick with Lucene and the web application front end, then
please stick with Nutch 1.2 or earlier.
[1] http://wiki.apache.org/nutch/RunningNutchAndSolr
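For reference, the Nutch 1.3 crawl command can be pointed at a Solr instance directly so that indexing is not skipped. A minimal sketch, assuming Solr is running locally on its default port (adjust the URL to your installation):

```shell
# Sketch only: assumes a local Solr instance at the default port 8983.
# Run from the runtime/local directory of the Nutch 1.3 distribution.
bin/nutch crawl urls -solr http://localhost:8983/solr/ -dir crawl -depth 3
```

Without the -solr option you will see the "solrUrl is not set, indexing will be skipped" message, as in the output below.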
On Mon, Jul 11, 2011 at 3:02 PM, Yusniel Hidalgo Delgado
<[email protected]> wrote:
Hello.
I'm trying to run Nutch 1.3 on my LAN following the NutchTutorial from the
wiki page. When I run it with these command-line options: nutch crawl urls
-dir crawl -depth 3, I get the following output:
solrUrl is not set, indexing will be skipped...
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=null
Injector: starting at 2011-07-11 09:35:37
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-07-11 09:35:40, elapsed: 00:00:03
Generator: starting at 2011-07-11 09:35:40
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: crawl/segments/20110711093542
Generator: finished at 2011-07-11 09:35:43, elapsed: 00:00:03
Fetcher: starting at 2011-07-11 09:35:43
Fetcher: segment: crawl/segments/20110711093542
Fetcher: threads: 10
QueueFeeder finished: total 2 records + hit by time limit :0
fetching http://FIRST SITE/
fetching http://SECOND SITE/
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=3
fetch of http://FIRST SITE/ failed with:
java.net.ConnectException: Network is unreachable
-finishing thread FetcherThread, activeThreads=1
fetch of http://SECOND SITE/ failed with:
java.net.ConnectException: Network is unreachable
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-07-11 09:35:45, elapsed: 00:00:02
ParseSegment: starting at 2011-07-11 09:35:45
ParseSegment: segment: crawl/segments/20110711093542
ParseSegment: finished at 2011-07-11 09:35:47, elapsed: 00:00:01
CrawlDb update: starting at 2011-07-11 09:35:47
CrawlDb update: db: crawl/crawldb
CrawlDb update: segments: [crawl/segments/20110711093542]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-07-11 09:35:48, elapsed: 00:00:01
Generator: starting at 2011-07-11 09:35:48
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: 0 records selected for fetching, exiting ...
Stopping at depth=1 - no more URLs to fetch.
LinkDb: starting at 2011-07-11 09:35:49
LinkDb: linkdb: crawl/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment: file:/home/yusniel/Programas/nutch-1.3/runtime/local/bin/crawl/segments/20110711093542
LinkDb: finished at 2011-07-11 09:35:50, elapsed: 00:00:01
crawl finished: crawl
According to this output, the problem is related to network access;
however, I can reach those web sites using Firefox. I'm using the Debian
testing version.
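When Firefox can reach a site but the Nutch JVM cannot, one common cause is that Firefox is configured to use a proxy that Nutch does not know about. The HTTP plugin reads proxy settings from conf/nutch-site.xml; a hedged sketch, where the host and port are placeholders to replace with whatever proxy Firefox is using:

```xml
<!-- conf/nutch-site.xml fragment: proxy settings for Nutch's HTTP plugin.
     proxy.example.com and 3128 are placeholders, not values from this thread. -->
<property>
  <name>http.proxy.host</name>
  <value>proxy.example.com</value>
</property>
<property>
  <name>http.proxy.port</name>
  <value>3128</value>
</property>
```

Another possibility worth checking on Debian is the JVM preferring an unroutable IPv6 stack; if that is the case, passing -Djava.net.preferIPv4Stack=true to the JVM that runs Nutch may help. Both suggestions are assumptions to verify against your setup, not confirmed diagnoses.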
Greetings.