I'm running Nutch 1.3 on 64-bit Ubuntu. The commands I ran and their
relevant output are below.

----------------------------------
llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch inject /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/seed
Injector: starting at 2011-07-15 18:32:10
Injector: crawlDb: /home/llist/nutchData/crawl/crawldb
Injector: urlDir: /home/llist/nutchData/seed
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-07-15 18:32:13, elapsed: 00:00:02
=================
llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch generate /home/llist/nutchData/crawl/crawldb /home/llist/nutchData/crawl/segments
Generator: starting at 2011-07-15 18:32:41
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: /home/llist/nutchData/crawl/segments/20110715183244
Generator: finished at 2011-07-15 18:32:45, elapsed: 00:00:03
==================
llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch fetch /home/llist/nutchData/crawl/segments/20110715183244
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-07-15 18:34:55
Fetcher: segment: /home/llist/nutchData/crawl/segments/20110715183244
Fetcher: threads: 10
QueueFeeder finished: total 1 records + hit by time limit :0
fetching http://www.seek.com.au/
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-07-15 18:34:59, elapsed: 00:00:03
=================
llist@LeosLinux:~$ /usr/share/nutch/runtime/local/bin/nutch updatedb /home/llist/nutchData/crawl/crawldb -dir /home/llist/nutchData/crawl/segments/20110715183244
CrawlDb update: starting at 2011-07-15 18:36:00
CrawlDb update: db: /home/llist/nutchData/crawl/crawldb
CrawlDb update: segments:
[file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_fetch,
file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_generate,
file:/home/llist/nutchData/crawl/segments/20110715183244/content]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: false
CrawlDb update: URL filtering: false
- skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_fetch
- skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110715183244/crawl_generate
- skipping invalid segment file:/home/llist/nutchData/crawl/segments/20110715183244/content
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-07-15 18:36:01, elapsed: 00:00:01
-----------------------------------

The final updatedb step skips all three paths as invalid segments.
I'd appreciate any hints on what I'm missing.