I'm not sure I understood you correctly. Here is the complete output
of my crawl:


tom:bin toom$ ./nutch crawl /Users/toom/Downloads/nutch-1.3/crawled -dir /Users/toom/Downloads/nutch-1.3/sites -depth 3 -topN 50
solrUrl is not set, indexing will be skipped...
crawl started in: /Users/toom/Downloads/nutch-1.3/sites
rootUrlDir = /Users/toom/Downloads/nutch-1.3/crawled
threads = 10
depth = 3
solrUrl=null
topN = 50
Injector: starting at 2011-07-12 12:28:49
Injector: crawlDb: /Users/toom/Downloads/nutch-1.3/sites/crawldb
Injector: urlDir: /Users/toom/Downloads/nutch-1.3/crawled
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: finished at 2011-07-12 12:28:53, elapsed: 00:00:04
Generator: starting at 2011-07-12 12:28:53
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110712122856
Generator: finished at 2011-07-12 12:28:57, elapsed: 00:00:04
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-07-12 12:28:57
Fetcher: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110712122856
Fetcher: threads: 10
QueueFeeder finished: total 1 records + hit by time limit :0
fetching http://nutch.apache.org/
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-07-12 12:29:01, elapsed: 00:00:03
ParseSegment: starting at 2011-07-12 12:29:01
ParseSegment: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110712122856
ParseSegment: finished at 2011-07-12 12:29:03, elapsed: 00:00:02
CrawlDb update: starting at 2011-07-12 12:29:03
CrawlDb update: db: /Users/toom/Downloads/nutch-1.3/sites/crawldb
CrawlDb update: segments: [/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122856]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-07-12 12:29:06, elapsed: 00:00:02
Generator: starting at 2011-07-12 12:29:06
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110712122908
Generator: finished at 2011-07-12 12:29:10, elapsed: 00:00:03
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-07-12 12:29:10
Fetcher: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110712122908
Fetcher: threads: 10
QueueFeeder finished: total 50 records + hit by time limit :0
fetching http://www.cafepress.com/nutch/
fetching http://creativecommons.org/press-releases/entry/5064
fetching http://blog.foofactory.fi/2007/03/twice-speed-half-size.html
fetching http://www.apache.org/dist/nutch/CHANGES-1.0.txt
fetching http://eu.apachecon.com/c/aceu2009/sessions/138
fetching http://www.us.apachecon.com/c/acus2009/
fetching http://issues.apache.org/jira/browse/NUTCH
fetching http://forrest.apache.org/
fetching http://hadoop.apache.org/
fetching http://wiki.apache.org/nutch/
fetching http://nutch.apache.org/credits.html
fetching http://tika.apache.org/
fetching http://lucene.apache.org/solr/
fetching http://osuosl.org/news_folder/nutch
fetching http://www.eu.apachecon.com/c/aceu2009/
-activeThreads=10, spinWaiting=1, fetchQueues.totalSize=35
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=35
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=35
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=35
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=35
fetching http://www.apache.org/
fetching http://eu.apachecon.com/c/aceu2009/sessions/251
fetching http://nutch.apache.org/skin/fontsize.js
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=32
fetching http://www.us.apachecon.com/c/acus2009/schedule
fetching http://wiki.apache.org/nutch/NutchTutorial
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=30
fetching http://lucene.apache.org/java/
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=29
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=29
fetching http://www.apache.org/dyn/closer.cgi/nutch/
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=28
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=28
fetching http://eu.apachecon.com/c/aceu2009/sessions/197
fetching http://nutch.apache.org/nightly.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=26
fetching http://wiki.apache.org/nutch/FAQ
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=25
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=25
fetching http://www.apache.org/licenses/
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=24
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=24
fetching http://eu.apachecon.com/c/aceu2009/sessions/136
fetching http://nutch.apache.org/apidocs-1.3/index.html
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=22
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=22
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=22
fetching http://www.apache.org/dist/nutch/CHANGES-1.2.txt
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=21
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=21
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=21
fetching http://nutch.apache.org/skin/breadcrumbs.js
fetching http://eu.apachecon.com/c/aceu2009/sessions/165
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=19
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=19
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=19
fetching http://www.apache.org/dist/nutch/CHANGES-0.9.txt
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=18
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=18
fetching http://eu.apachecon.com/c/aceu2009/sessions/201
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=17
fetching http://nutch.apache.org/skin/getMenu.js
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=16
fetching http://www.apache.org/dist/nutch/CHANGES-1.1.txt
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=15
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=15
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=15
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=15
fetching http://eu.apachecon.com/c/aceu2009/sessions/137
fetching http://nutch.apache.org/index.html
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=13
fetching http://www.apache.org/dist/nutch/CHANGES-0.8.1.txt
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
fetching http://www.apache.org/foundation/records/minutes/2010/board_minutes_2010_04_21.txt
fetching http://eu.apachecon.com/c/aceu2009/sessions/250
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=10
fetching http://nutch.apache.org/mailing_lists.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
fetching http://www.apache.org/dist/nutch/CHANGES-1.3.txt
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=8
fetching http://nutch.apache.org/bot.html
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
fetching http://nutch.apache.org/issue_tracking.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
fetching http://nutch.apache.org/about.html
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
fetching http://nutch.apache.org/i18n.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466617719
  now           = 1310466613063
  0. http://nutch.apache.org/version_control.html
  1. http://nutch.apache.org/skin/getBlank.js
  2. http://nutch.apache.org/index.pdf
  3. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466617719
  now           = 1310466614064
  0. http://nutch.apache.org/version_control.html
  1. http://nutch.apache.org/skin/getBlank.js
  2. http://nutch.apache.org/index.pdf
  3. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466617719
  now           = 1310466615066
  0. http://nutch.apache.org/version_control.html
  1. http://nutch.apache.org/skin/getBlank.js
  2. http://nutch.apache.org/index.pdf
  3. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466617719
  now           = 1310466616068
  0. http://nutch.apache.org/version_control.html
  1. http://nutch.apache.org/skin/getBlank.js
  2. http://nutch.apache.org/index.pdf
  3. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466617719
  now           = 1310466617069
  0. http://nutch.apache.org/version_control.html
  1. http://nutch.apache.org/skin/getBlank.js
  2. http://nutch.apache.org/index.pdf
  3. http://nutch.apache.org/apidocs-1.2/index.html
fetching http://nutch.apache.org/version_control.html
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466617719
  now           = 1310466618071
  0. http://nutch.apache.org/skin/getBlank.js
  1. http://nutch.apache.org/index.pdf
  2. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466623151
  now           = 1310466619073
  0. http://nutch.apache.org/skin/getBlank.js
  1. http://nutch.apache.org/index.pdf
  2. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466623151
  now           = 1310466620075
  0. http://nutch.apache.org/skin/getBlank.js
  1. http://nutch.apache.org/index.pdf
  2. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466623151
  now           = 1310466621077
  0. http://nutch.apache.org/skin/getBlank.js
  1. http://nutch.apache.org/index.pdf
  2. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466623151
  now           = 1310466622078
  0. http://nutch.apache.org/skin/getBlank.js
  1. http://nutch.apache.org/index.pdf
  2. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466623151
  now           = 1310466623080
  0. http://nutch.apache.org/skin/getBlank.js
  1. http://nutch.apache.org/index.pdf
  2. http://nutch.apache.org/apidocs-1.2/index.html
fetching http://nutch.apache.org/skin/getBlank.js
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466628578
  now           = 1310466624082
  0. http://nutch.apache.org/index.pdf
  1. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466628578
  now           = 1310466625084
  0. http://nutch.apache.org/index.pdf
  1. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466628578
  now           = 1310466626086
  0. http://nutch.apache.org/index.pdf
  1. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466628578
  now           = 1310466627088
  0. http://nutch.apache.org/index.pdf
  1. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466628578
  now           = 1310466628089
  0. http://nutch.apache.org/index.pdf
  1. http://nutch.apache.org/apidocs-1.2/index.html
fetching http://nutch.apache.org/index.pdf
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=1
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466628578
  now           = 1310466629090
  0. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466634844
  now           = 1310466630092
  0. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466634844
  now           = 1310466631094
  0. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466634844
  now           = 1310466632095
  0. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466634844
  now           = 1310466633097
  0. http://nutch.apache.org/apidocs-1.2/index.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://nutch.apache.org
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466634844
  now           = 1310466634099
  0. http://nutch.apache.org/apidocs-1.2/index.html
fetching http://nutch.apache.org/apidocs-1.2/index.html
-finishing thread FetcherThread, activeThreads=9
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-activeThreads=5, spinWaiting=4, fetchQueues.totalSize=0
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-07-12 12:30:37, elapsed: 00:01:27
ParseSegment: starting at 2011-07-12 12:30:37
ParseSegment: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110712122908
Error parsing: http://nutch.apache.org/skin/breadcrumbs.js: failed(2,0): Can't retrieve Tika parser for mime-type application/javascript
Error parsing: http://nutch.apache.org/skin/fontsize.js: failed(2,0): Can't retrieve Tika parser for mime-type application/javascript
Error parsing: http://nutch.apache.org/skin/getBlank.js: failed(2,0): Can't retrieve Tika parser for mime-type application/javascript
Error parsing: http://nutch.apache.org/skin/getMenu.js: failed(2,0): Can't retrieve Tika parser for mime-type application/javascript
ParseSegment: finished at 2011-07-12 12:30:46, elapsed: 00:00:08
CrawlDb update: starting at 2011-07-12 12:30:46
CrawlDb update: db: /Users/toom/Downloads/nutch-1.3/sites/crawldb
CrawlDb update: segments: [/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122908]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-07-12 12:30:48, elapsed: 00:00:02
Generator: starting at 2011-07-12 12:30:48
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Generator: Partitioning selected urls for politeness.
Generator: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110712123051
Generator: finished at 2011-07-12 12:30:52, elapsed: 00:00:03
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting at 2011-07-12 12:30:52
Fetcher: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110712123051
Fetcher: threads: 10
QueueFeeder finished: total 50 records + hit by time limit :0
fetching http://www.onehippo.com/
fetching http://apacheconeu.blogspot.com/
fetching http://www.day.com/
fetching http://www.func.nl/apacheconeu2009
fetching http://www.thawte.com/
fetching http://eu.apachecon.com/c/aceu2009/about
fetching http://www.us.apachecon.com/c/acus2009/sessions/333
fetching http://www.joost.com/
fetching http://developer.yahoo.com/blogs/hadoop/
fetching http://www.springsource.com/
fetching http://www.isi.edu/~koehn/europarl/
fetching http://www.topicus.nl/
fetching http://opensource.hp.com/
fetching http://nutch.apache.org/apidocs-1.3/overview-frame.html
-activeThreads=10, spinWaiting=0, fetchQueues.totalSize=36
fetching http://www.haloworldwide.com/
fetching https://builds.apache.org/job/Nutch-trunk/javadoc/
fetch of https://builds.apache.org/job/Nutch-trunk/javadoc/ failed with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found for url=https
fetching http://www.hotwaxmedia.com/
fetching http://lucene.apache.org/hadoop
fetching http://www.cloudera.com/
fetching http://code.google.com/opensource/
fetching http://www.lucidimagination.com/
fetching http://apache.lehtivihrea.org/nutch/
fetching http://www.eu.apachecon.com/c/aceu2009/about/meetups
-activeThreads=10, spinWaiting=4, fetchQueues.totalSize=27
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=27
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=27
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=27
fetching http://www.us.apachecon.com/c/acus2009/sessions/334
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=26
fetching http://nutch.apache.org/apidocs-1.2/allclasses-frame.html
fetching http://eu.apachecon.com/c/aceu2009/about/crowdvine
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=24
fetching http://www.eu.apachecon.com/c/aceu2009/about/videoStreaming
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=23
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=23
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=23
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=23
fetching http://www.us.apachecon.com/c/acus2009/sessions/335
fetching http://nutch.apache.org/apidocs-1.2/overview-summary.html
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=21
fetching http://eu.apachecon.com/c/aceu2009/speakers
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=20
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=20
fetching http://www.eu.apachecon.com/c/aceu2009/sponsors/sponsor
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=19
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=19
fetching http://www.us.apachecon.com/c/acus2009/sessions/461
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=18
fetching http://nutch.apache.org/apidocs-1.3/allclasses-frame.html
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=17
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=17
fetching http://eu.apachecon.com/c/aceu2009/articles
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=16
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=16
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=16
fetching http://www.us.apachecon.com/c/acus2009/sessions/427
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=15
fetching http://nutch.apache.org/apidocs-1.2/overview-frame.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=14
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=14
fetching http://eu.apachecon.com/c/aceu2009/sessions/
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=13
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=13
fetching http://www.us.apachecon.com/c/acus2009/sessions/430
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=12
fetching http://nutch.apache.org/apidocs-1.3/overview-summary.html
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=11
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=11
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=11
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=11
fetching http://eu.apachecon.com/c/aceu2009/sponsors/sponsors
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=10
fetching http://www.us.apachecon.com/c/acus2009/sessions/375
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
fetching http://eu.apachecon.com/c/
fetching http://www.us.apachecon.com/c/acus2009/sessions/462
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
fetching http://www.us.apachecon.com/c/acus2009/sessions/428
fetching http://eu.apachecon.com/c/aceu2009/schedule
-activeThreads=10, spinWaiting=8, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
fetching http://www.us.apachecon.com/c/acus2009/sessions/331
fetching http://eu.apachecon.com/c/aceu2009/
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
* queue: http://eu.apachecon.com
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466704235
  now           = 1310466704428
  0. http://eu.apachecon.com/js/jquery.akslideshow.js
* queue: http://www.us.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466709214
  now           = 1310466704428
  0. http://www.us.apachecon.com/c/acus2009/sessions/437
  1. http://www.us.apachecon.com/c/acus2009/sessions/332
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
* queue: http://eu.apachecon.com
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466704235
  now           = 1310466705429
  0. http://eu.apachecon.com/js/jquery.akslideshow.js
* queue: http://www.us.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466709214
  now           = 1310466705430
  0. http://www.us.apachecon.com/c/acus2009/sessions/437
  1. http://www.us.apachecon.com/c/acus2009/sessions/332
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://eu.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466710968
  now           = 1310466706431
  0. http://eu.apachecon.com/js/jquery.akslideshow.js
* queue: http://www.us.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466709214
  now           = 1310466706431
  0. http://www.us.apachecon.com/c/acus2009/sessions/437
  1. http://www.us.apachecon.com/c/acus2009/sessions/332
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://eu.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466710968
  now           = 1310466707433
  0. http://eu.apachecon.com/js/jquery.akslideshow.js
* queue: http://www.us.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466709214
  now           = 1310466707433
  0. http://www.us.apachecon.com/c/acus2009/sessions/437
  1. http://www.us.apachecon.com/c/acus2009/sessions/332
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
* queue: http://eu.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466710968
  now           = 1310466708435
  0. http://eu.apachecon.com/js/jquery.akslideshow.js
* queue: http://www.us.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466709214
  now           = 1310466708435
  0. http://www.us.apachecon.com/c/acus2009/sessions/437
  1. http://www.us.apachecon.com/c/acus2009/sessions/332
fetching http://www.us.apachecon.com/c/acus2009/sessions/437
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=2
* queue: http://eu.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466710968
  now           = 1310466709442
  0. http://eu.apachecon.com/js/jquery.akslideshow.js
* queue: http://www.us.apachecon.com
  maxThreads    = 1
  inProgress    = 1
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466709214
  now           = 1310466709442
  0. http://www.us.apachecon.com/c/acus2009/sessions/332
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
* queue: http://eu.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466710968
  now           = 1310466710444
  0. http://eu.apachecon.com/js/jquery.akslideshow.js
* queue: http://www.us.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466714813
  now           = 1310466710444
  0. http://www.us.apachecon.com/c/acus2009/sessions/332
fetching http://eu.apachecon.com/js/jquery.akslideshow.js
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.us.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466714813
  now           = 1310466711446
  0. http://www.us.apachecon.com/c/acus2009/sessions/332
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.us.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466714813
  now           = 1310466712447
  0. http://www.us.apachecon.com/c/acus2009/sessions/332
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.us.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466714813
  now           = 1310466713448
  0. http://www.us.apachecon.com/c/acus2009/sessions/332
-activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
* queue: http://www.us.apachecon.com
  maxThreads    = 1
  inProgress    = 0
  crawlDelay    = 5000
  minCrawlDelay = 0
  nextFetchTime = 1310466714813
  now           = 1310466714450
  0. http://www.us.apachecon.com/c/acus2009/sessions/332
fetching http://www.us.apachecon.com/c/acus2009/sessions/332
-finishing thread FetcherThread, activeThreads=9
-finishing thread FetcherThread, activeThreads=8
-finishing thread FetcherThread, activeThreads=7
-finishing thread FetcherThread, activeThreads=6
-finishing thread FetcherThread, activeThreads=5
-finishing thread FetcherThread, activeThreads=4
-finishing thread FetcherThread, activeThreads=3
-finishing thread FetcherThread, activeThreads=2
-finishing thread FetcherThread, activeThreads=1
-finishing thread FetcherThread, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
-activeThreads=0
Fetcher: finished at 2011-07-12 12:31:55, elapsed: 00:01:03
ParseSegment: starting at 2011-07-12 12:31:55
ParseSegment: segment: /Users/toom/Downloads/nutch-1.3/sites/segments/20110712123051
Error parsing: http://eu.apachecon.com/js/jquery.akslideshow.js: failed(2,0): Can't retrieve Tika parser for mime-type text/javascript
ParseSegment: finished at 2011-07-12 12:31:59, elapsed: 00:00:03
CrawlDb update: starting at 2011-07-12 12:31:59
CrawlDb update: db: /Users/toom/Downloads/nutch-1.3/sites/crawldb
CrawlDb update: segments:
[/Users/toom/Downloads/nutch-1.3/sites/segments/20110712123051]
CrawlDb update: additions allowed: true
CrawlDb update: URL normalizing: true
CrawlDb update: URL filtering: true
CrawlDb update: Merging segment data into db.
CrawlDb update: finished at 2011-07-12 12:32:03, elapsed: 00:00:03
LinkDb: starting at 2011-07-12 12:32:03
LinkDb: linkdb: /Users/toom/Downloads/nutch-1.3/sites/linkdb
LinkDb: URL normalize: true
LinkDb: URL filter: true
LinkDb: adding segment:
file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238
LinkDb: adding segment:
file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712113732
LinkDb: adding segment:
file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712114256
LinkDb: adding segment:
file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122856
LinkDb: adding segment:
file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122908
LinkDb: adding segment:
file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712123051
Exception in thread "main"
org.apache.hadoop.mapred.InvalidInputException: Input path does not
exist: 
file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238/parse_data
Input path does not exist:
file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712113732/parse_data
Input path does not exist:
file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712114256/parse_data
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:142)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:54)
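
The three "Input path does not exist" entries in the trace all point at
segments that have no parse_data directory. Below is a minimal sketch for
listing such segments. It assumes the layout seen in the log (one directory
per segment under sites/segments, with parse_data written by the parse
step). The function name find_unparsed_segments is made up for this example
and is not part of Nutch.

```python
from pathlib import Path


def find_unparsed_segments(segments_dir):
    """Return the segment directories that lack a parse_data subdirectory.

    Assumption: every directory directly under segments_dir is one segment,
    and a parsed segment contains a parse_data directory.
    """
    root = Path(segments_dir)
    return sorted(
        seg
        for seg in root.iterdir()
        if seg.is_dir() and not (seg / "parse_data").is_dir()
    )
```

Calling find_unparsed_segments("/Users/toom/Downloads/nutch-1.3/sites/segments")
should return the segments that LinkDb cannot read; these are the candidates
to remove or re-parse before running the remaining steps again.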


2011/7/12 Julien Nioche <[email protected]>:
>> Actually I'm not sure if I'm looking at the right log lines. Please
>> explain in more detail what exactly I should look for. Anyway, I
>> found the following line just before the error:
>>
>> Error parsing: http://eu.apachecon.com/js/jquery.akslideshow.js:
>> failed(2,0): Can't retrieve Tika parser for mime-type text/javascript
>>
>> But I can see that parsing errors like this already appeared earlier
>> during the crawl.
>>
>
> This simply means that the javascript parser is not enabled in your conf
> (which is the default behaviour) and as a consequence the default parser
> (Tika) was used to try and parse it but has no resources for doing so.
>
> Note: we should probably add .js to the default url filters. The javascript
> parser has been deactivated by default because it generates atrocious URLs
> so we might as well prevent such URLs from being fetched in the first place.
>
> Anyway this is not the source of the problem. You seem to have unparsed
> segments among the ones specified. Could be that you interrupted a previous
> crawl or got a problem with it and did not delete these segments or the
> whole crawl directory. Removing the segments and calling the last couple of
> steps manually should do the trick.
>
>
>
>>
>>
>>
>> 2011/7/12 Markus Jelsma <[email protected]>:
>> > Were there errors during parsing of that last segment?
>> >
>> >> I'm starting with nutch and I ran a simple job as described in the
>> >> nutch tutorial. After a while I get the following error:
>> >>
>> >>
>> >> CrawlDb update: URL filtering: true
>> >> CrawlDb update: Merging segment data into db.
>> >> CrawlDb update: finished at 2011-07-12 12:32:03, elapsed: 00:00:03
>> >> LinkDb: starting at 2011-07-12 12:32:03
>> >> LinkDb: linkdb: /Users/toom/Downloads/nutch-1.3/sites/linkdb
>> >> LinkDb: URL normalize: true
>> >> LinkDb: URL filter: true
>> >> LinkDb: adding segment:
>> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238
>> >> LinkDb: adding segment:
>> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712113732
>> >> LinkDb: adding segment:
>> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712114256
>> >> LinkDb: adding segment:
>> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122856
>> >> LinkDb: adding segment:
>> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122908
>> >> LinkDb: adding segment:
>> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712123051
>> >> Exception in thread "main"
>> >> org.apache.hadoop.mapred.InvalidInputException: Input path does not
>> >> exist:
>> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238/parse_data
>> >> Input path does not exist:
>> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712113732/parse_data
>> >> Input path does not exist:
>> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712114256/parse_data
>> >>       at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
>> >>       at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
>> >>       at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>> >>       at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>> >>       at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>> >>       at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>> >>       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>> >>       at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
>> >>       at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
>> >>       at org.apache.nutch.crawl.Crawl.run(Crawl.java:142)
>> >>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >>       at org.apache.nutch.crawl.Crawl.main(Crawl.java:54)
>> >
>>
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>
