Okay, and what does that mean? How can I fix the error?

2011/7/12 Markus Jelsma <[email protected]>:
> I don't see this segment 20110712114256 being parsed.
>
> On Tuesday 12 July 2011 13:38:35 Paul van Hoven wrote:
>> I'm not sure I understood you correctly. Here is the complete output
>> of my crawl:
>>
>>
>> tom:bin toom$ ./nutch crawl /Users/toom/Downloads/nutch-1.3/crawled
>> -dir /Users/toom/Downloads/nutch-1.3/sites -depth 3 -topN 50
>> solrUrl is not set, indexing will be skipped...
>> crawl started in: /Users/toom/Downloads/nutch-1.3/sites
>> rootUrlDir = /Users/toom/Downloads/nutch-1.3/crawled
>> threads = 10
>> depth = 3
>> solrUrl=null
>> topN = 50
>> Injector: starting at 2011-07-12 12:28:49
>> Injector: crawlDb: /Users/toom/Downloads/nutch-1.3/sites/crawldb
>> Injector: urlDir: /Users/toom/Downloads/nutch-1.3/crawled
>> Injector: Converting injected urls to crawl db entries.
>> Injector: Merging injected urls into crawl db.
>> Injector: finished at 2011-07-12 12:28:53, elapsed: 00:00:04
>> Generator: starting at 2011-07-12 12:28:53
>> Generator: Selecting best-scoring urls due for fetch.
>> Generator: filtering: true
>> Generator: normalizing: true
>> Generator: topN: 50
>> Generator: jobtracker is 'local', generating exactly one partition.
>> Generator: Partitioning selected urls for politeness.
>> Generator: segment:
>> /Users/toom/Downloads/nutch-1.3/sites/segments/20110712122856
>> Generator: finished at 2011-07-12 12:28:57, elapsed: 00:00:04
>> Fetcher: Your 'http.agent.name' value should be listed first in
>> 'http.robots.agents' property.
>> Fetcher: starting at 2011-07-12 12:28:57
>> Fetcher: segment:
>> /Users/toom/Downloads/nutch-1.3/sites/segments/20110712122856
>> Fetcher: threads: 10
>> QueueFeeder finished: total 1 records + hit by time limit :0
>> fetching http://nutch.apache.org/
>> -finishing thread FetcherThread, activeThreads=1
>> -finishing thread FetcherThread, activeThreads=1
>> -finishing thread FetcherThread, activeThreads=1
>> -finishing thread FetcherThread, activeThreads=1
>> -finishing thread FetcherThread, activeThreads=1
>> -finishing thread FetcherThread, activeThreads=1
>> -finishing thread FetcherThread, activeThreads=1
>> -finishing thread FetcherThread, activeThreads=1
>> -finishing thread FetcherThread, activeThreads=1
>> -activeThreads=1, spinWaiting=0, fetchQueues.totalSize=0
>> -finishing thread FetcherThread, activeThreads=0
>> -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
>> -activeThreads=0
>> Fetcher: finished at 2011-07-12 12:29:01, elapsed: 00:00:03
>> ParseSegment: starting at 2011-07-12 12:29:01
>> ParseSegment: segment:
>> /Users/toom/Downloads/nutch-1.3/sites/segments/20110712122856
>> ParseSegment: finished at 2011-07-12 12:29:03, elapsed: 00:00:02
>> CrawlDb update: starting at 2011-07-12 12:29:03
>> CrawlDb update: db: /Users/toom/Downloads/nutch-1.3/sites/crawldb
>> CrawlDb update: segments:
>> [/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122856]
>> CrawlDb update: additions allowed: true
>> CrawlDb update: URL normalizing: true
>> CrawlDb update: URL filtering: true
>> CrawlDb update: Merging segment data into db.
>> CrawlDb update: finished at 2011-07-12 12:29:06, elapsed: 00:00:02
>> Generator: starting at 2011-07-12 12:29:06
>> Generator: Selecting best-scoring urls due for fetch.
>> Generator: filtering: true
>> Generator: normalizing: true
>> Generator: topN: 50
>> Generator: jobtracker is 'local', generating exactly one partition.
>> Generator: Partitioning selected urls for politeness.
>> Generator: segment:
>> /Users/toom/Downloads/nutch-1.3/sites/segments/20110712122908
>> Generator: finished at 2011-07-12 12:29:10, elapsed: 00:00:03
>> Fetcher: Your 'http.agent.name' value should be listed first in
>> 'http.robots.agents' property.
>> Fetcher: starting at 2011-07-12 12:29:10
>> Fetcher: segment:
>> /Users/toom/Downloads/nutch-1.3/sites/segments/20110712122908
>> Fetcher: threads: 10
>> QueueFeeder finished: total 50 records + hit by time limit :0
>> fetching http://www.cafepress.com/nutch/
>> fetching http://creativecommons.org/press-releases/entry/5064
>> fetching http://blog.foofactory.fi/2007/03/twice-speed-half-size.html
>> fetching http://www.apache.org/dist/nutch/CHANGES-1.0.txt
>> fetching http://eu.apachecon.com/c/aceu2009/sessions/138
>> fetching http://www.us.apachecon.com/c/acus2009/
>> fetching http://issues.apache.org/jira/browse/NUTCH
>> fetching http://forrest.apache.org/
>> fetching http://hadoop.apache.org/
>> fetching http://wiki.apache.org/nutch/
>> fetching http://nutch.apache.org/credits.html
>> fetching http://tika.apache.org/
>> fetching http://lucene.apache.org/solr/
>> fetching http://osuosl.org/news_folder/nutch
>> fetching http://www.eu.apachecon.com/c/aceu2009/
>> -activeThreads=10, spinWaiting=1, fetchQueues.totalSize=35
>> -activeThreads=10, spinWaiting=8, fetchQueues.totalSize=35
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=35
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=35
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=35
>> fetching http://www.apache.org/
>> fetching http://eu.apachecon.com/c/aceu2009/sessions/251
>> fetching http://nutch.apache.org/skin/fontsize.js
>> -activeThreads=10, spinWaiting=8, fetchQueues.totalSize=32
>> fetching http://www.us.apachecon.com/c/acus2009/schedule
>> fetching http://wiki.apache.org/nutch/NutchTutorial
>> -activeThreads=10, spinWaiting=8, fetchQueues.totalSize=30
>> fetching http://lucene.apache.org/java/
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=29
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=29
>> fetching http://www.apache.org/dyn/closer.cgi/nutch/
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=28
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=28
>> fetching http://eu.apachecon.com/c/aceu2009/sessions/197
>> fetching http://nutch.apache.org/nightly.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=26
>> fetching http://wiki.apache.org/nutch/FAQ
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=25
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=25
>> fetching http://www.apache.org/licenses/
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=24
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=24
>> fetching http://eu.apachecon.com/c/aceu2009/sessions/136
>> fetching http://nutch.apache.org/apidocs-1.3/index.html
>> -activeThreads=10, spinWaiting=8, fetchQueues.totalSize=22
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=22
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=22
>> fetching http://www.apache.org/dist/nutch/CHANGES-1.2.txt
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=21
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=21
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=21
>> fetching http://nutch.apache.org/skin/breadcrumbs.js
>> fetching http://eu.apachecon.com/c/aceu2009/sessions/165
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=19
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=19
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=19
>> fetching http://www.apache.org/dist/nutch/CHANGES-0.9.txt
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=18
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=18
>> fetching http://eu.apachecon.com/c/aceu2009/sessions/201
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=17
>> fetching http://nutch.apache.org/skin/getMenu.js
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=16
>> fetching http://www.apache.org/dist/nutch/CHANGES-1.1.txt
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=15
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=15
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=15
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=15
>> fetching http://eu.apachecon.com/c/aceu2009/sessions/137
>> fetching http://nutch.apache.org/index.html
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=13
>> fetching http://www.apache.org/dist/nutch/CHANGES-0.8.1.txt
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
>> fetching http://www.apache.org/foundation/records/minutes/2010/board_minutes_2010_04_21.txt
>> fetching http://eu.apachecon.com/c/aceu2009/sessions/250
>> -activeThreads=10, spinWaiting=8, fetchQueues.totalSize=10
>> fetching http://nutch.apache.org/mailing_lists.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
>> fetching http://www.apache.org/dist/nutch/CHANGES-1.3.txt
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=8
>> fetching http://nutch.apache.org/bot.html
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
>> fetching http://nutch.apache.org/issue_tracking.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
>> fetching http://nutch.apache.org/about.html
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=5
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
>> fetching http://nutch.apache.org/i18n.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466617719
>>   now           = 1310466613063
>>   0. http://nutch.apache.org/version_control.html
>>   1. http://nutch.apache.org/skin/getBlank.js
>>   2. http://nutch.apache.org/index.pdf
>>   3. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466617719
>>   now           = 1310466614064
>>   0. http://nutch.apache.org/version_control.html
>>   1. http://nutch.apache.org/skin/getBlank.js
>>   2. http://nutch.apache.org/index.pdf
>>   3. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466617719
>>   now           = 1310466615066
>>   0. http://nutch.apache.org/version_control.html
>>   1. http://nutch.apache.org/skin/getBlank.js
>>   2. http://nutch.apache.org/index.pdf
>>   3. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466617719
>>   now           = 1310466616068
>>   0. http://nutch.apache.org/version_control.html
>>   1. http://nutch.apache.org/skin/getBlank.js
>>   2. http://nutch.apache.org/index.pdf
>>   3. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466617719
>>   now           = 1310466617069
>>   0. http://nutch.apache.org/version_control.html
>>   1. http://nutch.apache.org/skin/getBlank.js
>>   2. http://nutch.apache.org/index.pdf
>>   3. http://nutch.apache.org/apidocs-1.2/index.html
>> fetching http://nutch.apache.org/version_control.html
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 1
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466617719
>>   now           = 1310466618071
>>   0. http://nutch.apache.org/skin/getBlank.js
>>   1. http://nutch.apache.org/index.pdf
>>   2. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466623151
>>   now           = 1310466619073
>>   0. http://nutch.apache.org/skin/getBlank.js
>>   1. http://nutch.apache.org/index.pdf
>>   2. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466623151
>>   now           = 1310466620075
>>   0. http://nutch.apache.org/skin/getBlank.js
>>   1. http://nutch.apache.org/index.pdf
>>   2. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466623151
>>   now           = 1310466621077
>>   0. http://nutch.apache.org/skin/getBlank.js
>>   1. http://nutch.apache.org/index.pdf
>>   2. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466623151
>>   now           = 1310466622078
>>   0. http://nutch.apache.org/skin/getBlank.js
>>   1. http://nutch.apache.org/index.pdf
>>   2. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466623151
>>   now           = 1310466623080
>>   0. http://nutch.apache.org/skin/getBlank.js
>>   1. http://nutch.apache.org/index.pdf
>>   2. http://nutch.apache.org/apidocs-1.2/index.html
>> fetching http://nutch.apache.org/skin/getBlank.js
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466628578
>>   now           = 1310466624082
>>   0. http://nutch.apache.org/index.pdf
>>   1. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466628578
>>   now           = 1310466625084
>>   0. http://nutch.apache.org/index.pdf
>>   1. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466628578
>>   now           = 1310466626086
>>   0. http://nutch.apache.org/index.pdf
>>   1. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466628578
>>   now           = 1310466627088
>>   0. http://nutch.apache.org/index.pdf
>>   1. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466628578
>>   now           = 1310466628089
>>   0. http://nutch.apache.org/index.pdf
>>   1. http://nutch.apache.org/apidocs-1.2/index.html
>> fetching http://nutch.apache.org/index.pdf
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=1
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 1
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466628578
>>   now           = 1310466629090
>>   0. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466634844
>>   now           = 1310466630092
>>   0. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466634844
>>   now           = 1310466631094
>>   0. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466634844
>>   now           = 1310466632095
>>   0. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466634844
>>   now           = 1310466633097
>>   0. http://nutch.apache.org/apidocs-1.2/index.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
>> * queue: http://nutch.apache.org
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466634844
>>   now           = 1310466634099
>>   0. http://nutch.apache.org/apidocs-1.2/index.html
>> fetching http://nutch.apache.org/apidocs-1.2/index.html
>> -finishing thread FetcherThread, activeThreads=9
>> -finishing thread FetcherThread, activeThreads=8
>> -finishing thread FetcherThread, activeThreads=7
>> -finishing thread FetcherThread, activeThreads=6
>> -finishing thread FetcherThread, activeThreads=5
>> -activeThreads=5, spinWaiting=4, fetchQueues.totalSize=0
>> -finishing thread FetcherThread, activeThreads=4
>> -finishing thread FetcherThread, activeThreads=3
>> -finishing thread FetcherThread, activeThreads=2
>> -finishing thread FetcherThread, activeThreads=1
>> -finishing thread FetcherThread, activeThreads=0
>> -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
>> -activeThreads=0
>> Fetcher: finished at 2011-07-12 12:30:37, elapsed: 00:01:27
>> ParseSegment: starting at 2011-07-12 12:30:37
>> ParseSegment: segment:
>> /Users/toom/Downloads/nutch-1.3/sites/segments/20110712122908
>> Error parsing: http://nutch.apache.org/skin/breadcrumbs.js:
>> failed(2,0): Can't retrieve Tika parser for mime-type
>> application/javascript
>> Error parsing: http://nutch.apache.org/skin/fontsize.js: failed(2,0):
>> Can't retrieve Tika parser for mime-type application/javascript
>> Error parsing: http://nutch.apache.org/skin/getBlank.js: failed(2,0):
>> Can't retrieve Tika parser for mime-type application/javascript
>> Error parsing: http://nutch.apache.org/skin/getMenu.js: failed(2,0):
>> Can't retrieve Tika parser for mime-type application/javascript
>> ParseSegment: finished at 2011-07-12 12:30:46, elapsed: 00:00:08
>> CrawlDb update: starting at 2011-07-12 12:30:46
>> CrawlDb update: db: /Users/toom/Downloads/nutch-1.3/sites/crawldb
>> CrawlDb update: segments:
>> [/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122908]
>> CrawlDb update: additions allowed: true
>> CrawlDb update: URL normalizing: true
>> CrawlDb update: URL filtering: true
>> CrawlDb update: Merging segment data into db.
>> CrawlDb update: finished at 2011-07-12 12:30:48, elapsed: 00:00:02
>> Generator: starting at 2011-07-12 12:30:48
>> Generator: Selecting best-scoring urls due for fetch.
>> Generator: filtering: true
>> Generator: normalizing: true
>> Generator: topN: 50
>> Generator: jobtracker is 'local', generating exactly one partition.
>> Generator: Partitioning selected urls for politeness.
>> Generator: segment:
>> /Users/toom/Downloads/nutch-1.3/sites/segments/20110712123051
>> Generator: finished at 2011-07-12 12:30:52, elapsed: 00:00:03
>> Fetcher: Your 'http.agent.name' value should be listed first in
>> 'http.robots.agents' property.
>> Fetcher: starting at 2011-07-12 12:30:52
>> Fetcher: segment:
>> /Users/toom/Downloads/nutch-1.3/sites/segments/20110712123051
>> Fetcher: threads: 10
>> QueueFeeder finished: total 50 records + hit by time limit :0
>> fetching http://www.onehippo.com/
>> fetching http://apacheconeu.blogspot.com/
>> fetching http://www.day.com/
>> fetching http://www.func.nl/apacheconeu2009
>> fetching http://www.thawte.com/
>> fetching http://eu.apachecon.com/c/aceu2009/about
>> fetching http://www.us.apachecon.com/c/acus2009/sessions/333
>> fetching http://www.joost.com/
>> fetching http://developer.yahoo.com/blogs/hadoop/
>> fetching http://www.springsource.com/
>> fetching http://www.isi.edu/~koehn/europarl/
>> fetching http://www.topicus.nl/
>> fetching http://opensource.hp.com/
>> fetching http://nutch.apache.org/apidocs-1.3/overview-frame.html
>> -activeThreads=10, spinWaiting=0, fetchQueues.totalSize=36
>> fetching http://www.haloworldwide.com/
>> fetching https://builds.apache.org/job/Nutch-trunk/javadoc/
>> fetch of https://builds.apache.org/job/Nutch-trunk/javadoc/ failed
>> with: org.apache.nutch.protocol.ProtocolNotFound: protocol not found
>> for url=https
>> fetching http://www.hotwaxmedia.com/
>> fetching http://lucene.apache.org/hadoop
>> fetching http://www.cloudera.com/
>> fetching http://code.google.com/opensource/
>> fetching http://www.lucidimagination.com/
>> fetching http://apache.lehtivihrea.org/nutch/
>> fetching http://www.eu.apachecon.com/c/aceu2009/about/meetups
>> -activeThreads=10, spinWaiting=4, fetchQueues.totalSize=27
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=27
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=27
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=27
>> fetching http://www.us.apachecon.com/c/acus2009/sessions/334
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=26
>> fetching http://nutch.apache.org/apidocs-1.2/allclasses-frame.html
>> fetching http://eu.apachecon.com/c/aceu2009/about/crowdvine
>> -activeThreads=10, spinWaiting=8, fetchQueues.totalSize=24
>> fetching http://www.eu.apachecon.com/c/aceu2009/about/videoStreaming
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=23
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=23
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=23
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=23
>> fetching http://www.us.apachecon.com/c/acus2009/sessions/335
>> fetching http://nutch.apache.org/apidocs-1.2/overview-summary.html
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=21
>> fetching http://eu.apachecon.com/c/aceu2009/speakers
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=20
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=20
>> fetching http://www.eu.apachecon.com/c/aceu2009/sponsors/sponsor
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=19
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=19
>> fetching http://www.us.apachecon.com/c/acus2009/sessions/461
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=18
>> fetching http://nutch.apache.org/apidocs-1.3/allclasses-frame.html
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=17
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=17
>> fetching http://eu.apachecon.com/c/aceu2009/articles
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=16
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=16
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=16
>> fetching http://www.us.apachecon.com/c/acus2009/sessions/427
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=15
>> fetching http://nutch.apache.org/apidocs-1.2/overview-frame.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=14
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=14
>> fetching http://eu.apachecon.com/c/aceu2009/sessions/
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=13
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=13
>> fetching http://www.us.apachecon.com/c/acus2009/sessions/430
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=12
>> fetching http://nutch.apache.org/apidocs-1.3/overview-summary.html
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=11
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=11
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=11
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=11
>> fetching http://eu.apachecon.com/c/aceu2009/sponsors/sponsors
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=10
>> fetching http://www.us.apachecon.com/c/acus2009/sessions/375
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
>> fetching http://eu.apachecon.com/c/
>> fetching http://www.us.apachecon.com/c/acus2009/sessions/462
>> -activeThreads=10, spinWaiting=8, fetchQueues.totalSize=7
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
>> fetching http://www.us.apachecon.com/c/acus2009/sessions/428
>> fetching http://eu.apachecon.com/c/aceu2009/schedule
>> -activeThreads=10, spinWaiting=8, fetchQueues.totalSize=5
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
>> fetching http://www.us.apachecon.com/c/acus2009/sessions/331
>> fetching http://eu.apachecon.com/c/aceu2009/
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
>> * queue: http://eu.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 1
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466704235
>>   now           = 1310466704428
>>   0. http://eu.apachecon.com/js/jquery.akslideshow.js
>> * queue: http://www.us.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466709214
>>   now           = 1310466704428
>>   0. http://www.us.apachecon.com/c/acus2009/sessions/437
>>   1. http://www.us.apachecon.com/c/acus2009/sessions/332
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=3
>> * queue: http://eu.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 1
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466704235
>>   now           = 1310466705429
>>   0. http://eu.apachecon.com/js/jquery.akslideshow.js
>> * queue: http://www.us.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466709214
>>   now           = 1310466705430
>>   0. http://www.us.apachecon.com/c/acus2009/sessions/437
>>   1. http://www.us.apachecon.com/c/acus2009/sessions/332
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
>> * queue: http://eu.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466710968
>>   now           = 1310466706431
>>   0. http://eu.apachecon.com/js/jquery.akslideshow.js
>> * queue: http://www.us.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466709214
>>   now           = 1310466706431
>>   0. http://www.us.apachecon.com/c/acus2009/sessions/437
>>   1. http://www.us.apachecon.com/c/acus2009/sessions/332
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
>> * queue: http://eu.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466710968
>>   now           = 1310466707433
>>   0. http://eu.apachecon.com/js/jquery.akslideshow.js
>> * queue: http://www.us.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466709214
>>   now           = 1310466707433
>>   0. http://www.us.apachecon.com/c/acus2009/sessions/437
>>   1. http://www.us.apachecon.com/c/acus2009/sessions/332
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
>> * queue: http://eu.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466710968
>>   now           = 1310466708435
>>   0. http://eu.apachecon.com/js/jquery.akslideshow.js
>> * queue: http://www.us.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466709214
>>   now           = 1310466708435
>>   0. http://www.us.apachecon.com/c/acus2009/sessions/437
>>   1. http://www.us.apachecon.com/c/acus2009/sessions/332
>> fetching http://www.us.apachecon.com/c/acus2009/sessions/437
>> -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=2
>> * queue: http://eu.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466710968
>>   now           = 1310466709442
>>   0. http://eu.apachecon.com/js/jquery.akslideshow.js
>> * queue: http://www.us.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 1
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466709214
>>   now           = 1310466709442
>>   0. http://www.us.apachecon.com/c/acus2009/sessions/332
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
>> * queue: http://eu.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466710968
>>   now           = 1310466710444
>>   0. http://eu.apachecon.com/js/jquery.akslideshow.js
>> * queue: http://www.us.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466714813
>>   now           = 1310466710444
>>   0. http://www.us.apachecon.com/c/acus2009/sessions/332
>> fetching http://eu.apachecon.com/js/jquery.akslideshow.js
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
>> * queue: http://www.us.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466714813
>>   now           = 1310466711446
>>   0. http://www.us.apachecon.com/c/acus2009/sessions/332
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
>> * queue: http://www.us.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466714813
>>   now           = 1310466712447
>>   0. http://www.us.apachecon.com/c/acus2009/sessions/332
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
>> * queue: http://www.us.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466714813
>>   now           = 1310466713448
>>   0. http://www.us.apachecon.com/c/acus2009/sessions/332
>> -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
>> * queue: http://www.us.apachecon.com
>>   maxThreads    = 1
>>   inProgress    = 0
>>   crawlDelay    = 5000
>>   minCrawlDelay = 0
>>   nextFetchTime = 1310466714813
>>   now           = 1310466714450
>>   0. http://www.us.apachecon.com/c/acus2009/sessions/332
>> fetching http://www.us.apachecon.com/c/acus2009/sessions/332
>> -finishing thread FetcherThread, activeThreads=9
>> -finishing thread FetcherThread, activeThreads=8
>> -finishing thread FetcherThread, activeThreads=7
>> -finishing thread FetcherThread, activeThreads=6
>> -finishing thread FetcherThread, activeThreads=5
>> -finishing thread FetcherThread, activeThreads=4
>> -finishing thread FetcherThread, activeThreads=3
>> -finishing thread FetcherThread, activeThreads=2
>> -finishing thread FetcherThread, activeThreads=1
>> -finishing thread FetcherThread, activeThreads=0
>> -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
>> -activeThreads=0
>> Fetcher: finished at 2011-07-12 12:31:55, elapsed: 00:01:03
>> ParseSegment: starting at 2011-07-12 12:31:55
>> ParseSegment: segment:
>> /Users/toom/Downloads/nutch-1.3/sites/segments/20110712123051
>> Error parsing: http://eu.apachecon.com/js/jquery.akslideshow.js:
>> failed(2,0): Can't retrieve Tika parser for mime-type text/javascript
>> ParseSegment: finished at 2011-07-12 12:31:59, elapsed: 00:00:03
>> CrawlDb update: starting at 2011-07-12 12:31:59
>> CrawlDb update: db: /Users/toom/Downloads/nutch-1.3/sites/crawldb
>> CrawlDb update: segments:
>> [/Users/toom/Downloads/nutch-1.3/sites/segments/20110712123051]
>> CrawlDb update: additions allowed: true
>> CrawlDb update: URL normalizing: true
>> CrawlDb update: URL filtering: true
>> CrawlDb update: Merging segment data into db.
>> CrawlDb update: finished at 2011-07-12 12:32:03, elapsed: 00:00:03
>> LinkDb: starting at 2011-07-12 12:32:03
>> LinkDb: linkdb: /Users/toom/Downloads/nutch-1.3/sites/linkdb
>> LinkDb: URL normalize: true
>> LinkDb: URL filter: true
>> LinkDb: adding segment:
>> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238
>> LinkDb: adding segment:
>> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712113732
>> LinkDb: adding segment:
>> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712114256
>> LinkDb: adding segment:
>> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122856
>> LinkDb: adding segment:
>> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122908
>> LinkDb: adding segment:
>> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712123051
>> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
>> Input path does not exist: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238/parse_data
>> Input path does not exist: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712113732/parse_data
>> Input path does not exist: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712114256/parse_data
>>       at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
>>       at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
>>       at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>>       at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>>       at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>>       at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>>       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>>       at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
>>       at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
>>       at org.apache.nutch.crawl.Crawl.run(Crawl.java:142)
>>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>       at org.apache.nutch.crawl.Crawl.main(Crawl.java:54)
>>
>> 2011/7/12 Julien Nioche <[email protected]>:
>> >> Actually I'm not sure whether I'm looking at the right log lines. Please
>> >> explain in more detail what exactly I should look for. Anyway, I
>> >> found the following line just before the error:
>> >>
>> >> Error parsing: http://eu.apachecon.com/js/jquery.akslideshow.js:
>> >> failed(2,0): Can't retrieve Tika parser for mime-type text/javascript
>> >>
>> >> But I can see that parsing errors like this already appeared earlier
>> >> during the crawl.
>> >
>> > This simply means that the javascript parser is not enabled in your conf
>> > (which is the default behaviour); as a consequence the default parser
>> > (Tika) was used to try to parse it, but it has no resources for doing so.
>> >
>> > Note: we should probably add .js to the default url filters. The
>> > javascript parser has been deactivated by default because it generates
>> > atrocious URLs, so we might as well prevent such URLs from being fetched
>> > in the first place.
>> >
>> > Anyway, this is not the source of the problem. You seem to have unparsed
>> > segments among the ones specified. It could be that you interrupted a
>> > previous crawl, or hit a problem with it, and did not delete those
>> > segments or the whole crawl directory. Removing the stale segments and
>> > running the last couple of steps manually should do the trick, for
>> > example along the lines of the sketch below.
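>> >
>> > A rough sketch of that (the three segment paths are the ones named in the
>> > InvalidInputException from your log; whether they really are the stale,
>> > unparsed ones is an assumption, so check that they have no parse_data
>> > directory before deleting them):
>> >
>> >   cd /Users/toom/Downloads/nutch-1.3
>> >   # remove the segments that were never parsed
>> >   rm -r sites/segments/20110707140238 \
>> >         sites/segments/20110712113732 \
>> >         sites/segments/20110712114256
>> >   # rebuild the link database from the remaining, parsed segments
>> >   bin/nutch invertlinks sites/linkdb -dir sites/segments
>> >
>> > Since solrUrl was not set in your run, indexing was skipped anyway, so
>> > invertlinks is the only remaining step here.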
>> >
>> >> 2011/7/12 Markus Jelsma <[email protected]>:
>> >> > Were there errors during parsing of that last segment?
>> >> >
>> >> >> I'm getting started with Nutch and I ran a simple job as described in the
>> >> >> Nutch tutorial. After a while I get the following error:
>> >> >>
>> >> >>
>> >> >> CrawlDb update: URL filtering: true
>> >> >> CrawlDb update: Merging segment data into db.
>> >> >> CrawlDb update: finished at 2011-07-12 12:32:03, elapsed: 00:00:03
>> >> >> LinkDb: starting at 2011-07-12 12:32:03
>> >> >> LinkDb: linkdb: /Users/toom/Downloads/nutch-1.3/sites/linkdb
>> >> >> LinkDb: URL normalize: true
>> >> >> LinkDb: URL filter: true
>> >> >> LinkDb: adding segment:
>> >> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238
>> >> >> LinkDb: adding segment:
>> >> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712113732
>> >> >> LinkDb: adding segment:
>> >> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712114256
>> >> >> LinkDb: adding segment:
>> >> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122856
>> >> >> LinkDb: adding segment:
>> >> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712122908
>> >> >> LinkDb: adding segment:
>> >> >> file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712123051
>> >> >> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
>> >> >> Input path does not exist: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110707140238/parse_data
>> >> >> Input path does not exist: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712113732/parse_data
>> >> >> Input path does not exist: file:/Users/toom/Downloads/nutch-1.3/sites/segments/20110712114256/parse_data
>> >> >>       at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
>> >> >>       at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
>> >> >>       at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>> >> >>       at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>> >> >>       at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>> >> >>       at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>> >> >>       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>> >> >>       at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
>> >> >>       at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
>> >> >>       at org.apache.nutch.crawl.Crawl.run(Crawl.java:142)
>> >> >>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>> >> >>       at org.apache.nutch.crawl.Crawl.main(Crawl.java:54)
>> >
>> > --
>> > Open Source Solutions for Text Engineering
>> >
>> > http://digitalpebble.blogspot.com/
>> > http://www.digitalpebble.com
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
