Dear Tejas Patil,

Here's a longer excerpt:

2013-05-09 02:05:02,011 INFO  crawl.Injector - Injector: starting at 2013-05-09 
02:05:02
2013-05-09 02:05:02,012 INFO  crawl.Injector - Injector: crawlDb: 
/opt/nutch/nutch_portal/crawldb
2013-05-09 02:05:02,012 INFO  crawl.Injector - Injector: urlDir: /opt/nutch/urls
2013-05-09 02:05:02,012 INFO  crawl.Injector - Injector: Converting injected 
urls to crawl db entries.
2013-05-09 02:05:02,167 WARN  util.NativeCodeLoader - Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
2013-05-09 02:05:02,187 WARN  snappy.LoadSnappy - Snappy native library not 
loaded
2013-05-09 02:05:02,518 INFO  plugin.PluginRepository - Plugins: looking in: 
/opt/nutch/plugins
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository - Plugin Auto-activation 
mode: [true]
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository - Registered Plugins:
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         the nutch core 
extension points (nutch-extensionpoints)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Basic URL 
Normalizer (urlnormalizer-basic)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Basic Indexing 
Filter (index-basic)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Html Parse 
Plug-in (parse-html)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         HTTP Framework 
(lib-http)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Regex URL 
Filter (urlfilter-regex)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Pass-through 
URL Normalizer (urlnormalizer-pass)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Http Protocol 
Plug-in (protocol-http)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Regex URL 
Normalizer (urlnormalizer-regex)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         CyberNeko HTML 
Parser (lib-nekohtml)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         OPIC Scoring 
Plug-in (scoring-opic)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Tika Parser 
Plug-in (parse-tika)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Anchor Indexing 
Filter (index-anchor)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         URL Meta 
Indexing Filter (urlmeta)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Regex URL 
Filter Framework (lib-regex-filter)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         MetaTags 
(parse-metatags)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Index Metadata 
(index-metadata)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository - Registered 
Extension-Points:
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Nutch URL 
Normalizer (org.apache.nutch.net.URLNormalizer)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Nutch Protocol 
(org.apache.nutch.protocol.Protocol)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Nutch Segment 
Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Nutch URL 
Filter (org.apache.nutch.net.URLFilter)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Nutch Indexing 
Filter (org.apache.nutch.indexer.IndexingFilter)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         HTML Parse 
Filter (org.apache.nutch.parse.HtmlParseFilter)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Nutch Content 
Parser (org.apache.nutch.parse.Parser)
2013-05-09 02:05:02,573 INFO  plugin.PluginRepository -         Nutch Scoring 
(org.apache.nutch.scoring.ScoringFilter)
2013-05-09 02:05:02,597 INFO  regex.RegexURLNormalizer - can't find rules for 
scope 'inject', using default
2013-05-09 02:05:05,473 INFO  regex.RegexURLNormalizer - can't find rules for 
scope 'inject', using default
2013-05-09 02:05:08,528 INFO  regex.RegexURLNormalizer - can't find rules for 
scope 'inject', using default

------------ snip -------------------

2013-05-09 03:57:56,963 INFO  fetcher.Fetcher - fetch of 
http://www.erz.be.ch/erz/fr/index/kultur/kulturfoerderung/fotografie/ankaeufe_von_fotografien.html
 failed with: java.lang.NullPointerException
        at java.lang.System.arraycopy(Native Method)
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1277)
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1194)
        at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
        at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:264)
        at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:244)
        at org.apache.hadoop.io.Text.write(Text.java:281)
        at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
        at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1061)
        at 
org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
        at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:976)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:714)

2013-05-09 03:57:56,963 INFO  fetcher.Fetcher - fetch of 
http://www.writtenby.ch/suchen/autoren.php?page=8&show=1 failed with: 
java.lang.NullPointerException
        at java.lang.System.arraycopy(Native Method)
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1277)
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1194)
        at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
        at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:264)
        at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:244)
        at org.apache.hadoop.io.Text.write(Text.java:281)
        at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
        at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1061)
        at 
org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
        at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:976)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:714)

2013-05-09 03:57:56,963 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,963 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=81
2013-05-09 03:57:56,963 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,963 WARN  fetcher.Fetcher - Attempting to finish item from 
unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@504e3c43
2013-05-09 03:57:56,967 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=28
2013-05-09 03:57:56,967 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=29
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=26
2013-05-09 03:57:56,967 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=30
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 INFO  fetcher.Fetcher - fetch of 
http://www.baden.ch/default.cfm?DomainID=1&TreeID=2257&language=de failed with: 
java.lang.NullPointerException
        at java.lang.System.arraycopy(Native Method)
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1277)
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1194)
        at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
        at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:264)
        at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:244)
        at org.apache.hadoop.io.Text.write(Text.java:281)
        at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
        at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1061)
        at 
org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
        at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:976)
        at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:923)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:760)

2013-05-09 03:57:56,967 WARN  fetcher.Fetcher - Attempting to finish item from 
unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@68aa41cd
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=32
2013-05-09 03:57:56,968 WARN  fetcher.Fetcher - Attempting to finish item from 
unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@562e7f78
2013-05-09 03:57:56,966 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=33
2013-05-09 03:57:56,966 WARN  fetcher.Fetcher - Attempting to finish item from 
unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@3e1f9310
2013-05-09 03:57:56,966 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=35
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=21
2013-05-09 03:57:56,968 WARN  fetcher.Fetcher - Attempting to finish item from 
unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@179d4544
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=19
2013-05-09 03:57:56,966 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=38
2013-05-09 03:57:56,966 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=39
2013-05-09 03:57:56,966 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=40
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,966 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=44
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=17
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,966 WARN  fetcher.Fetcher - Attempting to finish item from 
unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@2be0c055
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,966 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=46
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=13
2013-05-09 03:57:56,966 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=48
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=12
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=11
2013-05-09 03:57:56,965 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=51
2013-05-09 03:57:56,965 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=53
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,965 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=56
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,965 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=57
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,965 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=58
2013-05-09 03:57:56,965 WARN  fetcher.Fetcher - Attempting to finish item from 
unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@25a732a4
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,965 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=59
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=5
2013-05-09 03:57:56,965 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=62
2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=4
2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=6
2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=7
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=8
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=9
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=10
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=14
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=15
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=16
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=18
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=20
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=22
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=23
2013-05-09 03:57:56,968 WARN  fetcher.Fetcher - Attempting to finish item from 
unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@2ef33be4
2013-05-09 03:57:56,968 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=24
2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=3
2013-05-09 03:57:56,967 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=25
2013-05-09 03:57:56,967 WARN  fetcher.Fetcher - Attempting to finish item from 
unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@181259d0
2013-05-09 03:57:56,967 INFO  fetcher.Fetcher - fetch of 
http://www.aargauer-literaturhaus.ch/typo3temp/stylesheet_342078aeca.css?1344437251
 failed with: java.lang.NullPointerException
        at java.lang.System.arraycopy(Native Method)
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1277)
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1194)
        at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
        at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:264)
        at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:244)
        at org.apache.hadoop.io.Text.write(Text.java:281)
        at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
        at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
        at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1061)
        at 
org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
        at 
org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:976)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:714)

2013-05-09 03:57:56,967 WARN  fetcher.Fetcher - Attempting to finish item from 
unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@307322f4
2013-05-09 03:57:56,967 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=27
2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=1
2013-05-09 03:57:56,969 ERROR fetcher.Fetcher - fetcher 
caught:java.lang.NullPointerException
2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=2
2013-05-09 03:57:56,969 WARN  fetcher.Fetcher - Attempting to finish item from 
unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@504e3c43
2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread 
FetcherThread, activeThreads=0
2013-05-09 03:57:57,942 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: 
Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1332)
        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1368)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1341)



Hm… Too many threads (140)?

I got many errors in the log, but only one IOException…
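
To put numbers on "many errors", the log can be tallied instead of read by eye. The following is only a minimal sketch: it assumes the record layout visible in the excerpt above (timestamp, level, logger, " - ", message) and that each record sits on a single line in logs/hadoop.log, without the wrapping this mail client added. The function name summarize is made up for this illustration.

```python
import re
from collections import Counter

# Start of a record as it appears in the excerpt above, e.g.
# "2013-05-09 03:57:56,963 ERROR fetcher.Fetcher - fetcher caught:..."
RECORD = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (\w+)\s+(\S+) - (.*)$"
)
# Fully qualified class names ending in Exception or Error.
EXCEPTION = re.compile(r"\b((?:\w+\.)+\w*(?:Exception|Error))\b")
# "fetch of <url> failed with: <exception class>"
FAILED = re.compile(r"fetch of (\S+) failed with: (\S+)")


def summarize(lines):
    """Return (records per level, exception counts, failed fetches)."""
    levels = Counter()
    exceptions = Counter()
    failed = []
    for line in lines:
        record = RECORD.match(line)
        if record:
            levels[record.group(1)] += 1
        # Stack-trace lines are not records, but may still name exceptions.
        exceptions.update(EXCEPTION.findall(line))
        hit = FAILED.search(line)
        if hit:
            failed.append((hit.group(1), hit.group(2)))
    return levels, exceptions, failed
```

Assuming the log lives at logs/hadoop.log, it could be used like this:

```python
with open("logs/hadoop.log") as log:
    levels, exceptions, failed = summarize(log)
print(levels.most_common())
print(exceptions.most_common())
print(len(failed), "failed fetches")
```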

Would removing the fetcher's exit-code check from the bin/crawl script work as 
a workaround? Probably not, I suppose…

Best
Urs


On 09.05.2013 at 12:45, Tejas Patil <[email protected]> wrote:

> Hey Urs,
> Please see the logs/hadoop.log file and share the stack trace of the
> exception.
> 
> 
> On Thu, May 9, 2013 at 3:36 AM, Urs Hofer <[email protected]> wrote:
> 
>> Dear List
>> 
> >> I'm currently running the Nutch crawl script with Solr 4.
> >> Additionally, I'm using the urlmeta plugin and parsing keyword and
> >> description metadata as well. The crawl script runs in local mode.
> >> Currently, I'm seeding about 500 domains.
> >> 
> >> The crawl script runs without problems the first time. On a recrawl, it
> >> dies with the following error
>> 
>> 2013-05-09 03:57:56,967 WARN  fetcher.Fetcher - Attempting to finish item
>> from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@307322f4
>> 2013-05-09 03:57:56,967 INFO  fetcher.Fetcher - -finishing thread
>> FetcherThread, activeThreads=27
>> 2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread
>> FetcherThread, activeThreads=1
>> 2013-05-09 03:57:56,969 ERROR fetcher.Fetcher - fetcher
>> caught:java.lang.NullPointerException
>> 2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread
>> FetcherThread, activeThreads=2
>> 2013-05-09 03:57:56,969 WARN  fetcher.Fetcher - Attempting to finish item
>> from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@504e3c43
>> 2013-05-09 03:57:56,969 INFO  fetcher.Fetcher - -finishing thread
>> FetcherThread, activeThreads=0
>> 2013-05-09 03:57:57,942 ERROR fetcher.Fetcher - Fetcher:
>> java.io.IOException: Job failed!
>>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
>>        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1332)
>>        at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1368)
>>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1341)
>> 
>> 
>> after the fetcher part.
>> 
>> I have a very liberal regex-urlfilter configuration:
>> 
>> -^(file|ftp|mailto):
>> 
>> -\.(mp3|MP3|gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$
>> +.
>> 
> >> But I am restricting the crawl with db.ignore.external.links = true.
> >> Could it be because I've removed the line
>> 
>> # skip URLs containing certain characters as probable queries, etc.
>> -.*[?*!@=].*
>> 
> >> in regex-urlfilter? I have several seed entries like index.php?language=fr
>> 
> >> Since I'm new to Nutch, I don't know where to look or how to continue.
>> Thanks for any help
>> Best regards
>> Urs Hofer
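
The regex-urlfilter rules quoted above can be tried against sample URLs offline. The sketch below is not Nutch code. It assumes, as I understand the urlfilter-regex plugin, that rules are tested top to bottom, that a pattern may match anywhere in the URL, and that the first matching rule decides. The helper names parse_rules and accepts and the example.org host are invented for this illustration, and putting the removed query rule just before "+." is my guess at where it stood. It only shows which URLs the rules admit; it does not explain the NullPointerException.

```python
import re

# The rules as quoted in the mail above.
CURRENT_RULES = r"""
-^(file|ftp|mailto):
-\.(mp3|MP3|gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$
+.
"""

# Same rules with the removed query-character rule put back before "+."
# (assumed position).
WITH_QUERY_RULE = CURRENT_RULES.replace("+.", "-.*[?*!@=].*\n+.")


def parse_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blanks and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepts(rules, url):
    """Assumed semantics: the first rule found in the URL decides."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # no rule matched: reject
```

Under these assumptions, a seed like http://example.org/index.php?language=fr passes the current rules and would be rejected once the query rule is back. The suffix rule is anchored with "$", so a URL such as the stylesheet_342078aeca.css?1344437251 entry in the log slips past the css exclusion because of its query string.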
