Dear Tejas Patil,

Here's a longer excerpt:

2013-05-09 02:05:02,011 INFO crawl.Injector - Injector: starting at 2013-05-09 02:05:02
2013-05-09 02:05:02,012 INFO crawl.Injector - Injector: crawlDb: /opt/nutch/nutch_portal/crawldb
2013-05-09 02:05:02,012 INFO crawl.Injector - Injector: urlDir: /opt/nutch/urls
2013-05-09 02:05:02,012 INFO crawl.Injector - Injector: Converting injected urls to crawl db entries.
2013-05-09 02:05:02,167 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2013-05-09 02:05:02,187 WARN snappy.LoadSnappy - Snappy native library not loaded
2013-05-09 02:05:02,518 INFO plugin.PluginRepository - Plugins: looking in: /opt/nutch/plugins
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Registered Plugins:
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - HTTP Framework (lib-http)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Http Protocol Plug-in (protocol-http)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - URL Meta Indexing Filter (urlmeta)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - MetaTags (parse-metatags)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Index Metadata (index-metadata)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Registered Extension-Points:
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
2013-05-09 02:05:02,573 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2013-05-09 02:05:02,597 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2013-05-09 02:05:05,473 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default
2013-05-09 02:05:08,528 INFO regex.RegexURLNormalizer - can't find rules for scope 'inject', using default

------------ snip -------------------

2013-05-09 03:57:56,963 INFO fetcher.Fetcher - fetch of http://www.erz.be.ch/erz/fr/index/kultur/kulturfoerderung/fotografie/ankaeufe_von_fotografien.html failed with: java.lang.NullPointerException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1277)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1194)
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
    at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:264)
    at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:244)
    at org.apache.hadoop.io.Text.write(Text.java:281)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1061)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:976)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:714)
2013-05-09 03:57:56,963 INFO fetcher.Fetcher - fetch of http://www.writtenby.ch/suchen/autoren.php?page=8&show=1 failed with: java.lang.NullPointerException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1277)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1194)
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
    at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:264)
    at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:244)
    at org.apache.hadoop.io.Text.write(Text.java:281)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1061)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:976)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:714)
2013-05-09 03:57:56,963 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,963 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=81
2013-05-09 03:57:56,963 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,963 WARN fetcher.Fetcher - Attempting to finish item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@504e3c43
2013-05-09 03:57:56,967 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=28
2013-05-09 03:57:56,967 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=29
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=26
2013-05-09 03:57:56,967 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=30
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 INFO fetcher.Fetcher - fetch of http://www.baden.ch/default.cfm?DomainID=1&TreeID=2257&language=de failed with: java.lang.NullPointerException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1277)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1194)
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
    at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:264)
    at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:244)
    at org.apache.hadoop.io.Text.write(Text.java:281)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1061)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:976)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:923)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:760)
2013-05-09 03:57:56,967 WARN fetcher.Fetcher - Attempting to finish item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@68aa41cd
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,967 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=32
2013-05-09 03:57:56,968 WARN fetcher.Fetcher - Attempting to finish item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@562e7f78
2013-05-09 03:57:56,966 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=33
2013-05-09 03:57:56,966 WARN fetcher.Fetcher - Attempting to finish item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@3e1f9310
2013-05-09 03:57:56,966 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=35
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=21
2013-05-09 03:57:56,968 WARN fetcher.Fetcher - Attempting to finish item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@179d4544
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=19
2013-05-09 03:57:56,966 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=38
2013-05-09 03:57:56,966 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=39
2013-05-09 03:57:56,966 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=40
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,966 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=44
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=17
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,966 WARN fetcher.Fetcher - Attempting to finish item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@2be0c055
2013-05-09 03:57:56,966 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,966 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=46
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=13
2013-05-09 03:57:56,966 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=48
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=12
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=11
2013-05-09 03:57:56,965 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=51
2013-05-09 03:57:56,965 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=53
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,965 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=56
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,965 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=57
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,965 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=58
2013-05-09 03:57:56,965 WARN fetcher.Fetcher - Attempting to finish item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@25a732a4
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,965 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=59
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,965 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,969 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=5
2013-05-09 03:57:56,965 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=62
2013-05-09 03:57:56,969 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=4
2013-05-09 03:57:56,969 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=6
2013-05-09 03:57:56,969 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=7
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=8
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=9
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=10
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=14
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=15
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=16
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=18
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=20
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=22
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=23
2013-05-09 03:57:56,968 WARN fetcher.Fetcher - Attempting to finish item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@2ef33be4
2013-05-09 03:57:56,968 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=24
2013-05-09 03:57:56,969 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=3
2013-05-09 03:57:56,967 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=25
2013-05-09 03:57:56,967 WARN fetcher.Fetcher - Attempting to finish item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@181259d0
2013-05-09 03:57:56,967 INFO fetcher.Fetcher - fetch of http://www.aargauer-literaturhaus.ch/typo3temp/stylesheet_342078aeca.css?1344437251 failed with: java.lang.NullPointerException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1277)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1194)
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
    at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:264)
    at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:244)
    at org.apache.hadoop.io.Text.write(Text.java:281)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1061)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:976)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:714)
2013-05-09 03:57:56,967 WARN fetcher.Fetcher - Attempting to finish item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@307322f4
2013-05-09 03:57:56,967 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=27
2013-05-09 03:57:56,969 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
2013-05-09 03:57:56,969 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2013-05-09 03:57:56,969 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=2
2013-05-09 03:57:56,969 WARN fetcher.Fetcher - Attempting to finish item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@504e3c43
2013-05-09 03:57:56,969 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
2013-05-09 03:57:57,942 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
    at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1332)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1368)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1341)

Hm… Too many threads (140)?
I got many errors in the log, but only one IOException… Would removing the fetcher's exit-code check in the bin/crawl script be a workable workaround? Not really, I suppose…

Best,
Urs

On 09.05.2013 at 12:45, Tejas Patil <[email protected]> wrote:

> Hey Urs,
> Please see the logs/hadoop.log file and share the stack trace of the
> exception.
>
>
> On Thu, May 9, 2013 at 3:36 AM, Urs Hofer <[email protected]> wrote:
>
>> Dear List
>>
>> I'm currently running the nutch crawl script with solr4.
>> Additionally, I'm using the urlmeta plugin and I'm parsing keyword and
>> description metadata as well. The crawl script runs in local mode.
>> Currently, I'm seeding about 500 domains.
>>
>> The crawl script runs without problems the first time. On a recrawl, it
>> dies with the error
>>
>> 2013-05-09 03:57:56,967 WARN fetcher.Fetcher - Attempting to finish item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@307322f4
>> 2013-05-09 03:57:56,967 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=27
>> 2013-05-09 03:57:56,969 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=1
>> 2013-05-09 03:57:56,969 ERROR fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
>> 2013-05-09 03:57:56,969 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=2
>> 2013-05-09 03:57:56,969 WARN fetcher.Fetcher - Attempting to finish item from unknown queue: org.apache.nutch.fetcher.Fetcher$FetchItem@504e3c43
>> 2013-05-09 03:57:56,969 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0
>> 2013-05-09 03:57:57,942 ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
>>     at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1332)
>>     at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1368)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1341)
>>
>>
>> after the fetcher part.
>>
>> I have a very liberal regex-urlfilter configuration:
>>
>> -^(file|ftp|mailto):
>>
>> -\.(mp3|MP3|gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|ppt|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$
>> +.
>>
>> But I am restricting the crawl with db.ignore.external.links = true.
>> Could it be because I've removed the line
>>
>> # skip URLs containing certain characters as probable queries, etc.
>> -.*[?*!@=].*
>>
>> in regex-urlfilter? I have several seed entries like index.php?language=fr
>>
>> Since I'm new to nutch, I don't know where to look or how to continue.
>> Thanks for any help
>> Best regards
>> Urs Hofer
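
On the regex-urlfilter question in the original post, here is a rough Python sketch of how the quoted rules would treat some of the URLs from the log. It is not Nutch code. It assumes the filter tries the rules top to bottom, that the first pattern found anywhere in the URL decides (+ accepts, - rejects), and that a URL matching no rule is rejected. The names parse_rules and accepts are made up for this illustration, and placing the removed query rule just before the final "+." is also an assumption.

```python
import re

# Rules copied from the regex-urlfilter configuration quoted in the post.
CURRENT_RULES = [
    r"-^(file|ftp|mailto):",
    r"-\.(mp3|MP3|gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS"
    r"|wmf|WMF|zip|ZIP|ppt|PPT|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ"
    r"|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$",
    r"+.",
]

# Same rules with the removed query-character rule put back.
# Assumption: it sits just before the final catch-all "+." rule.
WITH_QUERY_RULE = CURRENT_RULES[:-1] + [r"-.*[?*!@=].*", r"+."]


def parse_rules(lines):
    """Turn '+regex' / '-regex' lines into (accept, pattern) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepts(url, rules):
    """Assumed semantics: the first rule whose pattern is found decides."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: no matching rule means the URL is dropped


if __name__ == "__main__":
    urls = [
        "http://www.baden.ch/default.cfm?DomainID=1&TreeID=2257&language=de",
        "http://www.aargauer-literaturhaus.ch/typo3temp/stylesheet_342078aeca.css?1344437251",
    ]
    for name, lines in (("current", CURRENT_RULES), ("with query rule", WITH_QUERY_RULE)):
        rules = parse_rules(lines)
        for url in urls:
            print(name, accepts(url, rules), url)
```

Under these assumptions, the suffix rule only matches when the extension is the very end of the URL, so a stylesheet URL with a trailing query string falls through to "+." and is accepted. Whether such URLs have anything to do with the NullPointerException in the fetcher is not something this sketch can show.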

