OK, as I wrote earlier, the problem was in the old version of Nutch (2.1). After updating to 2.2.1 the message about a different batch id disappeared, but now I have a new problem.
Every time I start the script bin/crawl, it fetches only the URLs from the seed list (no other pages):

fetching http://www.museumhetvalkhof.nl/ (queue crawl delay=5000ms)
fetching http://www.eisbaeren.de/ (queue crawl delay=5000ms)
fetching http://www.s-bahn-berlin.de/ (queue crawl delay=5000ms)
...

but I also want it to fetch and then parse the linked pages, e.g.

fetching http://www.museumhetvalkhof.nl/something.html
fetching http://www.eisbaeren.de/something/something.html
etc.

Where is the problem, please?

The URLs in my seed are defined like:

http://www.funkhauseuropa.de/
http://www.swr.de/
http://www.swrmediathek.de/

And regex-urlfilter.txt contains:

+^http://([a-z0-9]*\.)*funkhauseuropa.de/
+^http://([a-z0-9]*\.)*swr.de/
+^http://([a-z0-9]*\.)*swrmediathek.de/
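In case the invocation matters: I call the script in what I believe is the standard Nutch 2.x form (seed dir, crawl id, Solr URL, number of rounds); the values below are only placeholders, not my exact setup:

  bin/crawl urls/ myCrawl http://localhost:8983/solr/ 2

So with the seed directory, a crawl id, the Solr endpoint and the number of generate/fetch/parse rounds as the last argument.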

