Thanks a ton, Volli. I wasted two days trying to figure this out; I never noticed that crawl-urlfilter.txt also contains regular expressions for filtering URLs.
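
For anyone else who lands here, this is a minimal sketch of how rules in a filter file of this kind are evaluated, as far as I understand it: each rule is a `+` (accept) or `-` (reject) followed by a regex, rules are tried top to bottom, the first one found anywhere in the URL decides, and a URL that matches nothing is dropped. The rule text, the `example.org` domain, and the helper names `parse_rules` and `accepts` are assumptions made up for illustration, not copied from a real Nutch config or from Nutch's own code.

```python
import re

# Hypothetical rule set in the style of a regex URL filter file.
# Illustrative only; check your own crawl-urlfilter.txt for the real rules.
RULES_TEXT = r"""
# skip URLs containing certain characters as probable queries
-[?*!@=]
# accept hosts in example.org (assumed domain)
+^http://([a-z0-9]*\.)*example\.org/
# skip everything else
-.
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """First rule whose regex is found in the URL decides; no match rejects."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


RULES = parse_rules(RULES_TEXT)

# Same rules with the special-character rule switched off.
RELAXED = [rule for rule in RULES if rule[1].pattern != "[?*!@=]"]

if __name__ == "__main__":
    for url in (
        "http://www.example.org/page.html",
        "http://www.example.org/view?id=3",
        "http://other.net/",
    ):
        print(url, accepts(url, RULES), accepts(url, RELAXED))
```

With rules like these, a link containing `?` or `=` is rejected by the first rule before the accept rule is ever reached, which is why switching that rule off (or moving an accept rule above it) changes which extracted links survive.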
Volli wrote:
>
> Did you try already to switch off the regexp in
> crawl-urlfilter.txt?
>
> if you use
> bin/nutch crawl...
> for crawling crawl-urlfilter.txt must be changed.
>
> compare other lines, too. see "# skip everything else" and
> "# accept anything else"
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Help-Extracted-Links-with-characters-like-are-getting-filtered-out-tp1392986p1395592.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

