Thanks a ton, Volli.
I wasted two days trying to figure this out and never noticed that
crawl-urlfilter.txt also contains regular expressions for filtering URLs.
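
For anyone else hitting the same problem, this is roughly what the relevant
part of conf/crawl-urlfilter.txt looks like in a stock Nutch 1.x install, with
the character rule switched off. It is a sketch based on the default file as I
remember it, so the comments and patterns in your copy may differ, and it only
helps if the characters in your links are among those listed in that rule.
MY.DOMAIN.NAME is the placeholder from the default file, not a real host.

```
# skip URLs containing certain characters as probable queries, etc.
# (commented out here so links containing ? * ! @ = are no longer dropped)
# -[?*!@=]

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/

# skip everything else
-.
```

Rules are tried from top to bottom and the first pattern that matches decides:
a leading + accepts the URL and a leading - rejects it. That is why a rule
placed above the "accept" line can silently drop links before they are ever
checked against the domain pattern.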


Volli wrote:
> 
> Did you already try switching off the regexp in
> crawl-urlfilter.txt?
> 
> If you crawl with
> bin/nutch crawl...
> then crawl-urlfilter.txt is the file that must be changed.
> 
> Compare the other lines, too. See "# skip everything else" and
> "# accept anything else".
> 
> 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Extracted-Links-with-characters-like-are-getting-filtered-out-tp1392986p1395592.html
Sent from the Nutch - User mailing list archive at Nabble.com.
