> > However, I'm still seeing youtube urls in the fetch logs. I'm using the
> > -noFilter and -noNorm options with generate. I'm also not using the
> > -filter and -normalize options for updatedb.
>
> You must either filter out all YT records from the CrawlDB or filter
> during generating.

I just tried this and it didn't work.
In my nutch-site.xml I have urlfilter-regex in plugin.includes. In my regex-urlfilter.txt I have

-^http://www\.youtube\.com/

right above the "+." line at the bottom. Yet when I run a crawldb dump, the YouTube URLs still show up. What am I missing?

Thanks.
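To sanity-check the rule ordering on its own, here is a minimal Python sketch of how a regex-urlfilter.txt file is evaluated. It assumes the semantics described in the comments of Nutch's default regex-urlfilter.txt: the first matching pattern decides whether a URL is kept ("+") or dropped ("-"), and a URL that matches no pattern is dropped. The helper names `parse_rules` and `accepts` are hypothetical, and Python's `re.search` stands in for Java's unanchored regex matching. This only exercises the rule file; it says nothing about whether Nutch applies the filter to records already stored in the CrawlDB.

```python
import re

# Assumed excerpt of regex-urlfilter.txt: the YouTube exclusion sits
# above the catch-all "+." rule at the bottom of the file.
RULES_TEXT = r"""
# skip YouTube
-^http://www\.youtube\.com/
# accept anything else
+.
"""


def parse_rules(text):
    """Return (accept, compiled_regex) pairs, skipping comments and blank lines."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """First matching rule wins; a URL that matches no rule is rejected."""
    for accept, regex in rules:
        if regex.search(url):  # unanchored search, so "^" is needed to pin the prefix
            return accept
    return False


rules = parse_rules(RULES_TEXT)
print(accepts("http://www.youtube.com/watch?v=abc", rules))   # False: exclusion matches first
print(accepts("http://example.com/page", rules))              # True: falls through to "+."
print(accepts("https://www.youtube.com/watch?v=abc", rules))  # True: pattern only covers http://www.
```

Under these assumptions the exclusion line does reject `http://www.youtube.com/...` URLs, but only that exact prefix; variants such as `https://` or hosts without `www.` would fall through to `+.`.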

