> >
> > However, I'm still seeing YouTube URLs in the fetch logs.  I'm using the
> > -noFilter and -noNorm options with generate.  I'm also not using the
> > -filter and -normalize options for updatedb.
>
> You must either filter out all YT records from the CrawlDB or filter
> during generating.
>
>
I just tried this and it didn't work.

In my nutch-site.xml, urlfilter-regex is listed in plugin.includes.
In my regex-urlfilter.txt, I have -^http://www\.youtube\.com/ right above
the +. rule at the bottom.

Yet when I run a crawldb dump, the YouTube URLs still show up.  What am I
missing?
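
For what it's worth, the pattern itself can be checked in isolation. This is
only a minimal sketch in Python, not Nutch code: it assumes the filter searches
for the pattern in each URL (the ^ anchor makes search and prefix match
equivalent here), and the sample URLs are made up for illustration. It shows
that the rule as written covers only the http://www.youtube.com/ prefix, so
https, bare-domain, and other subdomain variants would pass through to the +.
rule.

```python
import re

# The exclusion rule from regex-urlfilter.txt, without the leading "-".
RULE = re.compile(r"^http://www\.youtube\.com/")

# Hypothetical sample URLs, for illustration only.
SAMPLES = [
    "http://www.youtube.com/watch?v=abc",
    "https://www.youtube.com/watch?v=abc",
    "http://youtube.com/watch?v=abc",
    "http://m.youtube.com/watch?v=abc",
]


def is_excluded(url):
    """Return True if the exclusion pattern is found in the URL."""
    return RULE.search(url) is not None


# Map each sample URL to whether the rule would exclude it.
results = {url: is_excluded(url) for url in SAMPLES}

for url, excluded in results.items():
    print("excluded" if excluded else "kept    ", url)
```

This only tests the regular expression. It says nothing about whether the
filter is actually applied to records already stored in the CrawlDB.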

Thanks.
