Somehow my crawler started fetching YouTube. I'm not sure why, since I have db.ignore.external.links set to true.
I've since added the following line to my regex-urlfilter.txt file:

-^http://www\.youtube\.com/

However, I'm still seeing YouTube URLs in the fetch logs. I'm using the -noFilter and -noNorm options with generate, and I'm not using the -filter and -normalize options for updatedb. According to Markus in this thread, normalization and filtering should still occur with those options when using Nutch 1.4:

http://lucene.472066.n3.nabble.com/Re-Re-generate-update-times-and-crawldb-size-td3564078.html

Is there a setting I'm missing? I'm not seeing anything in the logs regarding this.

Thanks.
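
For what it's worth, the rule above can be checked in isolation before looking at the generate/updatedb options. The following is a minimal sketch in Python, not Nutch code: it assumes the filter applies the pattern with "find" semantics after stripping the leading "-", and the sample URLs and the is_excluded helper are hypothetical, made up for illustration.

```python
import re

# Pattern from the regex-urlfilter.txt rule, without the leading "-"
# (the "-" marks the rule as an exclusion; it is not part of the regex).
EXCLUDE_YOUTUBE = re.compile(r"^http://www\.youtube\.com/")


def is_excluded(url: str) -> bool:
    """Return True if the exclusion pattern matches the URL.

    Assumption: the filter uses search/find semantics, so the pattern
    only needs to match at the position its anchors allow.
    """
    return EXCLUDE_YOUTUBE.search(url) is not None


if __name__ == "__main__":
    # Hypothetical sample URLs, chosen to cover scheme and host variants.
    samples = [
        "http://www.youtube.com/watch?v=abc",
        "https://www.youtube.com/watch?v=abc",
        "http://youtube.com/watch?v=abc",
        "http://m.youtube.com/watch?v=abc",
    ]
    for url in samples:
        print(url, "->", "excluded" if is_excluded(url) else "not excluded")
```

Under these assumptions, only URLs that start with exactly http://www.youtube.com/ are excluded. If the URLs in the fetch logs use https or a different host (for example no "www."), the rule as written would not catch them even when filtering is active.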

