Somehow my crawler started fetching YouTube.  I'm not sure why, since I
have db.ignore.external.links set to true.

I've since added the following line to my regex-urlfilter.txt file:

-^http://www\.youtube\.com/
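To check which URL forms that rule would actually reject, here is a minimal Python sketch. The sample URLs are made up for illustration (I haven't confirmed these forms appear in the fetch logs), and Python's re module stands in for the Java regex engine, which should treat this simple pattern the same way:

```python
import re

# The exclusion rule from regex-urlfilter.txt, without the leading "-".
RULE = re.compile(r"^http://www\.youtube\.com/")

# Hypothetical sample URLs, made up for illustration only.
SAMPLES = [
    "http://www.youtube.com/watch?v=abc",
    "https://www.youtube.com/watch?v=abc",
    "http://youtube.com/watch?v=abc",
    "http://m.youtube.com/watch?v=abc",
]


def is_excluded(url):
    """Return True if the rule matches the URL, i.e. the URL would be rejected."""
    return RULE.search(url) is not None


for url in SAMPLES:
    print("excluded   " if is_excluded(url) else "not matched", url)
```

Only the first sample is matched. The https, bare-domain and other-subdomain forms would pass this rule, so it may be worth checking which form shows up in the fetch logs.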

However, I'm still seeing YouTube URLs in the fetch logs.  I'm using the
-noFilter and -noNorm options with generate, and I'm not using the
-filter and -normalize options with updatedb.

According to Markus in the thread below, normalization and filtering should
still occur in 1.4 even when using the above options:

http://lucene.472066.n3.nabble.com/Re-Re-generate-update-times-and-crawldb-size-td3564078.html


Is there a setting I'm missing?  I'm not seeing anything in the logs
regarding this.

Thanks.
