I currently have roughly 10M records in my crawldb. I added a few regexes to regex-urlfilter.txt to remove some URLs from the crawldb. Nothing complicated. However, when I ran updatedb with filtering turned on, the job took 118 hours.
Looking in the regex-urlfilter.txt file, I noticed that some of the other regexes already in there are pretty broad. I commented them out, and the updatedb job took 6 minutes. These two regexes are what make URL filtering so slow:

-[?*!@=]
-.*(/[^/]+)/[^/]+\1/[^/]+\1/
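
For anyone who wants to compare the cost of the two rules outside of a full updatedb run, here is a rough sketch in Python. It is only an approximation: Nutch runs these patterns through Java's regex engine, not Python's re module, so absolute timings will differ, and I am assuming the filter does an unanchored search for each pattern in the URL. The names is_rejected and time_pattern and the example.com URLs are made up for illustration.

```python
import re
import time

# The two exclusion patterns quoted above, without the leading "-"
# (in regex-urlfilter.txt, "-" marks a rule that rejects matching URLs).
SPECIAL_CHARS = re.compile(r"[?*!@=]")
REPEATED_SEGMENTS = re.compile(r".*(/[^/]+)/[^/]+\1/[^/]+\1/")


def is_rejected(url):
    """Return True if either exclusion pattern is found anywhere in the URL."""
    return bool(SPECIAL_CHARS.search(url) or REPEATED_SEGMENTS.search(url))


def time_pattern(pattern, url, repeats=20):
    """Return the average number of seconds one search of `pattern` takes on `url`."""
    start = time.perf_counter()
    for _ in range(repeats):
        pattern.search(url)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    # Hypothetical URL with many distinct path segments: the repeated-segment
    # rule never matches, so the engine must exhaust its backtracking.
    url = "http://example.com/" + "/".join(f"seg{i}" for i in range(50)) + "/"
    print("special chars    :", time_pattern(SPECIAL_CHARS, url))
    print("repeated segments:", time_pattern(REPEATED_SEGMENTS, url))
```

The second pattern starts with a greedy .* and then uses the backreference \1, so on a URL that does not match it has to retry the capture group at every slash before giving up. Lengthening the test URL should show how quickly that cost grows compared with the simple character class.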

