Filtering URLs in CrawlDB

Dennis Kubes Tue, 09 Jan 2007 08:31:25 -0800

If I write a new normalizer and add some regex filters to filter out URLs in the crawldb, then run mergedb on a single db to filter it and run mergesegs on a single segment to filter that, does anyone know whether I would then need to run a re-parse?

The reason I ask is that I went through this process without a re-parse, and on indexing I got blank index files. My guess is that the URLs weren't matching up because they had now been normalized.
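To make the suspected mismatch concrete, here is a minimal Java sketch. It is not Nutch code. The normalizer rules (stripping ";jsessionid=..." and "#fragment"), the image-rejecting filter rule, and the class and method names are all hypothetical. It assumes the CrawlDB entries were rewritten to normalized URLs while some other data is still keyed by the original URLs, and shows that looking up the old keys then finds nothing.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class NormalizationMismatch {
    // Hypothetical filter rule: reject image URLs.
    private static final Pattern REJECT = Pattern.compile("\\.(gif|jpe?g|png)$");
    // Hypothetical normalizer rules: strip session ids and fragments.
    private static final Pattern SESSION_ID = Pattern.compile(";jsessionid=[^?#]*");
    private static final Pattern FRAGMENT = Pattern.compile("#.*$");

    static String normalize(String url) {
        String u = SESSION_ID.matcher(url).replaceAll("");
        return FRAGMENT.matcher(u).replaceAll("");
    }

    static boolean accept(String url) {
        return !REJECT.matcher(url).find();
    }

    /** Models the CrawlDB after the filtering merge: normalized, then filtered. */
    static Set<String> rebuildCrawlDb(List<String> urls) {
        Set<String> db = new LinkedHashSet<>();
        for (String url : urls) {
            String n = normalize(url);
            if (accept(n)) {
                db.add(n);
            }
        }
        return db;
    }

    /** Returns the keys that still have an exact match in the CrawlDB. */
    static List<String> matchingKeys(Set<String> crawlDb, List<String> keys) {
        List<String> hits = new ArrayList<>();
        for (String key : keys) {
            if (crawlDb.contains(key)) {
                hits.add(key);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical keys still in their original, un-normalized form.
        List<String> oldKeys = List.of(
                "http://example.com/a;jsessionid=ABC123",
                "http://example.com/b#top",
                "http://example.com/logo.gif");
        Set<String> crawlDb = rebuildCrawlDb(oldKeys);
        System.out.println("CrawlDB: " + crawlDb);
        System.out.println("Matches: " + matchingKeys(crawlDb, oldKeys));
    }
}
```

In this model the CrawlDB keeps two normalized entries and the match list comes back empty. Lookups only succeed again once both sides are keyed by the same normalized form.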


Dennis

