Dennis Kubes wrote:
If I write a new normalizer and add some regex filters to filter URLs out of the crawldb, then run mergedb on a single db to filter it, and then run mergesegs on a single segment to filter it, does anyone know whether I would then be required to run a re-parse?
Re-parse - no; re-index - yes.
The reason I am asking is that I went through this process without a re-parse, and upon indexing I get blank index files. My guess was that the URLs weren't matching up because they were now normalized.
Most likely your index is out of sync with your merged segment. Indexes store segment names and document IDs, so if you have merged or sliced your segments you have to rebuild the index too.
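To illustrate why a stale index stops resolving after a merge, here is a toy sketch in Python. It is not Nutch code: the function names, segment names, and URLs are all made up for the example, and it only assumes what is said above, namely that each index entry points at a segment name and a document ID.

```python
# Toy model only; names and data are hypothetical, not Nutch APIs.

def build_index(segments):
    """Map each URL to the (segment name, doc id) pair that holds it."""
    return {
        url: (seg_name, doc_id)
        for seg_name, urls in segments.items()
        for doc_id, url in enumerate(urls)
    }

def lookup(index, segments, url):
    """Resolve a URL through the index; return None if the pointer is stale."""
    seg_name, doc_id = index[url]
    docs = segments.get(seg_name)
    if docs is None or doc_id >= len(docs) or docs[doc_id] != url:
        return None
    return docs[doc_id]

# Index built against the original segment.
old_segments = {
    "20070101000000": ["http://a.example/", "http://b.example/", "http://c.example/"],
}
old_index = build_index(old_segments)

# A filtering merge writes a new segment: new name, fewer documents.
merged_segments = {
    "20070201000000": ["http://a.example/", "http://c.example/"],
}

# The old index still points at the old segment name, so the lookup fails.
stale = lookup(old_index, merged_segments, "http://c.example/")

# Rebuilding the index against the merged segment fixes it.
fresh = lookup(build_index(merged_segments), merged_segments, "http://c.example/")
```

In this sketch the old index fails to resolve because the segment name it recorded no longer exists; even if the name had survived, the document IDs would have shifted once entries were filtered out.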
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
