My stupid mistake. I am using an older, customized 0.8 branch, which
didn't have normalization. I added normalization to it, but in the
process I wasn't updating the key with the normalized URL for mergesegs
filtering.
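
For illustration, here is a minimal sketch of that kind of bug, written
in Python rather than Nutch's Java. The normalize() and accept() helpers
and the record layout are hypothetical stand-ins, not Nutch APIs. The
only point is that the output key has to be the normalized URL, not the
original one.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Hypothetical normalizer: lowercase scheme and host, drop the fragment."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def accept(url):
    """Hypothetical filter: keep only URLs whose host ends in example.com."""
    return urlsplit(url).netloc.endswith("example.com")


def filter_records_buggy(records):
    """Filters on the normalized URL but keeps the ORIGINAL key (the bug)."""
    out = {}
    for url, value in records.items():
        if accept(normalize(url)):
            out[url] = value  # key was never updated to the normalized form
    return out


def filter_records_fixed(records):
    """Filters on the normalized URL and also emits it as the key."""
    out = {}
    for url, value in records.items():
        normalized = normalize(url)
        if accept(normalized):
            out[normalized] = value
    return out


# Made-up segment record, keyed by an un-normalized URL.
segment = {"HTTP://Example.COM/page#top": "parsed content"}

# Keys as they would look in a db that was already normalized.
crawldb_keys = {normalize(url) for url in segment}
```

With the buggy version the segment keys no longer line up with the
normalized keys elsewhere; with the fixed version they do.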
Dennis
Andrzej Bialecki wrote:
Dennis Kubes wrote:
If I wrote a new normalizer and added some regex filters to filter out
URLs in the crawldb, then ran mergedb with a single db to filter, and
then ran mergesegs with a single segment to filter, does anyone know if
I would then be required to run through a re-parse?
Re-parse - no; re-index - yes.
The reason I am asking is that I went through this process without a
re-parse, and upon indexing I get blank index files. So what I was
thinking is that the URLs weren't matching up because they were now
normalized.
Most likely your index is out of sync with your merged segment. Indexes
contain segment names and document IDs inside, so if you have
merged/sliced your segments you have to rebuild the index too.
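
To illustrate why a stale index stops matching, here is a toy model in
Python. The data layout, the segment names, and the build_index() and
lookup() helpers are all hypothetical and do not reflect Nutch's or
Lucene's actual structures. It only shows that index entries pointing at
a segment name and document ID stop resolving once that segment has been
replaced by a merged one.

```python
def build_index(segments):
    """Map each URL to a (segment name, document id) reference."""
    index = {}
    for seg_name, docs in segments.items():
        for doc_id, url in enumerate(docs):
            index[url] = (seg_name, doc_id)
    return index


def lookup(index, segments, url):
    """Resolve an index entry against the current segments.

    Returns the URL stored at the referenced position, or None when the
    reference is stale (segment missing, or a different document there).
    """
    seg_name, doc_id = index[url]
    docs = segments.get(seg_name)
    if docs is None or doc_id >= len(docs) or docs[doc_id] != url:
        return None
    return docs[doc_id]


# Two made-up segments and an index built over them.
old_segments = {
    "seg-old-1": ["http://a.example/", "http://b.example/"],
    "seg-old-2": ["http://c.example/"],
}
old_index = build_index(old_segments)

# Merging with filtering yields one new segment under a new name.
merged_segments = {"seg-merged": ["http://a.example/", "http://c.example/"]}

# The old index still points at "seg-old-1", which no longer exists.
stale = lookup(old_index, merged_segments, "http://a.example/")

# Rebuilding the index over the merged segment makes the lookup resolve.
fresh = lookup(build_index(merged_segments), merged_segments, "http://a.example/")
```

In this model the old index resolves nothing against the merged segment
until it is rebuilt, which is the re-index step mentioned above.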