Hi,

While working on some patches for both trunk and the Nutchgora branch, I ended up analysing the generator mappers in each ([0] and [1] respectively). Specifically, compare the code block in trunk (lines 175 - 185) with the corresponding block in Nutchgora (lines 57 - 74). In trunk we first check whether filter is true, whereas in Nutchgora we first check whether normalize is true and only then check whether filter is true, before proceeding to catch any nasties. It seems to me that there may be a bug in trunk, but I am not sure and would like someone to comment.
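To make the ordering question concrete, here is a minimal, self-contained sketch. It does not reproduce the actual Generator or GeneratorMapper code; `GenerateOrderSketch`, `NORMALIZE`, `FILTER`, `filterOnly` and `normalizeThenFilter` are all hypothetical names, the "normalizer" just lower-cases the URL, and the "filter" rejects one prefix by returning null (the way Nutch URL filters signal rejection). It only shows how filtering a raw URL versus a normalized URL can give different answers for the same input.

```java
import java.util.Locale;
import java.util.function.UnaryOperator;

public class GenerateOrderSketch {

    // Hypothetical normalizer: simply lower-cases the URL (real normalizers do far more).
    static final UnaryOperator<String> NORMALIZE = url -> url.toLowerCase(Locale.ROOT);

    // Hypothetical filter: returns null to reject a URL, otherwise returns it unchanged.
    static final UnaryOperator<String> FILTER =
        url -> url.startsWith("http://blocked.example/") ? null : url;

    // Ordering as described for trunk: the filter sees the raw URL.
    static String filterOnly(String url, boolean filter) {
        if (filter && FILTER.apply(url) == null) {
            return null; // rejected
        }
        return url;
    }

    // Ordering as described for Nutchgora: normalize first, then filter the normalized URL.
    static String normalizeThenFilter(String url, boolean normalize, boolean filter) {
        if (normalize) {
            url = NORMALIZE.apply(url);
        }
        if (filter && FILTER.apply(url) == null) {
            return null; // rejected
        }
        return url;
    }

    public static void main(String[] args) {
        String raw = "HTTP://BLOCKED.EXAMPLE/page";
        System.out.println("filter only: " + filterOnly(raw, true));
        System.out.println("normalize then filter: " + normalizeThenFilter(raw, true, true));
    }
}
```

With this toy input the raw URL slips past the filter, while the normalized one is rejected. That does not prove trunk is wrong (it may normalize elsewhere); it only illustrates why the order of the two checks can matter.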
Thanks,
Lewis

[0] https://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?view=markup
[1] https://svn.apache.org/viewvc/nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorMapper.java?view=markup

