Hi all,
This might be a pretty trivial question, but I'm hung up on it. I've got a crawl that's displaying through the Java servlet, and the RSS feed works great, but I'm getting two results per hostname. Never more than that, always exactly two. I'd thought this could be reined in with searcher.hostgrouping.rawhits.factor, but that doesn't seem to be the case. I'm trying to bring it down to one result per hostname.

A little further digging makes me believe I'm also a victim of the MD5 hash bug (https://issues.apache.org/jira/browse/NUTCH-835), but there are definitely cases where the results aren't duplicates. They're just too similar to display one right after another (e.g. http://www.dunkmall.com/ and http://www.dunkmall.com/order.php).

Any ideas? Is there a config setting I'm missing (hopefully)? Or do I have to dig into how the searcher works?

Thanks!
Rob
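P.S. To be concrete about what I'm after, here's a rough sketch of the "one hit per host" behavior applied client-side to an already-ranked list of result URLs. `HostGrouping` and `collapseByHost` are hypothetical names I made up for illustration. They are not part of Nutch's searcher API. It only groups on the exact hostname (lowercased) and does nothing about near-duplicate content or the MD5 issue.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HostGrouping {

    // Hypothetical helper (not Nutch API): keep at most maxPerHost hits
    // per hostname, preserving the original rank order of the list.
    public static List<String> collapseByHost(List<String> rankedUrls, int maxPerHost) {
        Map<String, Integer> hitsPerHost = new HashMap<>();
        List<String> kept = new ArrayList<>();
        for (String url : rankedUrls) {
            String host = URI.create(url).getHost();
            if (host == null) {
                host = url; // no parsable host: treat the URL as its own group
            }
            host = host.toLowerCase();
            int count = hitsPerHost.getOrDefault(host, 0);
            if (count < maxPerHost) {
                kept.add(url);
                hitsPerHost.put(host, count + 1);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> hits = List.of(
                "http://www.dunkmall.com/",
                "http://www.dunkmall.com/order.php",
                "http://example.org/a");
        System.out.println(collapseByHost(hits, 2)); // what I'm seeing now
        System.out.println(collapseByHost(hits, 1)); // what I want
    }
}
```

With a limit of 2 both dunkmall.com pages survive; with a limit of 1 only the higher-ranked one does, which is the behavior I'd like the searcher itself to produce.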

