Two questions have come up over the last week:
We use the XML output format found in the RSS tab to pipe results into another process. Due to volume constraints, we would like the XML to need no post-processing; we just want to push it to its target container. By default, Nutch wraps the search terms in the snippet with <span class="highlight"></span> tags. Is there a config file somewhere to modify that output (we're looking to change it to <b></b>)? Is there somewhere else I might change that, maybe the Java files for the servlet?

Secondly, and I feel like I already know the answer: we need to be able to delete offensive URLs. During our crawl, adult or otherwise irrelevant results sometimes climb too high in the results, and we need to remove them on a case-by-case basis. We plan to add blacklisted URLs to our crawl-urlfilter.txt file so they're effectively removed on recrawls. Is that the best we can do? Is there a way to deindex them manually, without needing to recrawl the whole URL list?

Thanks,
Rob

