Is there an "out of the box" way to get Nutch to strip carriage returns and/or line feeds from content as it parses? In a recent crawl of one of our sites, I'm finding \n characters in unexpected places in the parsed content, and I'd like to remove them. I've also noticed that when a \n falls in the middle of quoted text (such as "Some \n String"), the quotation marks show up in a browser as ?. As far as I can tell, the source content is just formatted strangely. I'm guessing this is a common problem and I'm simply missing something?
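
To make the goal concrete, here is a minimal sketch in plain Java of the cleanup I have in mind. It is not a Nutch API: the class name LineBreakCleaner and the method stripLineBreaks are hypothetical names of my own, and I'm assuming it is acceptable to collapse any whitespace run that contains a line break into a single space. It only deals with the line breaks, not with the ? rendering of the quotation marks.

```java
public final class LineBreakCleaner {

    private LineBreakCleaner() {
        // Utility class; not meant to be instantiated.
    }

    /**
     * Hypothetical helper (not part of Nutch): collapses every run of
     * whitespace that contains at least one CR or LF into a single space.
     * Whitespace runs without a line break are left untouched.
     */
    public static String stripLineBreaks(String text) {
        if (text == null) {
            return null;
        }
        // \s* on both sides absorbs spaces/tabs around the line break(s),
        // so "Some \n String" becomes "Some String" rather than "Some   String".
        return text.replaceAll("\\s*[\\r\\n]+\\s*", " ");
    }

    public static void main(String[] args) {
        // The example from the question.
        System.out.println(stripLineBreaks("\"Some \n String\""));
    }
}
```

If nothing built in does this, I assume something along these lines would have to run inside a custom plugin on the parsed text, but I don't know whether that is the recommended approach.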
