Is there some "out of the box" way to get Nutch to remove carriage returns
and/or line feeds from content as it parses? In a crawl I did recently of one
of our sites, I'm finding places where, for some reason, there are \n
characters, and I'd like to cut them out. If there's a \n in the middle of
quoted text (such as "Some \n String"), the quotation marks show up in a
browser as ?. As far as I can tell, the problem is that the content itself is
formatted strangely. I'm guessing this is a common thing and I'm just missing
something?
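
A minimal sketch, assuming the goal is only to collapse line breaks in the
extracted text. The `LineBreakStripper.strip` helper below is hypothetical and
not part of Nutch. It replaces each run of CR/LF characters, together with any
spaces or tabs around it, with a single space. Wiring it into Nutch (for
example from a custom parse-filter plugin) is not shown and would need to be
checked against the Nutch version in use.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Nutch API: collapses line breaks in plain text.
public final class LineBreakStripper {
    // One or more CR/LF characters, plus any spaces or tabs on either side.
    private static final Pattern LINE_BREAKS =
            Pattern.compile("[ \\t]*[\\r\\n]+[ \\t]*");

    private LineBreakStripper() {}

    /** Replaces each run of line breaks (and adjacent spaces/tabs) with one space. */
    public static String strip(String text) {
        if (text == null) {
            return null;
        }
        return LINE_BREAKS.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        // Prints "Some String" (including the quotation marks).
        System.out.println(strip("\"Some \n String\""));
    }
}
```

This only removes the line breaks; it does not change how a browser decodes
the quotation marks themselves.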
