Hi,

We use Nutch 1.0. I found that for certain web pages, e.g. http://www.funnycorner.net/funny-pictures/4060/funny-people-pictures/thieves-snort-dogs-ashes-in-cocaine-bungle.html, the text held by org.apache.nutch.parse.ParseText contains newlines. See the sample below.
"Forums: Sites: Share This Funny Picture on : <a href=" http://www.funnycorner.net/funny-pictures/4060/funny-people-pictures/thieves-snort-dogs-ashes-in -cocaine-bungle.html" title="Thieves Snort Dogs Ashes In Cocaine Bungle" target="_blank">" Our downstream parsing utility assumes that parse text is a single line. Is there a JIRA that is going to fix this issue ? Thanks