I am just starting with SAX and am trying to parse a file that contains non-ASCII characters. The XML file is encoded as 'ISO-8859-1'. When the parser reaches text containing non-ASCII characters, the text of a single element is split across multiple lines of output.
Example: I am trying to output

    'Der Einfluss kleiner naturnaher Retentionsmaßnahmen in der Fläche auf den Hochwasserabfluss - Kleinrückhaltebecken'

The output I get is

    Start ELEMENT ='title'
    String read is 'Der Einfluss kleiner naturnaher Retentionsma'
    String read is '▀'
    String read is 'nahmen in der Fl'
    String read is 'Σ'
    String read is 'che auf den Hochwasserabfluss - Kleinr'
    String read is 'ⁿ'
    String read is 'ckhaltebecken -.'
    End ELEMENT ='title'

whereas I want a single string, something like

    Start ELEMENT ='title'
    String read is 'Der Einfluss kleiner naturnaher Retentionsma▀nahmen in der FlΣche auf den Hochwasserabfluss - Kleinrⁿckhaltebecken -.'
    End ELEMENT ='title'

My code is:

    def characters(self, chars):
        newchars=[]
        newchars.append(chars.encode('ISO-8859-1'))
        if newchars[-1] == '\n':
            newchars = newchars[:-1]
        if len(newchars)> 0:
            output = 'String read is ' + "'" + ''.join(newchars) + "'\n"
            sys.stdout.write(output)
        return

Does anyone have any ideas?
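For reference, here is a minimal sketch of the behaviour I am after. A SAX parser is allowed to deliver the text of one element in several characters() calls, so the idea is to buffer the chunks and write them out once in endElement(). The names TitleHandler, parse_titles and SAMPLE are made up for illustration and are not part of my real code. The sketch assumes Python 3 and elements that contain only text (no mixed content).

```python
import io
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Hypothetical handler: buffers text chunks, emits one string per element."""

    def __init__(self, out):
        super().__init__()
        self.out = out      # any object with a write() method
        self.chunks = []    # text pieces received for the current element

    def startElement(self, name, attrs):
        # Start a fresh buffer; assumes text-only elements (no mixed content).
        self.chunks = []
        self.out.write("Start ELEMENT ='%s'\n" % name)

    def characters(self, content):
        # May be called several times per element, so only collect here.
        self.chunks.append(content)

    def endElement(self, name):
        # Join the pieces once the element is complete, then report it.
        text = "".join(self.chunks).strip()
        if text:
            self.out.write("String read is '%s'\n" % text)
        self.chunks = []
        self.out.write("End ELEMENT ='%s'\n" % name)


def parse_titles(xml_bytes):
    """Parse a bytes document and return the handler's report as a string."""
    out = io.StringIO()
    xml.sax.parseString(xml_bytes, TitleHandler(out))
    return out.getvalue()


# Small ISO-8859-1 encoded sample document (illustrative data only).
SAMPLE = (
    "<?xml version='1.0' encoding='ISO-8859-1'?>"
    "<title>Retentionsma\xdfnahmen in der Fl\xe4che - Kleinr\xfcckhaltebecken</title>"
).encode("ISO-8859-1")

if __name__ == "__main__":
    print(parse_titles(SAMPLE), end="")
```

The handler keeps the text as Unicode and leaves encoding to the output stream, rather than calling encode('ISO-8859-1') on each chunk. Whether the console then displays the characters correctly depends on its own encoding.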