Thanks Marc,
But I think that got rid of all of my carriage returns. Everything is on just one line now.

Matthew Pirritano, Ph.D.
Research Analyst IV
Medical Services Initiative (MSI)
Orange County Health Care Agency
(714) 568-5648

________________________________
From: tutor-bounces+mpirritano=ochca....@python.org [mailto:tutor-bounces+mpirritano=ochca....@python.org] On Behalf Of Marc Tompkins
Sent: Tuesday, April 07, 2009 10:12 AM
To: tutor@python.org
Subject: Re: [Tutor] why is unicode converted file double spaced?

On Tue, Apr 7, 2009 at 9:52 AM, Pirritano, Matthew <mpirrit...@ochca.com> wrote:

So Kent's syntax worked to convert my Unicode file to plain text. But now my data is double spaced. How can I fix this? Here is the code I'm using.

Sounds like you're being stung by the difference in newline handling between operating systems - to recap, MS-DOS and Windows terminate a line with a carriage return and linefeed (aka CRLF or '\r\n'); *nixes use just LF ('\n'); Mac OS up to version 9 uses just CR ('\r').

You will have noticed this, on Windows, if you ever open a text file in Notepad that was created on a different OS - instead of breaking into separate lines, everything appears on one long line with funky characters where the breaks should be. If you use a more sophisticated text editor such as Notepad++ or Textpad, everything looks normal.

Python has automatic newline conversion; generally, you can read a text file from any OS and write to it correctly regardless of the OS that you happen to be running yourself. However, the automatic newline handling (from my perfunctory Googling) appears to break down when you're also converting between Unicode and ASCII; or it could be because you're essentially doing a read() from one file and a writelines() to the other; or something else entirely.
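One likely mechanism for the double spacing, offered as a sketch rather than a confirmed diagnosis: the codecs documentation says codecs.open() always opens the underlying file in binary mode and does no newline translation, so each decoded line still ends in '\r\n'. Writing those lines to a text-mode file on Windows then turns every '\n' into '\r\n', giving '\r\r\n', which many viewers show as a blank line between rows. The snippet below assumes Python 3 and uses a made-up two-line sample; newline='\r\n' forces the Windows behaviour so it can be reproduced on any OS. The helper name simulate_windows_text_write is hypothetical.

```python
import os
import tempfile


def simulate_windows_text_write(lines):
    """Write lines as Windows text mode would and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline='\r\n' makes every '\n' written come out as '\r\n',
        # which is what text mode does by default on Windows.
        with open(path, "w", newline="\r\n") as out:
            out.writelines(lines)
        with open(path, "rb") as raw:
            return raw.read()
    finally:
        os.remove(path)


# Hypothetical sample standing in for the UTF-16 input file.
encoded = "first\r\nsecond\r\n".encode("utf-16")

# Decoding without newline translation leaves '\r\n' on every line.
decoded_lines = encoded.decode("utf-16").splitlines(True)

result = simulate_windows_text_write(decoded_lines)
print(repr(result))  # b'first\r\r\nsecond\r\r\n'
```

The extra '\r' in front of each '\r\n' is the doubled line ending.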
Anyway, try this -

    import codecs

    inp = codecs.open('g:\\data\\amm\\text files\\test20090320.txt', 'r', 'utf-16')
    outp = open('g:\\data\\amm\\text files\\new_text_file.txt', 'w')

    for outLine in inp:
        outp.write(outLine.strip())

    inp.close()
    outp.close()

strip() will remove any leading or trailing whitespace - which should include any leftover CR or LF characters.

HTH -
--
www.fsrtechnologies.com
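The strip() loop above removes the line ending along with everything else, which is why the output collapses onto one line. A minimal sketch of one way to keep the line breaks, assuming Python 3: strip only the trailing CR/LF and write a single '\n' back, letting text mode pick the right ending for the platform. The function name convert_utf16_to_text and the choice of ASCII output with errors="replace" are assumptions for illustration, not something taken from the thread.

```python
def convert_utf16_to_text(src_path, dst_path):
    """Copy a UTF-16 text file to an ASCII text file, one output line per input line."""
    # Text mode decodes UTF-16 and normalises '\r\n' to '\n' on read.
    with open(src_path, "r", encoding="utf-16") as inp, \
         open(dst_path, "w", encoding="ascii", errors="replace") as outp:
        for line in inp:
            # Drop any leftover CR/LF only, then add exactly one newline back.
            outp.write(line.rstrip("\r\n") + "\n")
```

Usage would look like convert_utf16_to_text('g:\\data\\amm\\text files\\test20090320.txt', 'g:\\data\\amm\\text files\\new_text_file.txt'). Unlike strip(), rstrip("\r\n") leaves leading and trailing spaces in the data alone, which may or may not be what you want.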
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor