Re: Writing a Carriage Return in Unicode
In article mailman.773.1258787463.2873.python-l...@python.org, Dennis Lee Bieber wlfr...@ix.netcom.com wrote:
>On Thu, 19 Nov 2009 23:22:22 -0800, Scott David Daniels scott.dani...@acm.org declaimed the following in gmane.comp.python.general:
>>
>> If you've actually typed on a physical typewriter, you know that moving the carriage back is a distinct operation from rolling the platen forward; both operations are accomplished when you push the carriage back using the bar, but you know they are distinct.
>
> Of course, if you are describing a /real/ /manual/ typewriter, you would rapidly discover that the sequence is lfcr -- since pushing the bar would often trigger the line feed before it would slide the carriage to the right.

Often, but not always; it certainly was possible on most typewriters to return the carriage without a line feed -- and occasionally desirable for overstrike.
-- 
Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/

The best way to get information on Usenet is not to ask a question, but to post the wrong information.
-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Writing a Carriage Return in Unicode
On Nov 21, 11:33 pm, Gregory Ewing greg.ew...@canterbury.ac.nz wrote:
> Steve Howell wrote:
>> If you are going to couple character sets to their legacy physical implementations, you should also have a special extra character to dot your i's and cross your t's.
>
> No, no, no. For that device you need to output a series of motion vectors for the scribing point. Plus control characters for "dip nib" and "apply blotter", and possibly also "pluck goose" for when the print head becomes worn.

Greg, at the first reading of your response, it sounded overly complicated for me to have to "dip nib" and "pluck goose" every time I just want to semantically indicate the ninth letter of the English alphabet, but that's easily solved with a wizard interface, I guess. Maybe every time I am trying to decide which letter to type in Word, there could be some kind of animated persona that helps me choose the character.

There could be a visual icon of an eye that reminds me of the letter that I am trying to type, and I could configure the depth to which I dip the nib with some kind of slider interface. It actually sounds quite simple and elegant, the more that I think about it.
Re: Writing a Carriage Return in Unicode
On Thu, 19 Nov 2009 23:22:22 -0800, Scott David Daniels wrote:
> MRAB wrote:
>> u'\u240D' isn't a carriage return (that's u'\r') but a symbol (a visible CR graphic) for carriage return. Windows programs normally expect lines to end with '\r\n'; just use u'\n' in programs and open the text files in text mode ('r' or 'w').
>
> <rant>
> This is the one thing from standards that I believe Microsoft got right where others did not.

Oh please, that's historical revisionism -- \r\n wasn't invented by Microsoft. Microsoft didn't "get it right", they simply copied what CP/M did, on account of the original MS-DOS being essentially a clone of CP/M.

And of course the use of \r\n predates computers -- CR+LF (Carriage Return + LineFeed) were necessary to instruct the print head on teletype printers to move down one line and return to the left. It was a physical necessity for the oldest computer operating systems, because the only printers available were teletypes.

> The ASCII (American Standard Code for Information Interchange) standard end of line is _both_ carriage return (\r) _and_ line feed (\n)

I doubt that very much. Do you have a reference for this?

It is true that the predecessor to ANSI (not ASCII), ASA, specified \r\n as the line terminator, but ISO specified that both \n and \r\n should be accepted.

> I believe in that order.

You "believe" in that order? But you're not sure?

That's the trouble with \r\n, or \n\r -- it's an arbitrary choice, and therefore hard to remember which it is. I've even seen proprietary business-to-business software where the developers (apparently) couldn't remember which was the standard, so when exporting data to text, you had to choose which to use for line breaks.

Of course, being Windows software, they didn't think that you might want to transfer the text file to a Unix system, or a Mac, and so didn't offer \n or \r alone as line terminators.

> The Unix operating system, in its enthusiasm to make _everything_ simpler (against Einstein's advice, "Everything should be made as simple as possible, but not simpler.") decided that end-of-line should be a simple line feed and not carriage return line feed.

Why is it "too simple" to have line breaks be a single character? What is the downside of the Unix way? Why is \r\n better? We're not using teletypes any more.

Or for that matter, classic Mac OS, which used a single \r as newline. Likewise for other OSes, such as Commodore, Amiga, Multics...

> Before they made that decision, there was debate about the order of cr-lf or lf-cr, or inventing a new EOL character ('\037' == '\x1F' was the candidate).

IBM operating systems that use EBCDIC used the NEL (NExt Line) character for line breaks, keeping CR and LF for other uses.

The Unicode standard also specifies that any of the following be recognised as line separators or terminators: LF, CR, CR+LF, NEL, FF (FormFeed, \f), LS (LineSeparator, U+2028) and PS (ParagraphSeparator, U+2029).

> If you've actually typed on a physical typewriter, you know that moving the carriage back is a distinct operation from rolling the platen forward;

I haven't typed on a physical typewriter for nearly a quarter of a century.

If you've typed on a physical typewriter, you'll know that to start a new page, you have to roll the platen forward until the page ejects, then move the typewriter guide forward to leave space, then feed a new piece of paper into the typewriter by hand, then roll the platen again until the page is under the guide, then push the guide back down again. That's FIVE distinct actions, and if you failed to do them, you would type but no letters would appear on the (non-existent) page. Perhaps we should specify that text files need a five-character sequence to specify a new page too?

> both operations are accomplished when you push the carriage back using the bar, but you know they are distinct. Hell, MIT even had a "line starve" character that moved the cursor up (or rolled the platen back).
> </rant>
>
> Lots of people talk about "dos-mode files" and "windows files" as if Microsoft got it wrong; it did not -- Unix made up a convenient fiction and people went along with it. (And, yes, if Unix had been there first, their convention was, in fact, better).

This makes zero sense. If Microsoft "got it right", then why is the Unix convention "convenient" and "better"? Since we're not using teletype machines, I would say Microsoft is now using an *inconvenient* fiction.

-- 
Steven
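Python 3's `str.splitlines()` happens to recognise every separator in the Unicode list above (plus a few more, such as VT and the ASCII FS/GS/RS codes), while the visible U+240D symbol from the original question is not a line break at all. A minimal sketch, assuming Python 3; `SEPARATORS` and `split_with` are illustrative names only:

```python
# Line separators from the Unicode list quoted above, keyed by their usual names.
SEPARATORS = {
    "LF": "\n",
    "CR": "\r",
    "CR+LF": "\r\n",
    "NEL": "\x85",
    "FF": "\f",
    "LS": "\u2028",
    "PS": "\u2029",
}


def split_with(sep: str) -> list:
    """Join two words with `sep`, then let str.splitlines() split them again."""
    return ("first" + sep + "second").splitlines()


for name, sep in SEPARATORS.items():
    print(f"{name:6} -> {split_with(sep)}")  # each one yields ['first', 'second']

# U+240D SYMBOL FOR CARRIAGE RETURN is an ordinary printable character,
# so the string comes back as a single, unsplit item.
print("U+240D ->", split_with("\u240d"))
```

Note that `bytes.splitlines()` is narrower: it only breaks on `\n`, `\r` and `\r\n`.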
Re: Writing a Carriage Return in Unicode
On 21 Nov, 09:12, Steven D'Aprano st...@remove-this-cybersource.com.au wrote:
> Oh please, that's historical revisionism -- \r\n wasn't invented by Microsoft. Microsoft didn't "get it right", they simply copied what CP/M did, on account of the original MS-DOS being essentially a clone of CP/M.

Actually \r\n goes back to early mechanical typewriters with typebars, such as the Hermes. The operator would hit CR to return the paper carriage and LF to move down to the next line.
Re: Writing a Carriage Return in Unicode
On 21 Nov, 08:10, Dennis Lee Bieber wlfr...@ix.netcom.com wrote:
> Of course, if you are describing a /real/ /manual/ typewriter, you would rapidly discover that the sequence is lfcr -- since pushing the bar would often trigger the line feed before it would slide the carriage to the right.
>
> But on a teletype, it would be crlf, and maybe a few rub-outs for timing -- as the cr was the slower operation, and would complete while the other characters were operated upon...

Ah, yes, you are right :-) The sequence is lfcr on a typewriter. Which is why the RETURN button often had the symbol | |
Re: Writing a Carriage Return in Unicode
On Nov 21, 12:12 am, Steven D'Aprano st...@remove-this-cybersource.com.au wrote:
> On Thu, 19 Nov 2009 23:22:22 -0800, Scott David Daniels wrote:
>> If you've actually typed on a physical typewriter, you know that moving the carriage back is a distinct operation from rolling the platen forward;
>
> I haven't typed on a physical typewriter for nearly a quarter of a century.
>
> If you've typed on a physical typewriter, you'll know that to start a new page, you have to roll the platen forward until the page ejects, then move the typewriter guide forward to leave space, then feed a new piece of paper into the typewriter by hand, then roll the platen again until the page is under the guide, then push the guide back down again. That's FIVE distinct actions, and if you failed to do them, you would type but no letters would appear on the (non-existent) page. Perhaps we should specify that text files need a five-character sequence to specify a new page too?
>
>> both operations are accomplished when you push the carriage back using the bar, but you know they are distinct. Hell, MIT even had a "line starve" character that moved the cursor up (or rolled the platen back).
>> </rant>
>>
>> Lots of people talk about "dos-mode files" and "windows files" as if Microsoft got it wrong; it did not -- Unix made up a convenient fiction and people went along with it. (And, yes, if Unix had been there first, their convention was, in fact, better).
>
> This makes zero sense. If Microsoft "got it right", then why is the Unix convention "convenient" and "better"? Since we're not using teletype machines, I would say Microsoft is now using an *inconvenient* fiction.
>
> --
> Steven

It's been a long time since I have typed on a physical typewriter as well, but I still vaguely remember all the crazy things I had to do to get the tab key to produce a predictable indentation on the paper output.

I agree with Steven that "\r\n" is completely insane. If you are going to couple character sets to their legacy physical implementations, you should also have a special extra character to dot your i's and cross your t's. Apparently neither Unix nor Microsoft got that right. I mean, think about it, dotting the i is a distinct operation from creating the undotted i. ;)
Re: Writing a Carriage Return in Unicode
Steve Howell wrote:
> If you are going to couple character sets to their legacy physical implementations, you should also have a special extra character to dot your i's and cross your t's.

No, no, no. For that device you need to output a series of motion vectors for the scribing point. Plus control characters for "dip nib" and "apply blotter", and possibly also "pluck goose" for when the print head becomes worn.

-- 
Greg
Re: Writing a Carriage Return in Unicode
MRAB wrote:
> u'\u240D' isn't a carriage return (that's u'\r') but a symbol (a visible CR graphic) for carriage return. Windows programs normally expect lines to end with '\r\n'; just use u'\n' in programs and open the text files in text mode ('r' or 'w').

<rant>
This is the one thing from standards that I believe Microsoft got right where others did not. The ASCII (American Standard Code for Information Interchange) standard end of line is _both_ carriage return (\r) _and_ line feed (\n) -- I believe in that order.

The Unix operating system, in its enthusiasm to make _everything_ simpler (against Einstein's advice, "Everything should be made as simple as possible, but not simpler.") decided that end-of-line should be a simple line feed and not carriage return line feed. Before they made that decision, there was debate about the order of cr-lf or lf-cr, or inventing a new EOL character ('\037' == '\x1F' was the candidate).

If you've actually typed on a physical typewriter, you know that moving the carriage back is a distinct operation from rolling the platen forward; both operations are accomplished when you push the carriage back using the bar, but you know they are distinct. Hell, MIT even had a "line starve" character that moved the cursor up (or rolled the platen back).
</rant>

Lots of people talk about "dos-mode files" and "windows files" as if Microsoft got it wrong; it did not -- Unix made up a convenient fiction and people went along with it. (And, yes, if Unix had been there first, their convention was, in fact, better).

So, sorry for venting, but I have been wanting to say this in public for years.

--Scott David Daniels
scott.dani...@acm.org
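Whichever convention one prefers, code that reads text from DOS/Windows (CR+LF), Unix (LF) and classic Mac OS (CR) sources usually normalises the endings on input. A minimal sketch, assuming Python 3; `normalize_newlines` is a hypothetical helper name, and the second half shows that Python's universal-newlines mode performs the same translation:

```python
import io


def normalize_newlines(text: str) -> str:
    """Convert CR+LF (DOS/Windows) and lone CR (classic Mac OS) to LF (Unix)."""
    # Replace CR+LF first; otherwise its CR would become an extra LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


mixed = "dos\r\nunix\nmac\rend"
print(repr(normalize_newlines(mixed)))  # 'dos\nunix\nmac\nend'

# open() in text mode defaults to newline=None ("universal newlines"),
# which applies the same translation when reading; io.StringIO needs
# newline=None passed explicitly to behave that way.
stream = io.StringIO(mixed, newline=None)
print(repr(stream.read()))
```

The order of the two `replace` calls is the whole trick: doing `"\r"` first would turn each CR+LF into two line breaks.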
Re: Writing a Carriage Return in Unicode
On 19 Nov, 01:14, Doug caldwelli...@verizon.net wrote:
> Thanks for your help!!

A carriage return in Unicode is u"\r"; how this is written as bytes depends on the encoder.

Don't try to outsmart the UTF-8 codec, it knows how to translate "\r" to UTF-8.

Sturla Molden
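Sturla's point can be checked directly: the same code point becomes different bytes under different codecs, and each codec handles CR without any help. A short sketch, assuming Python 3 (where plain string literals are Unicode); it also shows the bytes that U+240D, the character from the original question, turns into:

```python
cr = "\r"             # U+000D CARRIAGE RETURN, the control character
cr_symbol = "\u240d"  # U+240D SYMBOL FOR CARRIAGE RETURN, a printable glyph

for encoding in ("utf-8", "utf-16-le", "utf-16-be"):
    # The codec alone decides how each code point becomes bytes.
    print(f"{encoding:10} CR={cr.encode(encoding)!r:12} symbol={cr_symbol.encode(encoding)!r}")

# In UTF-8, CR stays the single byte 0x0D, while U+240D takes three bytes.
print(cr.encode("utf-8"), cr_symbol.encode("utf-8"))
```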
Re: Writing a Carriage Return in Unicode
Hi! Thanks for clearing this up!!
Writing a Carriage Return in Unicode
Hi!

I am trying to write a UTF-8 file of UNICODE strings with a carriage return at the end of each line (code below).

    import codecs

    filOpen = codecs.open("c:\\temp\\unicode.txt",'w','utf-8')
    str1 = u'This is a test.'
    str2 = u'This is the second line.'
    str3 = u'This is the third line.'
    strCR = u"\u240D"
    filOpen.write(str1 + strCR)
    filOpen.write(str2 + strCR)
    filOpen.write(str3 + strCR)
    filOpen.close()

The output looks like

    This is a test.âThis is the second line.âThis is the third line.â

when opened in Wordpad as a UNICODE file.

Thanks for your help!!
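Before reaching for an encoding explanation, it can help to ask what character `strCR` actually holds. A small diagnostic sketch, assuming Python 3's standard `unicodedata` module (the placeholder text for unnamed control characters is our own choice):

```python
import unicodedata

for ch in ("\u240d", "\r", "\n"):
    category = unicodedata.category(ch)
    # Control characters such as CR and LF have no character name in
    # unicodedata, so pass a default instead of letting name() raise ValueError.
    name = unicodedata.name(ch, "<control character, no name>")
    print(f"U+{ord(ch):04X}  category={category}  name={name}")
```

U+240D reports category `So` (other symbol), whereas CR and LF report `Cc` (control), which is why writing U+240D only draws a glyph instead of ending the line.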
Re: Writing a Carriage Return in Unicode
Doug wrote:
> Hi!
>
> I am trying to write a UTF-8 file of UNICODE strings with a carriage return at the end of each line (code below).
>
>     import codecs
>
>     filOpen = codecs.open("c:\\temp\\unicode.txt",'w','utf-8')
>     str1 = u'This is a test.'
>     str2 = u'This is the second line.'
>     str3 = u'This is the third line.'
>     strCR = u"\u240D"
>     filOpen.write(str1 + strCR)
>     filOpen.write(str2 + strCR)
>     filOpen.write(str3 + strCR)
>     filOpen.close()
>
> The output looks like
>
>     This is a test.�This is the second line.�This is the third line.�
>
> when opened in Wordpad as a UNICODE file.
>
> Thanks for your help!!

u'\u240D' isn't a carriage return (that's u'\r') but a symbol (a visible CR graphic) for carriage return. Windows programs normally expect lines to end with '\r\n'; just use u'\n' in programs and open the text files in text mode ('r' or 'w').

Some Windows programs won't recognise UTF-8 text as UTF-8 in files unless they start with a BOM; this will be handled automatically in Python if you specify the encoding as 'utf-8-sig'.
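Putting MRAB's two suggestions together, a Python 3 version of Doug's script might look like the following. This is a sketch under assumptions: the file goes to the system temp directory rather than `c:\temp`, and `newline="\r\n"` is passed explicitly so the output has CR+LF endings on every platform (with the default `newline=None`, Python writes `os.linesep`, which is `"\r\n"` on Windows).

```python
import os
import tempfile

lines = [
    "This is a test.",
    "This is the second line.",
    "This is the third line.",
]

# Assumed location; the original post wrote to c:\temp\unicode.txt.
path = os.path.join(tempfile.gettempdir(), "unicode.txt")

# Text mode: every "\n" written is translated to the `newline` string.
# "utf-8-sig" prepends a UTF-8 byte order mark for programs that need one.
with open(path, "w", encoding="utf-8-sig", newline="\r\n") as f:
    for line in lines:
        f.write(line + "\n")

# Inspect the raw bytes to confirm the BOM and the CR+LF line endings.
with open(path, "rb") as f:
    raw = f.read()

print(raw[:3])             # b'\xef\xbb\xbf'
print(raw.count(b"\r\n"))  # 3
```

Reading the file back with `open(path, encoding="utf-8-sig")` strips the BOM, and the default universal-newlines mode turns each `"\r\n"` back into `"\n"`.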