> ########################################################################
> for encoding in ('utf-8', 'utf-16', 'utf-32'):
>   for i in range(0x110000):
>     aChar = unichr(i)
>     try:
>       someBytes = aChar.encode(encoding)
>       if '\n' in someBytes:
>         print("%r contains a newline in its bytes encoded with %s" %
> (aChar, encoding))
>     except:
>       ## Normally, try/catches with an empty except is a bad idea.
>       ## Here, this is toy code, and we're just exploring.
>       pass
> ########################################################################


Gaa...  Sorry about the bad indenting.  Let me try that again.


####################################
for encoding in ('utf-8', 'utf-16', 'utf-32'):
    for i in range(0x110000):
        aChar = unichr(i)
        try:
            someBytes = aChar.encode(encoding)
            if '\n' in someBytes:
                print("%r contains a newline in its bytes encoded with %s"
                      % (aChar, encoding))
        except:
            ## Normally, try/catches with an empty except is a bad idea.
            ## Here, this is toy code, and we're just exploring.
            pass
####################################



> Hopefully, this makes the point clearer: we must not try to decode
> individual lines.  By that time, the damage has been done: the act of
> trying to break the file into lines by looking naively at newline byte
> characters is invalid when certain characters can themselves have
> newline characters.

Confusing last sentence.  Let me try that again.  Breaking the file
into lines by naively scanning for newline bytes is invalid because
certain characters, once encoded, themselves contain the newline byte.
We've got to open the file with the right encoding in play, so that
decoding happens before any line splitting.
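
Here is a minimal sketch of the difference, under a few assumptions of
mine: the sample text, the temporary file, and the helper name
read_lines are all made up for illustration, and I'm using io.open so
the encoding can be passed explicitly.  U+010A is one of the characters
whose UTF-16 bytes include 0x0A.

```python
import io
import os
import tempfile

# Hypothetical sample: two lines, the first starting with U+010A,
# whose UTF-16 encoding contains the byte 0x0A.
text = u"\u010a is one character\nsecond line\n"


def read_lines(path, encoding):
    # Decode first, then split the decoded text into lines.
    with io.open(path, encoding=encoding) as f:
        return f.read().splitlines()


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with io.open(path, "wb") as f:
        f.write(text.encode("utf-16"))

    # Naive approach: split the raw bytes on the newline byte.
    with io.open(path, "rb") as f:
        naive_pieces = f.read().split(b"\n")

    # Encoding-aware approach: let the decoder run before splitting.
    decoded_lines = read_lines(path, "utf-16")
finally:
    os.remove(path)

print(len(naive_pieces))
print(len(decoded_lines))
```

The naive split cuts through the middle of the first character, so it
reports more pieces than there are lines, and none of those pieces can
be trusted to decode cleanly afterwards.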


Joel Spolsky's article on "The Absolute Minimum Every Software
Developer Absolutely, Positively Must Know About Unicode and Character
Sets (No Excuses!)" needs to be referenced.   :P

    http://www.joelonsoftware.com/articles/Unicode.html
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
