On 23/07/13 09:39, Marc Tompkins wrote:
On Mon, Jul 22, 2013 at 3:22 PM, Jim Mooney <cybervigila...@gmail.com> wrote:

On 22 July 2013 14:11, Marc Tompkins <marc.tompk...@gmail.com> wrote:


One way to deal with this is to specify an encoding:
     newchar = char.decode('cp437').encode('utf-8')


Works fine, but I decided to add a dos graphics dash to the existing dash
to expand the tree
visually. Except I got a complaint from IDLE that I should add this:

# -*- coding: utf-8 -*-

Will that always work? Setting coding in a comment? Or am I looking at a
Linux hash line?


I speak under correction here, but:  what you're setting there is the
encoding for the script file itself (and - the real point here - any
strings you specify, without explicit encoding, inside the script), NOT the
default encoding that Python is going to use while executing your script.
Unless I'm very much mistaken, Python will still use the default encoding
('ascii' in your case) when reading strings from external files.


Correct. The encoding declaration ONLY tells Python how to read the script. 
Remember, source code is text, but has to be stored on disk as bytes. If you 
only use ASCII characters, pretty much every program will agree what the bytes 
represent (since IBM mainframes using EBCDIC are pretty rare, and few programs 
expect double-byte encodings). But if you include non-ASCII characters, your 
text editor has to convert them to bytes. How does it do so? Nearly every 
editor is different, and a plain text file has no way of storing metadata 
such as the encoding. Contrast this with things like JPEG files, which can store 
metadata like the camera you used to take the photo.
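
To make the bytes-versus-text point concrete, here is a small sketch. It assumes Python 3 (the decode/encode line quoted earlier is Python 2 style), and the DOS graphics dash is assumed to be U+2500, which may not be the exact character Jim used:

```python
# Python 3 sketch: one character, different bytes under different encodings.
dash = "\u2500"  # BOX DRAWINGS LIGHT HORIZONTAL (assumed to be the DOS dash)

cp437_bytes = dash.encode("cp437")
utf8_bytes = dash.encode("utf-8")
print(cp437_bytes)  # b'\xc4'
print(utf8_bytes)   # b'\xe2\x94\x80'

# Reading the cp437 byte back with the wrong encoding gives a different character.
misread = cp437_bytes.decode("latin-1")
print(misread == dash)  # False

# Pure ASCII text produces identical bytes in all three encodings.
ascii_same = (
    "tree".encode("ascii") == "tree".encode("cp437") == "tree".encode("utf-8")
)
print(ascii_same)  # True
```

The same character becomes one byte in cp437 and three bytes in UTF-8, which is why whatever reads the file has to be told which encoding was used.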

So some programmers' editors have taken up the convention of using so-called "mode 
lines" to record editor settings as comments in source code, usually in the first 
couple or last couple of lines. Especially on Linux systems, Emacs and Vim users 
frequently include such mode lines.

Python stole this idea from them. If the first or second line in the source code file is 
a comment containing something like "coding: SPAM" or "coding=SPAM", then Python will read 
that source code using encoding SPAM. The form shown above

# -*- coding: utf-8 -*-

is copied from Emacs. Python is pretty flexible about the rest of the comment, though: 
it only looks for "coding", then a colon or equals sign, then the encoding name.

However, the encoding must be a known encoding (naturally), and the comment 
must be in the first or second line. You can't use it anywhere else. Well, you 
actually can, since it is a comment, but it will have no effect anywhere else.
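
If you want to check what Python makes of a declaration, the standard library's tokenize.detect_encoding applies the same first-or-second-line rule. This is only a sketch, assuming Python 3 (where the default with no declaration is UTF-8, not ASCII as in Python 2). The helper name declared_encoding and the sample sources are made up for illustration:

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding the tokenizer would use to read this source."""
    # detect_encoding reads at most two lines via the readline callable.
    encoding, _lines_read = tokenize.detect_encoding(
        io.BytesIO(source_bytes).readline
    )
    return encoding


# Emacs-style declaration on the first line.
first = b"# -*- coding: cp437 -*-\nx = 1\n"

# Hash-bang on line one, Vim-style declaration on line two.
second = b"#!/usr/bin/env python\n# vim: set fileencoding=cp437 :\nx = 1\n"

# Declaration on line three: it is just an ordinary comment there.
third = b"# one\n# two\n# -*- coding: cp437 -*-\nx = 1\n"

print(declared_encoding(first))   # cp437
print(declared_encoding(second))  # cp437
print(declared_encoding(third))   # utf-8
```

The third case falls back to the default because the declaration comes too late to be noticed.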


--
Steven
