At 11:43 2005-10-24 +0200, M.-A. Lemburg wrote:
>Bengt Richter wrote:
>> Please bear with me for a few paragraphs ;-)
>
>Please note that source code encoding doesn't really have
>anything to do with the way the interpreter executes the
>program - it's merely a way to tell the parser how to
>convert string literals (currently only the Unicode ones)
>into constant Unicode objects within the program text.
>It's also a nice way to let other people know what kind of
>encoding you used to write your comments ;-)
>
>Nothing more.

I think somehow I didn't make things clear, sorry ;-)
As I tried to show in the example of module_a.cs vs module_b.cs,
the source encoding currently results in two different str-type
strings representing the source _character_ sequence, which is the
_same_ in both cases. To make it clearer, try the following little
program (untested except on NT4 with
Python 2.4b1 (#56, Nov 3 2004, 01:47:27)
[GCC 3.2.3 (mingw special 20030504-1)] on win32 ;-):

----< t_srcenc.py >--------------------------------
import os

def test():
    # write the same character sequence twice, once per source encoding
    open('module_a.py','wb').write(
        "# -*- coding: latin-1 -*-" + os.linesep +
        "cs = '\xfcber-cool'" + os.linesep)
    open('module_b.py','wb').write(
        "# -*- coding: utf-8 -*-" + os.linesep +
        "cs = '\xc3\xbcber-cool'" + os.linesep)
    # show that we have two modules differing only in encoding:
    print ''.join(line.decode('latin-1') for line in open('module_a.py'))
    print ''.join(line.decode('utf-8') for line in open('module_b.py'))
    # see how results are affected:
    import module_a, module_b
    print module_a.cs + ' =?= ' + module_b.cs
    print module_a.cs.decode('latin-1') + ' =?= ' + module_b.cs.decode('utf-8')

if __name__ == '__main__':
    test()
---------------------------------------------------

The result copied from NT4 console to clipboard and pasted into eudora:
__________________________________________________________

[17:39] C:\pywk\python-dev>py24 t_srcenc.py
# -*- coding: latin-1 -*-
cs = 'über-cool'

# -*- coding: utf-8 -*-
cs = 'über-cool'

nber-cool =?= ++ber-cool
über-cool =?= über-cool
__________________________________________________________

(I'd say NT did the best it could, rendering the copied cp437
superscript n as the 'n' above, and the '++' coming from the cp437
box characters corresponding to the '\xc3\xbc'. Not sure how it will
show on your screen, but try the program to see ;-)

>Once a module is compiled, there's no distinction between
>a module using the latin-1 source code encoding or one using
>the utf-8 encoding.

ISTM module_a.cs and module_b.cs can readily be distinguished after
compilation, whereas the sources displayed according to their declared
encodings as above (or as e.g. different editors using different native
encodings might display them) cannot (other than by the encoding cookie
itself) ;-)

Perhaps you meant something else?

>Thanks,
You're welcome.
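
In case the console rendering above muddies the point, here is a minimal
sketch of the same comparison done purely in memory, with no files and no
imports. It is an illustration only: the names text, as_latin1 and as_utf8
are mine and are not part of t_srcenc.py, and I am assuming an interpreter
that accepts u'...' literals (any 2.x, or 3.3 and later).

```python
# Sketch only: text, as_latin1 and as_utf8 are illustrative names,
# not anything defined in t_srcenc.py.
text = u'\xfcber-cool'              # the single character sequence both sources display

as_latin1 = text.encode('latin-1')  # the bytes a latin-1 source file would hold
as_utf8 = text.encode('utf-8')      # the bytes a utf-8 source file would hold

# The two byte strings differ, which is what module_a.cs and module_b.cs show ...
print(repr(as_latin1))
print(repr(as_utf8))
print(as_latin1 == as_utf8)

# ... yet decoding each with its own declared encoding recovers the same characters.
print(as_latin1.decode('latin-1') == as_utf8.decode('utf-8'))
```

The first comparison prints False and the second prints True, i.e. the
characters are the same and only the byte-level representation differs.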

Regards,
Bengt Richter