Re: the official way of printing unicode strings

2008-12-14 Thread Marc 'BlackJack' Rintsch
On Sun, 14 Dec 2008 06:48:19 +0100, Piotr Sobolewski wrote:

> Then I tried it this way:
> sys.stdout = codecs.getwriter("utf-8")(sys.__stdout__)
> s = u"Stanisław Lem"
> print s
> This works but is even more cumbersome.
>
> So, my question is: what is the official, recommended Python way?

I'd make that first line:

sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

Why is it even more cumbersome to execute that line *once* instead of
encoding at every ``print`` statement?
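A minimal sketch of the "wrap once" idea, assuming an in-memory `io.BytesIO` buffer as a stand-in for Python 2's byte-oriented `sys.stdout` (the buffer and variable names are illustrative, not from the thread). It is written to run on Python 2.6+ and Python 3.3+:

```python
import codecs
import io

# Stand-in for Python 2's byte-oriented sys.stdout (an assumption for the demo).
raw_out = io.BytesIO()

# Wrap the byte stream once; every later write() encodes unicode as UTF-8.
out = codecs.getwriter('utf-8')(raw_out)

name = u'Stanis\u0142aw Lem'  # "Stanisław Lem", spelled with an escape
out.write(name)
out.write(u'\n')

encoded = raw_out.getvalue()  # the bytes that would reach the terminal or file
```

After the single wrapping line, each `write()` call needs no per-call `.encode(...)`; the wrapper does it.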

Ciao,
Marc 'BlackJack' Rintsch
--
http://mail.python.org/mailman/listinfo/python-list


Re: the official way of printing unicode strings

2008-12-14 Thread Piotr Sobolewski
Marc 'BlackJack' Rintsch wrote:

> I'd make that first line:
> sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
>
> Why is it even more cumbersome to execute that line *once* instead of
> encoding at every ``print`` statement?

Oh, maybe it's not cumbersome, but a little bit strange - but sure, I can
get used to it. 

My main problem is that when I use a language, I want to use it the way it
is supposed to be used. Usually doing that saves a lot of problems,
especially in Python, where there is one official way to do any elementary
task. I just want to know what the normal, official way of printing
unicode strings is. I mean, the question is not "how can I print a unicode
string" but "how do the creators of the language expect me to print a
unicode string". I couldn't find an answer to this question in the docs, so I
hope somebody here knows it.

So, is it _the_ python way of printing unicode?




Re: the official way of printing unicode strings

2008-12-14 Thread J. Clifford Dyer
On Sun, 2008-12-14 at 11:16 +0100, Piotr Sobolewski wrote:
> Marc 'BlackJack' Rintsch wrote:
>
> > I'd make that first line:
> > sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
> >
> > Why is it even more cumbersome to execute that line *once* instead of
> > encoding at every ``print`` statement?
>
> Oh, maybe it's not cumbersome, but a little bit strange - but sure, I can
> get used to it.
>
> My main problem is that when I use a language, I want to use it the way it
> is supposed to be used. Usually doing that saves a lot of problems,
> especially in Python, where there is one official way to do any elementary
> task. I just want to know what the normal, official way of printing
> unicode strings is. I mean, the question is not "how can I print a unicode
> string" but "how do the creators of the language expect me to print a
> unicode string". I couldn't find an answer to this question in the docs, so I
> hope somebody here knows it.
>
> So, is it _the_ python way of printing unicode?
>

The right way to print a unicode string is to encode it in the
encoding that is appropriate for your needs (which may or may not be
UTF-8), and then to print it.  What this means in terms of your three
examples is that the first and third are correct, and the second is
incorrect.  The second one breaks when writing to a file, so don't use
it.  Both the first and third behave in the way that I suggest.  The
first (print u'foo'.encode('utf-8')) is less cumbersome if you do it
once, but the third method (rebinding sys.stdout using codecs.getwriter)
is less cumbersome if you'll be doing a lot of printing on stdout.

In the end, they are the same method, but one of them introduces another
layer of abstraction.  If you'll be using more than two print statements
that need to be bound to a non-ASCII encoding, I'd recommend the third,
as it rapidly becomes less cumbersome the more you print.
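To make "they are the same method" concrete, here is a small sketch (Python 2.6+ / 3.3+) that writes the same text once with an explicit `.encode()` and once through a `codecs.getwriter` wrapper. Two `io.BytesIO` buffers stand in for the output stream; that substitution is an assumption made so the bytes can be compared:

```python
import codecs
import io

text = u'Stanis\u0142aw Lem'

# First method: encode explicitly at each write.
explicit = io.BytesIO()
explicit.write(text.encode('utf-8'))

# Third method: wrap the stream once; the wrapper calls the encoder for you.
wrapped_raw = io.BytesIO()
wrapper = codecs.getwriter('utf-8')(wrapped_raw)
wrapper.write(text)

# Both paths should put identical bytes on the underlying stream.
same_bytes = explicit.getvalue() == wrapped_raw.getvalue()
```

The wrapper adds a layer, but the bytes it produces are identical to those of the explicit call.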

That said, you should also consider whether you want to rebind
sys.stdout or not.  It makes your print statements less verbose, but it
also loses your reference to the basic stdout.  What if you want to
print using UTF-8 for a while, but then you need to switch to another
encoding later?  If you've used a new name, you can still refer back to
the original sys.stdout.

Right:

import codecs, sys
my_out = codecs.getwriter('utf-8')(sys.stdout)
print >> my_out, u"Stuff"
my_out = codecs.getwriter('cp500')(sys.stdout)  # cp500 is an EBCDIC code page
print >> my_out, u"Stuff"

Wrong:

sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
print u"Stuff"
sys.stdout = codecs.getwriter('cp500')(sys.stdout)
# Now sys.stdout is getting encoded twice, and you'll probably
# get garbage out. :(
print u"Stuff"

Though I guess this is why the OP is doing:

sys.stdout = codecs.getwriter('utf-8')(sys.__stdout__)

That avoids the problem by not rebinding the original file object.
sys.__stdout__ is still in its original state. 
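A sketch of the pattern that avoids double wrapping: keep one untouched base stream and build each encoding-specific writer from it, never from the previous writer. Here an `io.BytesIO` plays the role of `sys.__stdout__` (an assumption so the output can be inspected), and `cp500` is used as an example EBCDIC codec; the code runs on Python 2.6+ and 3.3+:

```python
import codecs
import io

base = io.BytesIO()  # plays the role of sys.__stdout__ (never rebound)

utf8_out = codecs.getwriter('utf-8')(base)
utf8_out.write(u'Stuff\n')
utf8_part = base.getvalue()

# Switch encodings by wrapping the *original* stream again,
# not by wrapping the previous wrapper.
ebcdic_out = codecs.getwriter('cp500')(base)  # cp500: an EBCDIC code page
ebcdic_out.write(u'Stuff\n')
ebcdic_part = base.getvalue()[len(utf8_part):]
```

Each writer encodes exactly once, so each chunk of bytes decodes cleanly with the codec that produced it.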

Carry on, then.

Cheers,
Cliff



Re: the official way of printing unicode strings

2008-12-14 Thread Ben Finney
Piotr Sobolewski <nie_dzi...@gazeta.pl> writes:

> in Python (contrary to Perl, for instance) there is one way to do
> common tasks.

More accurately: the ideal is that there should be only one *obvious*
way to do things. Other ways may also exist.

> Could somebody explain to me what the official python way of
> printing unicode strings is?

Try these:

<URL:http://effbot.org/zone/unicode-objects.htm>
<URL:http://www.reportlab.com/i18n/python_unicode_tutorial.html>
<URL:http://www.amk.ca/python/howto/unicode>

If you want something more official, try the PEP that introduced
Unicode objects, PEP 100:

<URL:http://www.python.org/dev/peps/pep-0100/>.

> I tried doing it this way:
> s = u"Stanisław Lem"
> print s.encode('utf-8')
> This works, but is very cumbersome.

Nevertheless, that says everything that needs to be said: You've
defined a Unicode text object, and you've printed it specifying which
character encoding to use.

When dealing with text, the reality is that there is *always* an
encoding at the point where program objects must interface to or from
a device, such as a file, a keyboard, or a display. There is *no*
sensible default encoding, except for the increasingly-inadequate
7-bit ASCII.

<URL:http://www.joelonsoftware.com/articles/Unicode.html>

Since there is no sensible default, Python needs to be explicitly told
at some point which encoding to use.
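A short sketch of why there is no sensible default: 7-bit ASCII cannot represent "ł" at all, and two plausible encodings turn the same text into different bytes. ISO-8859-2 is picked here only as an example Central European encoding; it is not mentioned in the thread. Runs on Python 2.6+ and 3.3+:

```python
text = u'Stanis\u0142aw Lem'  # "Stanisław Lem"

try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False  # U+0142 has no 7-bit ASCII representation

utf8_bytes = text.encode('utf-8')        # "ł" becomes two bytes
latin2_bytes = text.encode('iso-8859-2')  # "ł" becomes one byte
```

Since the bytes differ per encoding, only the program (or its user) can say which one the reader of the output expects.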

> Then I tried it this way:
> s = u"Stanisław Lem"
> print s
> This breaks when I redirect the output of my program to some file,
> like this:
> $ example.py > log

How does it “break”? What behaviour did you expect, and what
behaviour did you get instead?

-- 
 \ “I hope that after I die, people will say of me: ‘That guy sure |
  `\owed me a lot of money’.” —Jack Handey |
_o__)  |
Ben Finney


Re: the official way of printing unicode strings

2008-12-14 Thread Martin v. Löwis
> My main problem is that when I use a language, I want to use it the way it
> is supposed to be used. Usually doing that saves a lot of problems,
> especially in Python, where there is one official way to do any elementary
> task. I just want to know what the normal, official way of printing
> unicode strings is. I mean, the question is not "how can I print a unicode
> string" but "how do the creators of the language expect me to print a
> unicode string". I couldn't find an answer to this question in the docs, so I
> hope somebody here knows it.
>
> So, is it _the_ python way of printing unicode?

The official way to write Unicode strings into a file is not to do that.
Explicit is better than implicit - always explicitly pick an encoding,
and encode the Unicode string to that encoding. Doing so is possible in
any of the forms that you have shown.

Now, Python does not mandate any choice of encoding. The right way to
encode data is in the encoding that readers of your data expect it in.

For printing to the terminal, it is clear what the encoding needs to
be (namely, the one that is used by the terminal), hence Python chooses
that one when printing to the terminal. When printing to a file, the
application needs to make a choice.

If you have no idea what encoding to use, your best choice is the one
returned by locale.getpreferredencoding(). This is the encoding that
the user is most likely to expect.
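A minimal sketch of that policy, assuming (as with Python 2 file objects) that the stream accepts byte strings and may report an `encoding` attribute; in Python 2, `sys.stdout.encoding` is None when output is redirected to a file. `choose_encoding`, `write_text` and `FakeTerminal` are hypothetical names for illustration only:

```python
import io
import locale


def choose_encoding(stream):
    """Hypothetical helper: use the stream's own encoding if it reports one
    (as a Python 2 terminal does), else fall back to the locale's preference."""
    encoding = getattr(stream, 'encoding', None)
    if encoding:
        return encoding
    return locale.getpreferredencoding()


def write_text(stream, text):
    """Hypothetical helper: encode explicitly, then write the resulting bytes."""
    stream.write(text.encode(choose_encoding(stream)))


class FakeTerminal(io.BytesIO):
    """Stand-in for a terminal that reports its encoding (demo only)."""
    encoding = 'utf-8'


term = FakeTerminal()
write_text(term, u'Stanis\u0142aw Lem')
```

A plain `io.BytesIO` has no `encoding` attribute, so writing to it would fall back to `locale.getpreferredencoding()`, mirroring the redirected-to-a-file case.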

Regards,
Martin


the official way of printing unicode strings

2008-12-13 Thread Piotr Sobolewski
Hello,

in Python (contrary to Perl, for instance) there is one way to do common
tasks. Could somebody explain to me what the official python way of
printing unicode strings is?

I tried doing it this way:
s = u"Stanisław Lem"
print s.encode('utf-8')
This works, but is very cumbersome.

Then I tried it this way:
s = u"Stanisław Lem"
print s
This breaks when I redirect the output of my program to some file, like
this:
$ example.py > log

Then I tried it this way:
import codecs, sys
sys.stdout = codecs.getwriter("utf-8")(sys.__stdout__)
s = u"Stanisław Lem"
print s
This works but is even more cumbersome.

So, my question is: what is the official, recommended Python way?

