Thanks a lot Steven, you gave me a good AHA experience! :) Now I understand why I had to specify an encoding when calling urllib2! So basically Eclipse PyDev does this in the background for me, and its console supports UTF-8, so that's why I never had to think about it before (and why some scripts tend to fail with Unicode errors when run outside of the Eclipse IDE).
cheers
Magnus

> Start here:
>
> "The Absolute Minimum Every Software Developer Absolutely, Positively Must
> Know About Unicode and Character Sets (No Excuses!)"
>
> http://www.joelonsoftware.com/articles/Unicode.html
>
>
> Basically, Unicode is an in-memory data format. Python knows about Unicode
> characters (to be technical: code points), but files on disk do not.
> Neither do network protocols, or terminals, or other simple devices. They
> only understand bytes.
>
> So when you have Unicode text, and you want to write it to a file on disk,
> or print it, or send it over the network to another machine, it has to be
> *encoded* into bytes, and then *decoded* back into Unicode when you read it
> from the file again. Sometimes the system will "helpfully" do that encoding
> and decoding automatically for you, which is fine when it works, but when it
> doesn't it can be perplexing.
>
> There are many, many, many different *encoding schemes*. ASCII is one. UTF-8
> is another. And then there are about a bazillion legacy encodings which, if
> you are lucky, you will never need to care about. Only some encodings can
> deal with the entire range of Unicode characters; most can only deal with a
> (typically small) subset of possible characters. E.g. ASCII only knows
> about 128 characters out of the million-plus that Unicode deals with.
> Latin-1 can handle close to 256 different characters. If you have a say in
> the matter, always use UTF-8, since it can handle the full set of Unicode
> characters in the most efficient manner.
>
>
> --
> Steven
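
For anyone reading along later, here is a minimal sketch of the encode/decode round trip Steven describes. It assumes Python 3 (on Python 2 the same `.encode()` and `.decode()` calls exist on `unicode` and `str` objects); the sample string and all variable names are made up purely for illustration.

```python
# Illustrative sample text (hypothetical): "Grüße", written with escapes so the
# source file itself stays pure ASCII.
text = u"Gr\u00fc\u00dfe"

# Encoding turns Unicode code points into bytes.
utf8_bytes = text.encode("utf-8")      # u-umlaut and sharp s take 2 bytes each
latin1_bytes = text.encode("latin-1")  # every character fits in 1 byte here

# Decoding with the SAME encoding gives the original text back.
round_trip = utf8_bytes.decode("utf-8")

# ASCII cannot represent the two non-ASCII characters, so encoding fails.
ascii_failed = False
try:
    text.encode("ascii")
except UnicodeEncodeError:
    ascii_failed = True

# Decoding with the WRONG encoding does not fail here, but yields garbage.
mojibake = utf8_bytes.decode("latin-1")

print(len(utf8_bytes), len(latin1_bytes))  # 7 5
print(round_trip == text)                  # True
print(ascii_failed)                        # True
print(mojibake == text)                    # False
```

The failing ASCII step is probably what shows up as a Unicode error when a script prints non-ASCII text to a console that is not UTF-8.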