David Convent wrote:
Hi Bjorn,

I always believed that unicode and utf-8 were the same encoding, but reading your message makes me think I was wrong.
Can you tell me what the difference is between unicode and utf-8?

Unicode should not be seen as an encoding as such. Python does use an encoding internally for unicode strings (the strings that Python shows with a 'u' in front when you ask for their representation), but you shouldn't care what that encoding is, and Python can in fact be recompiled to use a different one.
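
To make the distinction concrete, here is a minimal sketch. It assumes Python 3, where str is the unicode string type and bytes is the 8-bit string type (in the Python discussed above these would be unicode and str respectively). The sample text is my own choice for illustration.

```python
# Python 3 sketch: str holds unicode text, bytes holds encoded 8-bit data.
text = "caf\u00e9"  # 'café', four characters

# The abstract view: a sequence of Unicode code points, no encoding involved.
code_points = [ord(ch) for ch in text]

# Concrete views: the same text written out as bytes in three encodings.
as_utf8 = text.encode("utf-8")
as_utf16 = text.encode("utf-16-le")
as_latin1 = text.encode("latin-1")

print(code_points)    # [99, 97, 102, 233]
print(as_utf8)        # b'caf\xc3\xa9'
print(len(as_utf16))  # 8
print(as_latin1)      # b'caf\xe9'

# Decoding each byte string with its own encoding gives back the same text.
assert as_utf8.decode("utf-8") == text
assert as_utf16.decode("utf-16-le") == text
assert as_latin1.decode("latin-1") == text
```

The text is the same in all three cases; only the byte representation differs, which is why "unicode" and "UTF-8" are not interchangeable terms.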


UTF-8 is one particular way to represent unicode data, in this case as 8-bit strings. UTF-8 happens to be popular for two (related) reasons:

* Since UTF-8 is a superset of ASCII, any ASCII text is automatically valid UTF-8, and UTF-8 text without a lot of special (non-ASCII) characters looks like ASCII.

* Software that can deal with 8-bit strings can usually deal with UTF-8.
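
The two points above can be checked with a short sketch, again assuming Python 3 and using sample strings of my own:

```python
# Pure ASCII text encodes to exactly the same bytes in ASCII and in UTF-8.
ascii_text = "Hello, Zope"
assert ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# One non-ASCII character: 'naïve' has 5 characters but 6 bytes in UTF-8.
mixed = "na\u00efve"
encoded = mixed.encode("utf-8")
print(encoded)  # b'na\xc3\xafve'

# The ASCII letters keep their single-byte values; only the non-ASCII
# character becomes a multi-byte sequence of bytes >= 0x80.
high_bytes = [b for b in encoded if b >= 0x80]
print(high_bytes)  # [195, 175]
```

Because the extra bytes all have the high bit set, they never collide with ASCII characters, which helps explain why byte-oriented software tends to pass UTF-8 through unharmed.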

Anyway, in my experience most programmers have only a vague grasp of encoding issues. In Python the basics are not that hard to understand, but:

* Python is not very educational if you do it wrong; you basically get weird errors.

* The errors frequently show up in a different place in the code than where the mistake was made, typically when some code tries to combine unicode strings with classic strings.

* You can 'hack' your way around it and survive for a long time. You don't notice the problem because it works with the test text, which happens to be ASCII. Etc.
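
A rough sketch of the 'hack' trap follows. It assumes Python 3, which is stricter than the Python described above: combining str and bytes raises a TypeError immediately instead of attempting an implicit ASCII decode. The helper name to_text and the sample data are hypothetical, made up for this example.

```python
def to_text(data, encoding="utf-8"):
    """Hypothetical helper: decode bytes once, at the boundary."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data  # already text, pass through unchanged

ascii_bytes = b"hello"
utf8_bytes = "h\u00e9llo".encode("utf-8")  # 'héllo' as UTF-8 bytes

# The 'hack': assume every byte string is ASCII. Works with the test text...
assert ascii_bytes.decode("ascii") == "hello"

# ...and fails only later, when real data contains a non-ASCII character.
try:
    utf8_bytes.decode("ascii")
    hack_failed = False
except UnicodeDecodeError:
    hack_failed = True

# Decoding explicitly with the right encoding, then combining text with text.
greeting = "Say: " + to_text(utf8_bytes)
print(hack_failed)  # True
```

The idea is to decode with a known encoding where the data enters the program and to work with unicode strings everywhere else, so the error (if any) appears where the mistake is.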

Regards,

Martijn

