Maksim Kasimov wrote:
>> 'utf8' codec can't decode bytes in position 176-177: invalid data
>> >>> iMessage[176:178]
>> '\xd1]'
>>
>> And that's your problem. In general you can't just truncate a utf-8
>> encoded string anywhere and expect the result to be valid utf-8. The
>> \xd1 at the very end of your CDATA section is the first byte of a
>> two-byte sequence that represents some unicode code-point between \u0440
>> and \u047f, but it's missing the second byte that says which one.
>
> In my previous message I already explained that this situation is
> common with memory-limited devices, such as mobile terminals from Nokia,
> Sony Ericsson, Siemens and so on.
>
> And I pointed out to you that it is a part of a split string.

No, it is not part of a string. It's part of a byte stream, split in the
middle of a multibyte-encoded character. You can't take only the dot from
a small letter "i" and ask the parser to treat it as a complete "i".

-- 
Jarek Zgoda
http://jpa.berlios.de/
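
To illustrate the point about byte streams, here is a minimal sketch (not code from this thread) of one way to handle data that arrives in pieces: buffer the incomplete trailing bytes until the rest of the character shows up, and only then decode. It assumes Python 3 and the standard `codecs` incremental decoder; the helper name `decode_chunks` and the sample text are made up for the example.

```python
import codecs


def decode_chunks(chunks, encoding="utf-8"):
    """Decode an iterable of byte chunks that may be split mid-character.

    Hypothetical helper for illustration only.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = []
    for chunk in chunks:
        # Incomplete trailing bytes are kept inside the decoder until
        # the next chunk supplies the rest of the character.
        parts.append(decoder.decode(chunk))
    # final=True raises UnicodeDecodeError if the stream ends mid-character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# Sample data: six Cyrillic letters, two bytes each in UTF-8.
data = "привет".encode("utf-8")
# Split after three bytes, so the first chunk ends with a lone lead byte.
first, second = data[:3], data[3:]

text = decode_chunks([first, second])
```

Here `first` ends in `\xd1`, just like the fragment in the traceback, so calling `first.decode("utf-8")` on its own fails, while feeding both chunks through the incremental decoder recovers the full text. If the remaining bytes never arrive, there is nothing valid to decode, and the final call reports that instead of guessing.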