On Fri, 12 Mar 2010 10:29:17 pm spir wrote:
> Hello again,
>
> A different issue. On the custom Unicode type discussed in another
> thread, I have overloaded __str__ and __repr__ to get encoded byte
> strings (here with debug prints & special formats to distinguish from
> builtin forms):
[...]
> Note that Unicode.__str__ is called neither by "print us", nore by
> %s. What happens? Why does the issue only occur when using both
> format %s & %s?

The print statement understands how to directly print strings
(byte-strings and unicode-strings) and doesn't call your __str__
method.

http://docs.python.org/reference/simple_stmts.html#the-print-statement

You can demonstrate that with a much simpler example:

>>> class K(unicode):
...     def __str__(self): return "xyz"
...     def __repr__(self): return "XYZ"
...
>>> k = K("some text")
>>> str(k)
'xyz'
>>> repr(k)
'XYZ'
>>> print k
some text

print only calls __str__ if the object isn't already a string.

As for string interpolation, I have reported this as a bug:

http://bugs.python.org/issue8128

I have some additional comments on your class below:

> class Unicode(unicode):
>     ENCODING = "utf8"
>     def __new__(self, string='', encoding=None):

This is broken according to the Liskov substitution principle.

http://en.wikipedia.org/wiki/Liskov_substitution_principle

The short summary: subclasses should only ever *add* functionality, they
should never take it away. The unicode type has a function signature
that accepts an encoding and an errors argument, but you've missed
errors. That means that code that works with built-in unicode objects
will break if your class is used instead. If that's intentional, you
need to clearly document that your class is *not* entirely compatible
with the built-in unicode, and preferably explain why you have done so.

If it's accidental, you should fix it. A good start is the __new__
method I posted earlier.

>         if isinstance(string,str):
>             encoding = Unicode.ENCODING if encoding is None else encoding
>             string = string.decode(encoding)
>         return unicode.__new__(Unicode, string)
>     def __repr__(self):
>         print '+',
>         return '"%s"' %(self.__str__())

This may be a problem. Why are you making your unicode class pretend to
be a byte-string?

Ideally, the output of repr(obj) should follow this rule:

    eval(repr(obj)) == obj

For instance, for built-in unicode strings:

>>> u"éâÄ" == eval(repr(u"éâÄ"))
True

but for your subclass, us != eval(repr(us)).
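
As a rough illustration of the two points above (keeping the errors
argument, and a repr that survives eval), a subclass could look
something like the sketch below. This is not the __new__ posted earlier
in the thread, only a minimal example. The text_type alias, the
'strict' default, and the "Unicode(...)" repr format are assumptions
made so that the same code runs on both Python 2 and Python 3.

```python
# Sketch only. text_type is a hypothetical alias, not part of the original class.
try:
    text_type = unicode      # Python 2
except NameError:
    text_type = str          # Python 3: str is already the unicode type


class Unicode(text_type):
    ENCODING = "utf8"

    def __new__(cls, string=u'', encoding=None, errors='strict'):
        # Decode byte strings; text is passed through unchanged.
        # (Assumption: unlike the built-in, no error is raised when an
        # encoding is given together with an already-decoded string.)
        if isinstance(string, bytes):
            if encoding is None:
                encoding = cls.ENCODING
            string = string.decode(encoding, errors)
        return text_type.__new__(cls, string)

    def __repr__(self):
        # Wrap the inherited repr so that eval(repr(obj)) rebuilds the object.
        return "Unicode(%s)" % text_type.__repr__(self)


us = Unicode(b'\xc3\xa9\xc3\xa2\xc3\x84')   # UTF-8 bytes for three accented letters
assert eval(repr(us)) == us
```

__str__ is deliberately not overridden here, so str() and %s behave
exactly as they do for the built-in type.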
So again, code that works perfectly with built-in unicode objects will
fail with your subclass.

Ideally, repr of your class should return a string like:

    "Unicode('...')"

but if that's too verbose, it is acceptable to just inherit the
__repr__ of unicode and return something like "u'...'". Anything else
should be considered non-standard behaviour and is STRONGLY discouraged.

>     def __str__(self):
>         print '*',
>         return '`'+ self.encode(Unicode.ENCODING) + '`'

What's the purpose of the print statements in the __str__ and __repr__
methods?

Again, unless you have a good reason to do otherwise, you are best to
just inherit __str__ from unicode. Anything else is strongly
discouraged.

> An issue happens in particuliar cases, when using both %s and %r:
>
> s = "éâÄ"

This may be a problem. "éâÄ" is not a valid str, because it contains
non-ASCII characters. The result that you get may depend on your
external environment. For instance, if I run it in my terminal, with
encoding set to UTF-8, I get this:

>>> s = "éâÄ"
>>> print s
éâÄ
>>> len(s)
6
>>> list(s)
['\xc3', '\xa9', '\xc3', '\xa2', '\xc3', '\x84']

but if I set it to ISO 8859-1, I get this:

>>> list("éâÄ")
['\xe9', '\xe2', '\xc4']

As far as I know, the behaviour of stuffing unicode characters into
byte-strings is not well-defined in Python, and will depend on external
factors like the terminal you are running in, if any. It may or may not
work as you expect. It is better to do this:

    u = u"éâÄ"
    s = u.encode('utf-8')

which will always work consistently so long as you declare a source
encoding at the top of your module:

    # -*- coding: UTF-8 -*-


-- 
Steven D'Aprano