On Sun, 20 Aug 2006, Nick Coghlan wrote:
> John J Lee wrote:
>> Is this a bug?
>
> I don't believe so - the string formatting documentation states that the
> result will be unicode if either the format string is unicode or any of the
> objects passed to a %s format code is unicode.
>
> That latter part has just been extended to include any object that returns
> Unicode from __str__, instead of being restricted to actual Unicode
> instances.
>
> Note that the following behaves the same way regardless of whether you use
> 2.4 or 2.5:
> "%s" % 'hi'
> "%s" % u'hi'

Given that, the following wording should be changed:

http://docs.python.org/lib/typesseq-strings.html

Conversion  Meaning                                             Notes
...
s           String (converts any python object using str()).    (4)
...

(4) If the object or format provided is a unicode string, the resulting
    string will also be unicode.

The note (4) says that the result will be unicode, but it doesn't say how,
in this case, that comes about. This case is confusing because the docs
claim string formatting with %s "converts ... using str()", and yet
str(a()) returns a bytestring. Does it *really* use str, or just __str__?
Surely the latter? (given the observed behaviour, and not reading the C
source)

FWIW, this change broke epydoc (fails with an AssertionError -- so perhaps
without the assert it would still "work", dunno).

> And once the result has been promoted to unicode, __unicode__ is used
> directly:
>
> >>> print repr("%s%s" % (a(), a()))
> __str__
> accessing <__main__.a object at 0x00AF66F0>.__unicode__
> __str__
> accessing <__main__.a object at 0x00AF6390>.__unicode__
> __str__
> u'hihi'

I don't understand this part. Why is __unicode__ called? Your example
doesn't appear to show this happening "once [i.e., because?] the result has
been promoted to unicode" -- if that were true, it would "stand to reason"
<wink> that the interpreter would then conclude it should call __unicode__
for all remaining %s, and not bother with __str__. If OTOH __unicode__ is
called because __str__ returned a unicode object, it makes (very slightly)
more sense that it goes through the same __str__-then-__unicode__ rigmarole
for each object on the RHS of the %. But none of that seems to make a huge
amount of sense.

I've now found the September 2004 discussion of this, and I'm none the
wiser.


John
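
A minimal sketch for poking at the question above, assuming a class along the lines of the `a` in the quoted output. Its real definition is not in this message, so `A` and `calls` below are hypothetical names, and the snippet does not try to reproduce the `__unicode__` attribute-access tracing. It only records how often `__str__` is invoked by `%s` and reports the type of the result, which is the part that depends on the interpreter version.

```python
# Hypothetical stand-in for the class `a` discussed in the thread;
# the original definition is not shown in this message.
calls = []  # records each __str__ invocation triggered by formatting


class A(object):
    def __str__(self):
        calls.append("__str__")
        # Return a unicode literal, as in the case under discussion.
        return u"hi"


# Bytestring format (on Python 2), two objects on the RHS of the %.
result = "%s%s" % (A(), A())

print(repr(result))
# The type name and the number of __str__ calls vary by interpreter
# version, so they are printed rather than assumed.
print("%s %r" % (type(result).__name__, calls))
```

Running this under different interpreters and comparing the printed type and call list is one way to check the "str() versus __str__" question without reading the C source.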