On Jan 10, 2008 9:57 AM, Eric Smith <[EMAIL PROTECTED]> wrote:
> Eric Smith wrote:
> > (I'm posting to python-dev, because this isn't strictly 3.0 related.
> > Hopefully most people read it in addition to python-3000).
> >
> > I'm working on backporting the changes I made for PEP 3101 (Advanced
> > String Formatting) to the trunk, in order to meet the pre-PyCon release
> > date for 2.6a1.
> >
> > I have a few questions about how I should handle str/unicode. 3.0 was
> > pretty easy, because everything was unicode.
> >
> > 1: How should the builtin format() work? It takes 2 parameters, an
> > object o and a string s, and returns o.__format__(s). If s is None, it
> > returns o.__format__(empty_string). In 3.0, the empty string is of
> > course unicode. For 2.6, should I use u'' or ''?
>
> I just re-read PEP 3101, and it doesn't mention this behavior with None.
> The way the code actually works is that the specifier is optional, and
> if it isn't present then it defaults to an empty string. This behavior
> isn't mentioned in the PEP, either.
>
> This feature came from a request from Talin[0]. We should either add
> this to the PEP (and docs), or remove it. If we document it, it should
> mention the 2.x behavior (as other places in the PEP do). If we removed
> it, it would remove the one place in the backport that's not just hard,
> but ambiguous. I'd just as soon see the feature go away, myself.
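For anyone following along, the behavior described in the quoted text can be sketched roughly as below. This is only an illustration, written against Python 3, where every string is unicode; it does not try to settle the 2.6 question of u'' versus ''. The Celsius class and its " C" suffix are hypothetical and exist only to make the call to __format__ visible.

```python
class Celsius(object):
    """Hypothetical type, used only to show how format() reaches __format__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # When the caller omits the specifier, it arrives here as an
        # empty string, which is the default discussed in the thread.
        if spec == "":
            return str(self.degrees)
        # Otherwise delegate to float formatting and append a unit.
        return format(self.degrees, spec) + " C"


temp = Celsius(21.5)
no_spec = format(temp)           # equivalent to temp.__format__("")
with_spec = format(temp, ".2f")  # equivalent to temp.__format__(".2f")
```

With the specifier omitted, __format__ sees an empty string, so a type that wants a sensible default only needs to handle that one case.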

IIUC, the 's' argument is the format specifier. Format specifiers are
written in a very conservative character set, so I'm not sure it
matters. Or are you assuming that the *type* of 's' also determines
the type of the output?

I may be in the minority here, but I think I like having a default for
's' (as implemented -- the PEP ought to be updated) and I also think
it should default to an 8-bit string, assuming you support 8-bit
strings at all -- after all, in 2.x 8-bit strings are the default
string type (as reflected by their name, 'str').

> > 3: Every overridden __format__() method is going to have to check for
> > string or unicode, just like object.__format__() does, and return either a
> > string or unicode object, appropriately. I don't see any way around
> > this, but I'd like to hear any thoughts. I guess there aren't all that
> > many __format__ methods that will be implemented, so this might not be a
> > big burden. I'll of course implement the built in ones.
>
> The PEP actually mentions that this is how 2.x will have to work. So
> I'll go ahead and implement it that way, on the assumption that getting
> string support into 2.6 is desirable.

I think it is. (But then I still live in a predominantly ASCII world. :-)

For data types whose output uses only ASCII, would it be acceptable if
they always returned an 8-bit string and left it up to the caller to
convert it to Unicode? This would apply to all numeric types. (The
date/time types have a strftime() style API which means the user must
be able to specify Unicode.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev