On Tue, Jan 14, 2014 at 9:29 AM, Yury Selivanov <yselivanov...@gmail.com> wrote:
> - Try str(), and do ".encode('ascii', 'strict')" on the result.
>

Please no -- that's the source of a lot of pain in py2 now.

Having a failure as a result of the value, rather than the type, of an
object just makes for hard-to-test bugs. Everything will be hunky dory
for development and testing, then in deployment some idiot ( ;-) ) will
pass in some non-ascii compatible string and you get failure. And the
person who gets the failure doesn't understand why, or they wouldn't
have passed in non-ascii values in the first place...

Ease of porting is nice, but let's not make it easy to port bug-prone code.

-Chris

> This way *most* of the use cases of python2 will be covered without
> touching the code. So:
>
> - b'Hello {}'.format('world')
>   will be the same as b'Hello ' + str('world').encode('ascii', 'strict')
>
> - b'Hello {}'.format('\u0394') will throw UnicodeEncodeError
>
> - b'Status: {}'.format(200)
>   will be the same as b'Status: ' + str(200).encode('ascii', 'strict')
>
> - b'Hello %s' % ('world',) - the same as the first example
>
> - b'Connection: {}'.format(b'keep-alive') - works
>
> - b'Hello %s' % (b'\xce\x94',) - will fail, not ASCII subset we accept
>
> I think it's OK to check the buffers for ASCII-subset only. Yes, it
> will have some sort of sub-optimal performance, but then, it's quite
> rare when string formatting is used to concatenate huge buffers.
>
> 2. new operators {!b} and %b. These will just use '__bytes__' and
> Py_buffer.
>
> --
> Yury Selivanov
>
> On January 14, 2014 at 11:31:51 AM, Brett Cannon (br...@python.org) wrote:
> >
> > On Mon, Jan 13, 2014 at 5:14 PM, Guido van Rossum wrote:
> >
> > > On Mon, Jan 13, 2014 at 2:05 PM, Brett Cannon wrote:
> > > > I have been going on the assumption that bytes.format() would change
> > > > what '{}' meant for itself and would only interpolate bytes.
> > > > That convenient between Python 2 and 3 since it represents what we
> > > > want it to (str and bytes under the hood, respectively), so it just
> > > > falls through. We could also add a 'b' conversion for bytes()
> > > > explicitly so as to help people not accidentally mix up things in
> > > > bytes.format() and str.format(). But I was not suggesting adding a
> > > > specific format spec for bytes but instead making bytes.format()
> > > > just do the .encode('ascii') automatically to help with
> > > > compatibility when a format spec was present. If people want fancy
> > > > formatting for bytes they can always do it themselves before
> > > > calling bytes.format().
> > >
> > > This seems hastily written (e.g. verb missing :-), and I'm not clear
> > > on what you are (or were) actually proposing. When exactly would
> > > bytes.format() need .encode('ascii')?
> > >
> > > I would be happy to wait a few hours or days for you to write it up
> > > clearly, rather than responding in a hurry.
> >
> > Sorry about that. Busy day at work + trying to stay on top of this
> > entire conversation was a bit tough. Let me try to lay out what I'm
> > suggesting for bytes.format() in terms of how it changes
> > http://docs.python.org/3/library/string.html#format-string-syntax
> > for bytes.
> >
> > 1. New conversion operator of 'b' that operates as PEP 460 specifies
> >    (i.e. tries to get a buffer, else calls __bytes__). The default
> >    conversion changes from 's' to 'b'.
> > 2. Use of the conversion field adds an extra step of calling
> >    str.encode('ascii', 'strict') on the result returned from calling
> >    __format__().
> >
> > That's it.
> > So point 1 means that the following would work in Python 3.5::
> >
> >   b'Hello, {}, how are you?'.format(b'Guido')
> >   b'Hello, {!b}, how are you?'.format(b'Guido')
> >
> > It would produce an error if you used a text argument for 'Guido'
> > since str doesn't define __bytes__ or a buffer. That gives the EIBTI
> > group their bytes.format() where nothing magical happens.
> >
> > For point 2, let's say you have the following in Python 2::
> >
> >   'I have {} bottles of beer on the wall'.format(10)
> >
> > Under my proposal, how would you change it to get the same result in
> > Python 2 and 3?::
> >
> >   b'I have {:d} bottles of beer on the wall'.format(10)
> >
> > In Python 2 you're just being more explicit about the format,
> > otherwise it's the same semantics as today. In Python 3, though, this
> > would translate into (under the hood)::
> >
> >   b'I have {} bottles of beer on the wall'.format(format(10, 'd').encode('ascii', 'strict'))
> >
> > This leads to the same bytes value in Python 2 (since it's just a
> > string) and in Python 3 (as everything accepted by bytes.format() is
> > either bytes already or converted by encoding to ASCII bytes). While
> > Python 2 users would need to make sure they used a format spec to get
> > the same result in both Python 2 and 3 for ASCII bytes, it's a minor
> > change which also makes the format more explicit so it's not an
> > inherently bad thing. And for those that don't want to utilize the
> > automatic ASCII encoding they can just not use a format spec in the
> > format string and just pass in bytes directly (i.e. call __format__()
> > themselves and then call str.encode() on their own). So PBP people get
> > to have a simple way to use bytes.format() in Python 2 and 3 when
> > dealing with things that can be represented as ASCII (just as the
> > bytes methods allow for currently).
> >
> > I think this covers your desire to have numbers and anything else that
> > can be represented as ASCII be supported for easy porting while
> > covering my desire that any automatic encoding is clearly explicit in
> > the format string and in no way special-cased for only some types (the
> > introduction of a 'c' converter from PEP 460 is also fine with me).
> >
> > How you would want to translate this proposal with the % operator I'm
> > not sure since it has been quite a while since I last seriously used
> > it and so I don't think I'm in a good position to propose a shift for
> > it.
> > _______________________________________________
> > Python-Dev mailing list
> > Python-Dev@python.org
> > https://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe:
> > https://mail.python.org/mailman/options/python-dev/yselivanov.ml%40gmail.com

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov
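For anyone who wants to see the value-versus-type problem in running code, here is a minimal Python 3 sketch. `ascii_interpolate` is a hypothetical helper, not part of any proposal or of the standard library; it only models the "str(), then strict ASCII encode" step for a single {} field and assumes the argument is text or a number.

```python
def ascii_interpolate(template, value):
    """Hypothetical helper: model the 'str() then strict ASCII' fallback
    for one '{}' field. Not an actual bytes.format() implementation."""
    encoded = str(value).encode("ascii", "strict")
    return template.replace(b"{}", encoded, 1)


# Development and testing: the data happens to be ASCII, so all is well.
print(ascii_interpolate(b"Hello {}", "world"))
print(ascii_interpolate(b"Status: {}", 200))

# Deployment: same type (str), different value, and only now does it fail.
try:
    ascii_interpolate(b"Hello {}", "\u0394")
except UnicodeEncodeError as exc:
    print("failed because of the value, not the type:", exc.reason)
```

Both calls with 'world' and '\u0394' pass a str, so no type check or type-based test would tell them apart; the failure shows up only when the data changes.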
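Brett's two points quoted above can be modelled roughly as follows. This is only a sketch under assumptions: `to_bytes_field` is a made-up name, it renders one replacement field at a time rather than parsing a format string, and it treats the presence of a format spec (as in the {:d} example) as the trigger for the ASCII encoding step.

```python
def to_bytes_field(value, format_spec=""):
    """Hypothetical model of how one replacement field might be rendered."""
    if format_spec:
        # Point 2: format first, then encode the resulting text as strict ASCII.
        return format(value, format_spec).encode("ascii", "strict")
    # Point 1: default 'b' conversion. Try the buffer protocol first ...
    try:
        return bytes(memoryview(value))
    except TypeError:
        pass
    # ... else fall back to __bytes__, looked up on the type.
    to_bytes = getattr(type(value), "__bytes__", None)
    if to_bytes is None:
        raise TypeError(
            "cannot interpolate %s into bytes" % type(value).__name__
        )
    return to_bytes(value)


print(b"I have " + to_bytes_field(10, "d") + b" bottles of beer on the wall")
print(b"Hello, " + to_bytes_field(b"Guido") + b", how are you?")

try:
    to_bytes_field("Guido")  # str offers neither a buffer nor __bytes__
except TypeError as exc:
    print("rejected:", exc)
```

In this model the text argument is rejected because of its type, whatever its value, while the automatic encoding only happens where the format string asks for it.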