On Mon, Jul 15, 2013 at 5:10 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 16/07/2013 00:30, Gregory P. Smith wrote:
>>
>> On Mon, Jul 15, 2013 at 4:14 PM, Guido van Rossum <gu...@python.org> wrote:
>>
>>     In a discussion about mypy I discovered that the Python 3 version of
>>     the re module's Match object behaves subtly different from the Python
>>     2 version when the target string (i.e. the haystack, not the needle)
>>     is a buffer object.
>>
>>     In Python 2, the type of the return value of group() is always either
>>     a Unicode string or an 8-bit string, and the type is determined by
>>     looking at the target string -- if the target is unicode, group()
>>     returns a unicode string, otherwise, group() returns an 8-bit string.
>>     In particular, if the target is a buffer object, group() returns an
>>     8-bit string. I think this is the appropriate behavior: otherwise
>>     using regular expression matching to extract a small substring from a
>>     large target string would unnecessarily keep the large target string
>>     alive as long as the substring is alive.
>>
>>     But in Python 3, the behavior of group() has changed so that its
>>     return type always matches that of the target string. I think this is
>>     bad -- apart from the lifetime concern, it means that if your target
>>     happens to be a bytearray, the return value isn't even hashable!
>>
>>     Does anyone remember whether this was a conscious decision? Is it too
>>     late to fix?
>>
>>
>> Hmm, that is not what I'd expect either. I would never expect it to
>> return a bytearray; I'd normally assume that .group() returned a bytes
>> object if the input was binary data and a str object if the input was
>> unicode data (str) regardless of specific types containing the input
>> target data.
>>
>> I'm going to hazard a guess that not much, if anything, would be
>> depending on getting a bytearray out of that. Fix this in 3.4? 3.3 and
>> earlier users are stuck with an extra bytes() call and data copy in
>> these cases I guess.
>>
> I'm not sure I understand the complaint.
>
> I get this for Python 2.7:
>
> Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import array
> >>> import re
> >>> re.match(r"a", array.array("b", "a")).group()
> array('b', [97])
>
> It's the same even in Python 2.4.

Ah, but now try it with buffer():

>>> re.search('yz+', buffer('xyzzy')).group()
'yzz'
>>>

The equivalent in Python 3 (using memoryview) returns a memoryview:

>>> re.search(b'yz+', memoryview(b'xyzzy')).group()
<memory at 0x10d03a688>
>>>

And I still think that any return type for group() except bytes or str
is wrong. (Except possibly a subclass of these.)

-- 
--Guido van Rossum (python.org/~guido)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
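
For anyone who wants to try the "extra bytes() call" workaround mentioned
earlier in the thread, here is a minimal sketch in Python 3. The helper name
group_bytes is hypothetical (it is not part of the re module), and it assumes
the pattern and target are binary data. It copies whatever group() returns
into an immutable bytes object, so the result is hashable and does not keep
the original buffer alive:

```python
import re


def group_bytes(match, group=0):
    """Hypothetical helper: return match.group(group) as a bytes object.

    Assumes a binary (bytes-like) target. The bytes() call makes a copy,
    so the result is independent of the original buffer.
    """
    value = match.group(group)
    # A group that did not participate in the match is reported as None.
    return None if value is None else bytes(value)


for target in (b"xyzzy", bytearray(b"xyzzy"), memoryview(b"xyzzy")):
    m = re.search(b"yz+", target)
    # Depending on the Python version, the raw result may be bytes,
    # bytearray or memoryview; print it rather than assume.
    print(type(target).__name__, "->", type(m.group()).__name__)
    sub = group_bytes(m)
    assert sub == b"yzz" and type(sub) is bytes
    hash(sub)  # bytes is hashable, so this always works

# For comparison: a bytearray cannot be hashed at all.
try:
    hash(bytearray(b"yzz"))
except TypeError as exc:
    print("bytearray is unhashable:", exc)
```

The copy costs one allocation per extracted group, which is the "data copy"
overhead referred to above. The helper is not meant for str targets, since
bytes() does not accept a str without an encoding.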