Re: [Python-Dev] Misc re.match() complaint

Gregory P. Smith Mon, 15 Jul 2013 16:33:44 -0700

On Mon, Jul 15, 2013 at 4:14 PM, Guido van Rossum <[email protected]> wrote:


> In a discussion about mypy I discovered that the Python 3 version of
> the re module's Match object behaves subtly differently from the Python
> 2 version when the target string (i.e. the haystack, not the needle)
> is a buffer object.
>
> In Python 2, the type of the return value of group() is always either
> a Unicode string or an 8-bit string, and the type is determined by
> looking at the target string -- if the target is unicode, group()
> returns a unicode string, otherwise, group() returns an 8-bit string.
> In particular, if the target is a buffer object, group() returns an
> 8-bit string. I think this is the appropriate behavior: otherwise
> using regular expression matching to extract a small substring from a
> large target string would unnecessarily keep the large target string
> alive as long as the substring is alive.
>
> But in Python 3, the behavior of group() has changed so that its
> return type always matches that of the target string. I think this is
> bad -- apart from the lifetime concern, it means that if your target
> happens to be a bytearray, the return value isn't even hashable!
>
> Does anyone remember whether this was a conscious decision? Is it too
> late to fix?


Hmm, that is not what I'd expect either. I would never expect it to return
a bytearray; I'd normally assume that .group() returns a bytes object if
the input is binary data and a str object if the input is unicode data
(str), regardless of the specific type holding the target data.

I'm going to hazard a guess that not much, if anything, depends on getting
a bytearray out of that. Fix this in 3.4? Users of 3.3 and earlier are
stuck with an extra bytes() call and data copy in these cases, I guess.
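
For illustration, the extra bytes() call could look roughly like the sketch
below. The helper name hashable_group is made up for this example and is not
part of the re module. It makes no assumption about whether .group() hands
back a bytearray on a given interpreter; it only copies when the result is
neither str nor bytes.

```python
import re


def hashable_group(match, group=0):
    """Hypothetical helper: return a match group as an immutable value.

    str and bytes results (and None for an unmatched group) pass through
    unchanged; anything else, e.g. a bytearray slice, is copied into bytes.
    """
    value = match.group(group)
    if value is None or isinstance(value, (str, bytes)):
        return value
    # This is the extra data copy mentioned above.
    return bytes(value)


# The target (haystack) is a mutable buffer; the pattern is a bytes pattern.
target = bytearray(b"key=value; other=data")
m = re.match(rb"key=(\w+)", target)

token = hashable_group(m, 1)

# bytes is hashable, so the result can go into a set or be used as a dict key.
seen = {token}
```

Since the result is a fresh bytes object, it also does not keep the large
target alive once the match object itself is dropped.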

-gps
