On 16 July 2013 19:18, Terry Reedy <tjre...@udel.edu> wrote: > I wonder if the change was an artifact of changing the code to prohibit > mixing Unicode and bytes.
I'm pretty sure we the only thing we changed in 3.x is to migrate re to the PEP 3118 buffer API, and the behavioural change Guido is seeing is actually the one between the 2.x buffer (which returns 8-bit strings when sliced) and other types (including memoryview) which return instances of themselves. Getting the old buffer behaviour in 3.x without an extra copy operation should just be a matter of wrapping the input with memoryview (to avoid copying the group elements in the match object) and the output with bytes (to avoid keeping the entire original object alive just to reference a few small pieces of it that were matched by the regex): >>> import re >>> data = bytearray(b"aaabbbcccddd") >>> re.match(b"(a*)b*c*(d*)", data).group(2) bytearray(b'ddd') >>> bytes(re.match(b"(a*)b*c*(d*)", memoryview(data)).group(2)) b'ddd' Given that, I'm inclined to keep the existing behaviour on backwards compatibility grounds. To make the above code work on both 2.x *and* 3.x without making an extra copy, it's possible to keep the bytes call (it should be a no-op on 2.x) and dynamically switch the type used to wrap the input between buffer in 2.x and memoryview in 3.x (unfortunately, the 2.x memoryview doesn't work for this case, as the 2.x re API doesn't accept it as valid input). Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com