[issue18468] re.group() should never return a bytearray

2013-10-16 Thread Serhiy Storchaka

Changes by Serhiy Storchaka <storch...@gmail.com>:


--
assignee:  -> serhiy.storchaka

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18468>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com




2013-10-16 Thread Roundup Robot

Roundup Robot added the comment:

New changeset add40e9f7cbe by Serhiy Storchaka in branch 'default':
Issue #18468: The re.split, re.findall, and re.sub functions and the group()
http://hg.python.org/cpython/rev/add40e9f7cbe
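The changeset message above is truncated in the archive. As a rough check, assuming Python 3.4 or later (where this change landed), the return types of the functions it names can be inspected directly. The sample data below is made up for illustration.

```python
import re

# Made-up sample data; any bytes-like target should behave the same way.
targets = [bytearray(b"spam, eggs, ham"), memoryview(b"spam, eggs, ham")]

for target in targets:
    # re.split() returns a list of pieces; each piece should be bytes.
    pieces = re.split(rb",\s*", target)
    # re.findall() returns a list of matched substrings.
    words = re.findall(rb"\w+", target)
    # re.sub() returns the whole rewritten target.
    replaced = re.sub(rb"eggs", b"bacon", target)
    # Match.group() returns the matched substring.
    first = re.match(rb"\w+", target).group()

    assert all(type(p) is bytes for p in pieces)
    assert all(type(w) is bytes for w in words)
    assert type(replaced) is bytes
    assert type(first) is bytes
    print(type(target).__name__, pieces, replaced, first)
```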

--
nosy: +python-dev


2013-10-16 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you Antoine for your review.

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed


2013-10-01 Thread Serhiy Storchaka

Changes by Serhiy Storchaka <storch...@gmail.com>:


Removed file: http://bugs.python.org/file31737/re_group_type.patch


2013-10-01 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Fixed a typo.

Could anyone please review it?

--


2013-10-01 Thread Serhiy Storchaka

Changes by Serhiy Storchaka <storch...@gmail.com>:


Added file: http://bugs.python.org/file31939/re_group_type.patch


2013-10-01 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The updated patch addresses Antoine's comments.

--
Added file: http://bugs.python.org/file31941/re_group_type_2.patch


2013-09-13 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Oh, it seems I again did not attach a patch. Now I understand why there was no
feedback for so long.

--
keywords: +needs review, patch
Added file: http://bugs.python.org/file31737/re_group_type.patch


2013-08-06 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Here is a patch with an implementation and tests. Feel free to add
documentation changes if needed.

--
stage: needs patch -> patch review


2013-07-25 Thread Serhiy Storchaka

Changes by Serhiy Storchaka <storch...@gmail.com>:


--
nosy: +serhiy.storchaka


2013-07-24 Thread Arfrever Frehtes Taifersar Arahesis

Changes by Arfrever Frehtes Taifersar Arahesis <arfrever@gmail.com>:


--
nosy: +Arfrever


2013-07-18 Thread Ezio Melotti

Ezio Melotti added the comment:

I'm not sure it's worth changing it.
As I see it, match/search are supposed to work with str or bytes and they 
return str/bytes accordingly.  The fact that they work with other bytes-like 
objects seems to me an undocumented implementation detail people should not 
rely on.
If they are passing a bytes-like object, both the current behavior (returning
the same type) and the newly proposed behavior (always returning bytes) seem
like reasonable expectations.

IIUC the advantage of changing the behavior is that it won't keep the target 
string alive anymore, but on the other hand is not backward compatible and 
makes things more difficult for people who want the same type back.
If people always want bytes back regardless of the input, they can convert the 
input or output to bytes explicitly.
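Ezio's two workarounds, sketched for a bytearray target with a made-up pattern and data. Converting the input copies the whole target up front; converting the output copies only the matched substring. The last conversion shows one way a caller who wants "the same type back" could get it.

```python
import re

target = bytearray(b"key=value; other=thing")
pattern = rb"key=(\w+)"

# Option 1: convert the input up front (copies the entire target).
m1 = re.search(pattern, bytes(target))
# Option 2: convert the output (copies only the matched substring).
m2 = re.search(pattern, target)
value = bytes(m2.group(1))
# Getting "the same type back" by calling the target's own type.
same_type = type(target)(m2.group(1))

print(m1.group(1), value, same_type)
```

Note that `type(target)(...)` only works for types whose constructor accepts bytes, such as bytearray or memoryview; it fails for `array.array`, whose constructor needs a typecode. That is the limitation Guido raises about creating a new object of the same type as a buffer input.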

--
components: +Regular Expressions
nosy: +ezio.melotti, mrabarnett


2013-07-18 Thread Matthew Barnett

Matthew Barnett added the comment:

There's also the fact that the match object keeps a reference to the target 
string anyway:

>>> import re
>>> t = memoryview(ba)
>>> t
<memory at 0x0100F110>
>>> m = re.match(ba, t)
>>> m.string
<memory at 0x0100F110>

On that subject, buried in the source code (_sre.c) is the comment:

/* FIXME: implement setattr(string, None) as a special case (to
   detach the associated string, if any */


In the regex module I added a method detach_string to perform that function.
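To make the lifetime point concrete: a match object's `string` attribute is the very object that was searched, so the whole target stays alive as long as the match object does. The stdlib `re` module has no counterpart to regex's `detach_string`, so a minimal sketch (assuming Python 3.4+ and made-up data) is to copy out only what is needed and drop the match object:

```python
import re

# A large made-up target buffer with a small interesting piece at the front.
big = bytearray(b"header:42;" + b"x" * 1_000_000)

m = re.search(rb"header:(\d+)", big)
# The match object refers to the original target object, not a copy.
assert m.string is big

# Copy out only what is needed. On Python 3.4+ group() returns an
# independent bytes object, so holding on to it does not pin `big`.
value = m.group(1)
del m  # dropping the match object releases its reference to `big`

print(type(value).__name__, value)
```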

--


2013-07-18 Thread Ezio Melotti

Ezio Melotti added the comment:

> match/search are supposed to work with str or bytes and
> they return str/bytes accordingly.

s/they return/calling m.group() returns/

--


2013-07-18 Thread Guido van Rossum

Guido van Rossum added the comment:

> Ezio Melotti added the comment:
[...]
> IIUC the advantage of changing the behavior is that it won't keep the target
> string alive anymore, but on the other hand is not backward compatible and
> makes things more difficult for people who want the same type back.

Everyone seems to be afraid of backward compatibility here. I will
take full responsibility, so let's just discuss what's the better API,
regardless of what we did (and in 99% of the cases it's the same
anyway).

People who want the same type back -- there is no evidence that
anyone wants this. People who want a bytes object -- this is
definitely a valid use case.

> If people always want bytes back regardless of the input, they can convert
> the input or output to bytes explicitly.

But this requires an extra copy if the input is a bytearray. I suspect
this might be the most commonly used non-bytes non-str target in
Python 3 programs, and we are striving to support bytearray as input
in as many places as possible where plain bytes is accepted. But
generally getting bytearray as output requires a different API, e.g.
recv_into().
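For comparison, the socket API splits the two cases Guido mentions: `recv()` allocates and returns a new bytes object, while `recv_into()` fills a caller-supplied bytearray. A minimal sketch using a local socket pair, so no network is involved (assumes Python 3.5+ if run on Windows, where `socket.socketpair()` became available):

```python
import socket

# A connected pair of local sockets.
a, b = socket.socketpair()
try:
    a.sendall(b"hello world")

    # recv_into() writes into an existing, caller-owned bytearray
    # and returns the number of bytes received.
    buf = bytearray(64)
    n = b.recv_into(buf, 5)
    print(n, buf[:n])

    # recv() allocates and returns a brand-new bytes object instead.
    rest = b.recv(64)
    print(rest)
finally:
    a.close()
    b.close()
```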

I think a very reasonable general rule is that for functions that take
either str or bytes and adjust their output to the input type, if
their input is one of the bytes alternatives (bytearray, memoryview,
array.array('b'), maybe others) the output is always a bytes object.
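The rule above can be sketched in pure Python. The helpers `result_type_for` and `match_group` are hypothetical, not part of any library; they only illustrate "str in, str out; any other buffer in, bytes out". On Python 3.4+ `re` already behaves this way, so the conversion branch is a no-op there.

```python
import array
import re


def result_type_for(target):
    """Return the type results should have for `target` (hypothetical helper).

    str in -> str out; anything else that supports the buffer protocol
    (bytes, bytearray, memoryview, array.array('b'), ...) -> bytes out.
    """
    if isinstance(target, str):
        return str
    memoryview(target).release()  # raises TypeError if not a buffer
    return bytes


def match_group(pattern, target, group=0):
    """Hypothetical wrapper: normalize a group to the rule's result type."""
    m = re.search(pattern, target)
    if m is None:
        return None
    value = m.group(group)
    if result_type_for(target) is bytes and not isinstance(value, bytes):
        # Only needed on versions where group() mirrored the input type.
        value = bytes(value)
    return value


for target in (b"abc", bytearray(b"abc"), memoryview(b"abc"),
               array.array("b", b"abc")):
    print(type(target).__name__, repr(match_group(rb"b", target)))
print(repr(match_group(r"b", "abc")))
```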

The reason is that while the buffer API makes it easy to access the
underlying bytes from C, it doesn't give you a way to create a new
object of the same type (except by slicing, which doesn't always
apply, e.g. os.listdir()). So for creating return values that match a
memoryview (or bytearray, etc.) input, the only reasonable thing is to
return a bytes object.
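The slicing caveat is easy to see with made-up data: slicing a bytearray copies into a new bytearray, slicing a memoryview returns another view over the same memory (which keeps the whole underlying buffer alive), and only an explicit `bytes(...)` or `.tobytes()` produces an independent bytes object.

```python
data = bytearray(b"0123456789")
view = memoryview(data)

# Slicing a bytearray copies the selected bytes into a new bytearray.
ba_slice = data[2:5]
# Slicing a memoryview does not copy: it is a view over the same buffer.
mv_slice = view[2:5]
# bytes() (or mv_slice.tobytes()) copies into a new, immutable bytes object.
b_copy = bytes(mv_slice)

data[2] = ord("X")  # mutate the original buffer in place

print(type(ba_slice).__name__, ba_slice)             # copy: unaffected
print(type(mv_slice).__name__, mv_slice.tobytes())   # view: sees the change
print(type(b_copy).__name__, b_copy)                 # copy: unaffected
```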

(FWIW os.listdir() violates this too -- os.listdir(b'.') returns a
list of bytes objects, while os.listdir(bytearray(b'.')) returns a
list of str objects. This seems to be caused by reversed logic -- it probably
tests if the type is bytes rather than if the type isn't str for
the output type, even though it does the right thing with the
input...)

--


2013-07-15 Thread Guido van Rossum

New submission from Guido van Rossum:

I discovered that the Python 3 version of
the re module's Match object behaves subtly differently from the Python
2 version when the target string (i.e. the haystack, not the needle)
is a buffer object.

In Python 2, the type of the return value of group() is always either
a Unicode string or an 8-bit string, and the type is determined by
looking at the target string -- if the target is unicode, group()
returns a unicode string, otherwise, group() returns an 8-bit string.
In particular, if the target is a buffer object, group() returns an
8-bit string. I think this is the appropriate behavior: otherwise
using regular expression matching to extract a small substring from a
large target string would unnecessarily keep the large target string
alive as long as the substring is alive.

But in Python 3, the behavior of group() has changed so that its
return type always matches that of the target string. I think this is
bad -- apart from the lifetime concern, it means that if your target
happens to be a bytearray, the return value isn't even hashable!
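On Python 3.4 and later, where the fix from this issue landed, the difference is easy to check. A minimal sketch with made-up data; per this report, on 3.3 and earlier `group()` would return a bytearray here, and using it as a dict key would fail with `TypeError`.

```python
import re

target = bytearray(b"GET /index.html HTTP/1.1")
m = re.match(rb"(\w+) (\S+)", target)

method = m.group(1)
print(type(method).__name__, method)

# bytes is hashable, so group results can be used as dict keys or set members.
counts = {}
counts[method] = counts.get(method, 0) + 1

# A bytearray, by contrast, is mutable and therefore unhashable.
try:
    hash(bytearray(b"GET"))
except TypeError as exc:
    print("bytearray:", exc)
```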

The proper behavior is for .group() to return a bytes object if the input was
binary data and a str object if the input was Unicode data (str), regardless
of the specific type holding the target data.

Probably little, if any, code depends on getting a bytearray out of that.
Fix this in 3.4? Users of 3.3 and earlier are stuck with an extra bytes()
call and data copy in these cases.

[Further discussion at 
http://mail.python.org/pipermail/python-dev/2013-July/127332.html]

--
components: Library (Lib)
messages: 193136
nosy: gvanrossum
priority: normal
severity: normal
stage: needs patch
status: open
title: re.group() should never return a bytearray
type: behavior
versions: Python 3.4
