Re: [Python-Dev] Another case for frozendict

2014-07-16 Thread R. David Murray
On Wed, 16 Jul 2014 03:27:23 +0100, MRAB  wrote:
> Here's another use-case.
> 
> Using the 're' module:
> 
>  >>> import re
>  >>> # Make a regex.
> ... p = re.compile(r'(?P<first>\w+)\s+(?P<second>\w+)')
>  >>>
>  >>> # What are the named groups?
> ... p.groupindex
> {'first': 1, 'second': 2}
>  >>>
>  >>> # Perform a match.
> ... m = p.match('FIRST SECOND')
>  >>> m.groupdict()
> {'first': 'FIRST', 'second': 'SECOND'}
>  >>>
>  >>> # Try modifying the pattern object.
> ... p.groupindex['JUNK'] = 'foobar'
>  >>>
>  >>> # What are the named groups now?
> ... p.groupindex
> {'first': 1, 'second': 2, 'JUNK': 'foobar'}
>  >>>
>  >>> # And the match object?
> ... m.groupdict()
> Traceback (most recent call last):
>   File "<stdin>", line 2, in <module>
> IndexError: no such group
> 
> It can't find a named group called 'JUNK'.

IMO, preventing someone from shooting themselves in the foot by modifying
something they shouldn't modify according to the API is not a Python
use case ("consenting adults").

> And with a bit more tinkering it's possible to crash Python. (I'll
> leave that as an exercise for the reader! :-))

Preventing a Python program from being able to crash the interpreter,
that's a use case :)

--David
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Another case for frozendict

2014-07-16 Thread R. David Murray
On Wed, 16 Jul 2014 03:27:23 +0100, MRAB  wrote:
>  >>> # Try modifying the pattern object.
> ... p.groupindex['JUNK'] = 'foobar'
>  >>>
>  >>> # What are the named groups now?
> ... p.groupindex
> {'first': 1, 'second': 2, 'JUNK': 'foobar'}
>  >>>
>  >>> # And the match object?
> ... m.groupdict()
> Traceback (most recent call last):
>   File "<stdin>", line 2, in <module>
> IndexError: no such group
> 
> It can't find a named group called 'JUNK'.

After I hit send on my previous message, I thought more about your
example.  One of the issues here is that modifying the dict breaks an
invariant of the API.  I have a similar situation in the email module,
and I used the same solution you did in regex: always return a new dict.
It would be nice to be able to return a frozendict instead of having the
overhead of building a new dict on each call.  That by itself might not
be enough reason.  But, if the user wants to use the data in modified form
elsewhere, they would then have to construct a new regular dict out of it,
making the decision to vary the data from what matches the state of the
object it came from an explicit one.  That seems to fit the Python zen
("explicit is better than implicit").

So I'm changing my mind, and do consider this a valid use case, even
absent the crash.
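
A minimal sketch of the copy-per-call approach described above, assuming a hypothetical `Header` class (this is not the real email or regex API): the accessor hands out a fresh dict each time, so a caller's mutation cannot break the state of the object the data came from.

```python
class Header:
    """Hypothetical object that owns a dict it must keep consistent."""

    def __init__(self, **params):
        self._params = dict(params)

    def get_params(self):
        # Build a new dict on every call; callers may mutate it freely.
        return dict(self._params)


header = Header(charset="utf-8")
mine = header.get_params()
mine["JUNK"] = "foobar"  # only the caller's copy changes
```

The cost is one dict allocation per call, which is the overhead a frozen mapping would avoid.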

--David


Re: [Python-Dev] Another case for frozendict

2014-07-16 Thread R. David Murray
On Wed, 16 Jul 2014 14:04:29 -, [email protected] wrote:
> On Wed, Jul 16, 2014 at 09:47:59AM -0400, R. David Murray wrote:
> 
> > It would be nice to be able to return a frozendict instead of having the
> > overhead of building a new dict on each call.
> 
> There already is an in-between available both to Python and C:
> PyDictProxy_New() / types.MappingProxyType. It's a one line change in
> each case to return a temporary intermediary, using something like (C):
>     Py_INCREF(self->dict);
>     return self->dict;
> 
> To
>     return PyDictProxy_New(self->dict);
> 
> Or Python:
>     return self.dct
> 
> To
>     return types.MappingProxyType(self.dct)
> 
> Which is cheaper than a copy, and avoids having to audit every use of
> self->dict to ensure the semantics required for a "frozendict" are
> respected, i.e. no mutation occurs after the dict becomes visible to the
> user, and potentially has __hash__ called.
> 
> 
> > That by itself might not be enough reason.  But, if the user wants to
> > use the data in modified form elsewhere, they would then have to
> > construct a new regular dict out of it, making the decision to vary
> > the data from what matches the state of the object it came from an
> > explicit one.  That seems to fit the Python zen ("explicit is better
> > than implicit").
> > 
> > So I'm changing my mind, and do consider this a valid use case, even
> > absent the crash.
> 
> Avoiding crashes seems a better use for a read-only proxy, rather than a
> hashable immutable type.

Good point.  MappingProxyType wasn't yet exposed when I wrote that email
code.
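
For illustration, here is a small sketch of the read-only proxy approach, assuming a hypothetical `Pattern` class standing in for an object that exposes an internal dict (it is not the real `re` implementation). `types.MappingProxyType` rejects item assignment, while the owner can still update the underlying dict.

```python
import types


class Pattern:
    """Hypothetical stand-in for an object exposing an internal dict."""

    def __init__(self, groups):
        self._groupindex = dict(groups)

    @property
    def groupindex(self):
        # No copy is made: the proxy is a read-only view of the same dict.
        return types.MappingProxyType(self._groupindex)


p = Pattern({"first": 1, "second": 2})
view = p.groupindex

try:
    view["JUNK"] = "foobar"
except TypeError:
    rejected = True  # the proxy does not support item assignment
else:
    rejected = False

# A caller who wants a modified version has to copy explicitly.
editable = dict(view)
editable["JUNK"] = "foobar"
```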

--David


Re: [Python-Dev] Another case for frozendict

2014-07-16 Thread Eric Snow
On Wed, Jul 16, 2014 at 7:47 AM, R. David Murray  wrote:
> After I hit send on my previous message, I thought more about your
> example.  One of the issues here is that modifying the dict breaks an
> invariant of the API.  I have a similar situation in the email module,
> and I used the same solution you did in regex: always return a new dict.
> It would be nice to be able to return a frozendict instead of having the
> overhead of building a new dict on each call.  That by itself might not
> be enough reason.  But, if the user wants to use the data in modified form
> elsewhere, they would then have to construct a new regular dict out of it,
> making the decision to vary the data from what matches the state of the
> object it came from an explicit one.  That seems to fit the Python zen
> ("explicit is better than implicit").
>
> So I'm changing my mind, and do consider this a valid use case, even
> absent the crash.

+1

A simple implementation is pretty straight-forward:

from collections.abc import Mapping

class FrozenDict(Mapping):
    def __init__(self, *args, **kwargs):
        self._map = dict(*args, **kwargs)
        self._hash = ...  # hash computation left out here
    def __hash__(self):
        return self._hash
    def __len__(self):
        return len(self._map)
    def __iter__(self):
        yield from self._map
    def __getitem__(self, key):
        return self._map[key]

This is actually something I've used before on a number of occasions.
Having it in the stdlib would be nice (though that alone is not
sufficient for inclusion :)).  If there is a valid use case for a
frozen dict type in other stdlib modules, I'd consider that a pretty
good justification for adding it.
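
One possible way to fill in the hash is sketched below. The choice of `hash(frozenset(self._map.items()))` and the lazy caching are assumptions made for illustration, not something specified in this thread; the approach requires all values to be hashable as well.

```python
from collections.abc import Mapping


class FrozenDict(Mapping):
    """Sketch of an immutable, hashable mapping (hash strategy is assumed)."""

    def __init__(self, *args, **kwargs):
        self._map = dict(*args, **kwargs)
        self._hash = None  # computed lazily on first use

    def __hash__(self):
        if self._hash is None:
            # Order-independent hash over the (key, value) pairs.
            self._hash = hash(frozenset(self._map.items()))
        return self._hash

    def __len__(self):
        return len(self._map)

    def __iter__(self):
        return iter(self._map)

    def __getitem__(self, key):
        return self._map[key]


fd = FrozenDict(first=1, second=2)
cache = {fd: "usable as a dict key"}
```

Because `Mapping` supplies no `__setitem__`, an attempt such as `fd["third"] = 3` raises `TypeError`.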

Incidentally, collections.abc.Mapping is the only one of the 6
container ABCs that does not have a concrete implementation (not
counting types.MappingProxyType which is only a proxy).

-eric


[Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-16 Thread Andreas Maier

On 13.07.2014 18:23, Steven D'Aprano wrote:
> On Sun, Jul 13, 2014 at 05:13:20PM +0200, Andreas Maier wrote:
>
> > Second, if not by delegation to equality of its elements, how would the
> > equality of sequences be defined otherwise?
>
> Wow. I'm impressed by the amount of detailed effort you've put into
> investigating this. (Too much detail to absorb, I'm afraid.) But perhaps
> you might have just asked on the [email protected] mailing list, or
> here, where we would have told you the answer:
>
>     list __eq__ first checks element identity before going on
>     to check element equality.

I apologize for not asking. It seems I was looking at the trees
(behaviors of specific cases) without seeing the wood (identity goes first).

> If you can read C, you might like to check the list source code:
>
> http://hg.python.org/cpython/file/22e5a85ba840/Objects/listobject.c

I can read (and write) C fluently, but (1) I don't have a build
environment on my Windows system so I cannot debug it, and (2) I find it
hard to judge from just looking at the C code which C function is
invoked when the Python code enters the C code.
(Quoting Raymond H. from his blog: "Unless you know where to look,
searching the source for an answer can be a time consuming intellectual
investment.")


So thanks for clarifying this.

I guess I am arriving (slowly and still partly reluctantly, and I'm not 
alone with that feeling, it seems ...) at the bottom line of all this, 
which is that reflexivity is an important goal in Python, that 
self-written non-reflexive classes are not intended nor well supported, 
and that the non-reflexive NaN is considered an exception that cannot be 
expected to be treated consistently non-reflexive.
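
A short sketch of the behavior under discussion, using only built-in floats and lists: because list comparison and containment check element identity before element equality, the same NaN object compares equal to itself inside a container even though `nan == nan` is false.

```python
nan = float("nan")

scalar_equal = (nan == nan)            # False: NaN is not reflexive
same_object_lists = ([nan] == [nan])   # True: identity is checked first
contained = (nan in [nan])             # True, for the same reason

# Two distinct NaN objects share no identity, so equality decides.
distinct_lists = ([float("nan")] == [float("nan")])  # False
```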



> This was discussed to death some time ago, both on python-dev and
> python-ideas. If you're interested, you can start here:
>
> https://mail.python.org/pipermail/python-list/2012-October/633992.html
>
> which is in the middle of one of the threads, but at least it gets you
> to the right time period.

I have read a number of posts in that thread by now. Sorry for not reading it
earlier, but the mailing list archive just does not lend itself to
searching the past. Of course, one can google it ;-)


Andy


[Python-Dev] == on object tests identity in 3.x - list delegation to members?

2014-07-16 Thread Andreas Maier

On 13.07.2014 22:05, Akira Li wrote:
> Nick Coghlan writes:
> ...
> > definition of floats and the definition of container invariants like
> > "assert x in [x]")
> >
> > The current approach means that the lack of reflexivity of NaN's stays
> > confined to floats and similar types - it doesn't leak out and infect
> > the behaviour of the container types.
> >
> > What we've never figured out is a good place to *document* it. I
> > thought there was an open bug for that, but I can't find it right now.
>
> There was related issue "Tuple comparisons with NaNs are broken"
> http://bugs.python.org/issue21873
> but it was closed as "not a bug" despite the corresponding behavior is
> *not documented* anywhere.

I currently know about these two issues related to fixing the docs:

http://bugs.python.org/11945 - about NaN values in containers
http://bugs.python.org/12067 - comparisons

I am currently working on the latter. The patch only targets the
comparisons chapter in the Language Reference; there is another
comparisons chapter in the Library Reference, and one in the Tutorial.

I will need to update the patch to issue 12067 as a result of this
discussion.


Andy



Re: [Python-Dev] Another case for frozendict

2014-07-16 Thread dw+python-dev
On Wed, Jul 16, 2014 at 09:47:59AM -0400, R. David Murray wrote:

> It would be nice to be able to return a frozendict instead of having the
> overhead of building a new dict on each call.

There already is an in-between available both to Python and C:
PyDictProxy_New() / types.MappingProxyType. It's a one line change in
each case to return a temporary intermediary, using something like (C):
    Py_INCREF(self->dict);
    return self->dict;

To
    return PyDictProxy_New(self->dict);

Or Python:
    return self.dct

To
    return types.MappingProxyType(self.dct)

Which is cheaper than a copy, and avoids having to audit every use of
self->dict to ensure the semantics required for a "frozendict" are
respected, i.e. no mutation occurs after the dict becomes visible to the
user, and potentially has __hash__ called.


> That by itself might not be enough reason.  But, if the user wants to
> use the data in modified form elsewhere, they would then have to
> construct a new regular dict out of it, making the decision to vary
> the data from what matches the state of the object it came from an
> explicit one.  That seems to fit the Python zen ("explicit is better
> than implicit").
> 
> So I'm changing my mind, and do consider this a valid use case, even
> absent the crash.

Avoiding crashes seems a better use for a read-only proxy, rather than a
hashable immutable type.


David


[Python-Dev] == on object tests identity in 3.x - uploaded patch v9

2014-07-16 Thread Andreas Maier

On 16.07.2014 13:40, Andreas Maier wrote:
> On 13.07.2014 22:05, Akira Li wrote:
> > Nick Coghlan writes:
> > ...
> >
> > There was related issue "Tuple comparisons with NaNs are broken"
> > http://bugs.python.org/issue21873
> > but it was closed as "not a bug" despite the corresponding behavior is
> > *not documented* anywhere.
>
> I currently know about these two issues related to fixing the docs:
>
> http://bugs.python.org/11945 - about NaN values in containers
> http://bugs.python.org/12067 - comparisons
>
> I am working on the latter, currently. The patch only targets the
> comparisons chapter in the Language Reference, there is another
> comparisons chapter in the Library Reference, and one in the Tutorial.
>
> I will need to update the patch to issue 12067 as a result of this
> discussion.


I have uploaded v9 of the patch to issue 12067; it should address the 
recent discussion (plus Mark's review comment on the issue itself).


Please review.

Andy



Re: [Python-Dev] Another case for frozendict

2014-07-16 Thread Devin Jeanpierre
On Wed, Jul 16, 2014 at 6:37 AM, R. David Murray  wrote:
> IMO, preventing someone from shooting themselves in the foot by modifying
> something they shouldn't modify according to the API is not a Python
> use case ("consenting adults").

Then why have immutable objects at all? Why do you have to put tuples
and frozensets inside sets, instead of lists and sets? Compare with
Java, which really is "consenting adults" here -- you can add a
mutable object to a set, just don't mutate it, or you might not be
able to find it in the set again.

Several people seem to act as if the Pythonic way is to not allow for
any sort of immutable types at all. ISTM people are trying to
retroactively claim some standard of Pythonicity that never existed.
Python can and does protect you from shooting yourself in the foot by
making objects immutable. Or do you have another explanation for the
proliferation of immutable types, and the inability to add mutable
types to sets and dicts?

Using a frozendict to protect and enforce an invariant in the re
module is entirely reasonable. So is creating a new dict each time.
The intermediate -- reusing a mutable dict and failing in
incomprehensible ways if you mutate it, and potentially even crashing
due to memory safety issues -- is not Pythonic at all.
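
To make the comparison concrete, here is a small sketch. The built-in `list` is rejected as a set element outright, while a hypothetical `MutableKey` class (invented here for illustration) whose hash depends on mutable state reproduces the "can't find it in the set again" problem described for Java.

```python
class MutableKey:
    """Hypothetical key type whose hash depends on mutable state."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.value == other.value


# Built-in mutable containers are simply refused as set elements.
try:
    {[1, 2]}
except TypeError as exc:
    list_error = str(exc)

# A user-defined mutable key is accepted, but mutating it afterwards
# leaves it filed under its old hash.
key = MutableKey(1)
seen = {key}
found_before = key in seen
key.value = 2
found_after = key in seen
```

On CPython, `found_after` is false even though the very same object is still stored in `seen`.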

-- Devin


Re: [Python-Dev] Another case for frozendict

2014-07-16 Thread R. David Murray
On Wed, 16 Jul 2014 10:10:07 -0700, Devin Jeanpierre wrote:
> On Wed, Jul 16, 2014 at 6:37 AM, R. David Murray wrote:
> > IMO, preventing someone from shooting themselves in the foot by modifying
> > something they shouldn't modify according to the API is not a Python
> > use case ("consenting adults").
> 
> Then why have immutable objects at all? Why do you have to put tuples
> and frozensets inside sets, instead of lists and sets? Compare with
> Java, which really is "consenting adults" here -- you can add a
> mutable object to a set, just don't mutate it, or you might not be
> able to find it in the set again.
> 
> Several people seem to act as if the Pythonic way is to not allow for
> any sort of immutable types at all. ISTM people are trying to
> retroactively claim some standard of Pythonicity that never existed.
> Python can and does protect you from shooting yourself in the foot by
> making objects immutable. Or do you have another explanation for the
> proliferation of immutable types, and the inability to add mutable
> types to sets and dicts?
> 
> Using a frozendict to protect and enforce an invariant in the re
> module is entirely reasonable. So is creating a new dict each time.
> The intermediate -- reusing a mutable dict and failing in
> incomprehensible ways if you mutate it, and potentially even crashing
> due to memory safety issues -- is not Pythonic at all.

You'll note I ended up agreeing with you there: when mutation breaks an
invariant of the object it came from, that's an issue.  Which would be
the case if you could use mutable objects as keys.

--David


[Python-Dev] cStringIO vs io.BytesIO

2014-07-16 Thread Mikhail Korobov
Hi,

cStringIO was removed from Python 3. It seems the suggested replacement is
io.BytesIO. But there is a problem: cStringIO.StringIO(b'data') didn't copy
the data while io.BytesIO(b'data') makes a copy (even if the data is not
modified later).

This means io.BytesIO is not well suited to cases where you want a
read-only file-like interface to existing byte strings. Isn't that one of the
main io.BytesIO use cases? Wrapping bytes in cStringIO.StringIO used to be
almost free, but this is not true for io.BytesIO.

So making code 3.x compatible by ditching cStringIO can cause serious
performance/memory regressions. One can change the code to build the data
using BytesIO (without creating bytes objects in the first place), but that
is not always possible or convenient.

I believe this problem affects tornado (
https://github.com/tornadoweb/tornado/issues/1110), Scrapy (this is how I
became aware of this issue), NLTK (anecdotal evidence: I tried to port
some hairy NLTK module to io.BytesIO, and it became many times slower) and
maybe pretty much every IO-related project ported to Python 3.x (django,
werkzeug and frameworks based on it, and requests all wrap user data in
BytesIO, and this may cause slowdowns and up to 2x memory usage in
Python 3.x).

Do you know if there is a workaround? Maybe there is some stdlib part that I'm
missing, or a module on PyPI? It is not that hard to write one's own wrapper
that won't do copies (or to port [c]StringIO to 3.x), but I wonder if there
is an existing solution or plans to fix it in Python itself - this BytesIO
use case looks quite important.
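
As a rough illustration of such a wrapper, here is a minimal sketch built on `io.RawIOBase` and `memoryview`. The class name `ReadOnlyBytes` is hypothetical, the sketch assumes the input is a plain `bytes` or `bytearray` object, and each `read()` still copies the chunk it returns; only the up-front copy of the whole buffer is avoided.

```python
import io


class ReadOnlyBytes(io.RawIOBase):
    """Hypothetical read-only file-like view over an existing bytes object."""

    def __init__(self, data):
        self._view = memoryview(data)  # references data, does not copy it
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def readinto(self, b):
        # Copy at most len(b) bytes from the current position into b.
        chunk = self._view[self._pos:self._pos + len(b)]
        n = len(chunk)
        b[:n] = chunk
        self._pos += n
        return n

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = len(self._view) + offset
        else:
            raise ValueError("invalid whence")
        if pos < 0:
            raise ValueError("negative seek position")
        self._pos = pos
        return pos

    def tell(self):
        return self._pos


reader = ReadOnlyBytes(b"hello world")
head = reader.read(5)
reader.seek(6)
tail = reader.read()
```

`read()` and `readall()` come from `io.RawIOBase`, which implements them on top of `readinto()`.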


Re: [Python-Dev] cStringIO vs io.BytesIO

2014-07-16 Thread dw+python-dev
On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote:

> So making code 3.x compatible by ditching cStringIO can cause a serious
> performance/memory  regressions. One can change the code to build the data
> using BytesIO (without creating bytes objects in the first place), but that is
> not always possible or convenient.
> 
> I believe this problem affects tornado (https://github.com/tornadoweb/tornado/
> Do you know if there a workaround? Maybe there is some stdlib part that I'm
> missing, or a module on PyPI? It is not that hard to write an own wrapper that
> won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an
> existing solution or plans to fix it in Python itself - this BytesIO use case
> looks quite important.

Regarding a fix, the problem seems mostly that the StringI/StringO
specializations were removed, and the new implementation is basically
just a StringO.

At a small cost to memory, it is easy to add a Py_buffer source and
flags variable to the bytesio struct, with the buffers initially set up
for reading, and if a mutation method is called, check for a
copy-on-write flag, duplicate the source object into private memory,
then continue operating as it does now.
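
The copy-on-write idea can be sketched in pure Python. This is only an illustration of the scheme, not a translation of the C patch: `CowBuffer`, `_unshare`, `read_at` and `write_at` are hypothetical names, and a `bytearray` stands in for the private heap buffer.

```python
class CowBuffer:
    """Hypothetical copy-on-write buffer: shares its source until written."""

    def __init__(self, initial=b""):
        self._data = initial   # borrowed reference, no copy yet
        self._shared = True

    def _unshare(self):
        # Duplicate the source into private, mutable memory exactly once.
        if self._shared:
            self._data = bytearray(self._data)
            self._shared = False

    def read_at(self, pos, size):
        return bytes(self._data[pos:pos + size])

    def write_at(self, pos, payload):
        self._unshare()
        end = pos + len(payload)
        if end > len(self._data):
            # Grow with zero bytes so the slice assignment fits.
            self._data.extend(bytes(end - len(self._data)))
        self._data[pos:end] = payload
        return len(payload)


source = b"abcdef"
buf = CowBuffer(source)
shared_before_write = buf._data is source  # True: nothing copied so far
buf.write_at(2, b"ZZ")
```

Readers that never write pay nothing for the copy, which is the case the benchmark below exercises.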

Attached is a (rough) patch implementing this, feel free to try it with
hg tip.

[23:03:44 k2!124 cpython] cat i.py
import io
buf = b'x' * (1048576 * 16)
def x():
    io.BytesIO(buf)

[23:03:51 k2!125 cpython] ./python -m timeit  -s 'import i' 'i.x()'
100 loops, best of 3: 2.9 msec per loop

[23:03:57 k2!126 cpython] ./python-cow -m timeit  -s 'import i' 'i.x()'
100 loops, best of 3: 0.364 usec per loop


David



diff --git a/Modules/_io/bytesio.c b/Modules/_io/bytesio.c
--- a/Modules/_io/bytesio.c
+++ b/Modules/_io/bytesio.c
@@ -2,6 +2,12 @@
 #include "structmember.h"   /* for offsetof() */
 #include "_iomodule.h"
 
+enum io_flags {
+    /* initvalue describes a borrowed buffer we cannot modify and must later
+     * release */
+    IO_SHARED = 1
+};
+
 typedef struct {
     PyObject_HEAD
     char *buf;
@@ -11,6 +17,10 @@
     PyObject *dict;
     PyObject *weakreflist;
     Py_ssize_t exports;
+    Py_buffer initvalue;
+    /* If IO_SHARED, indicates PyBuffer_release(initvalue) required, and that
+     * we don't own buf. */
+    enum io_flags flags;
 } bytesio;
 
 typedef struct {
@@ -33,6 +43,47 @@
         return NULL; \
     }
 
+/* Unshare our buffer in preparation for writing, in the case that an
+ * initvalue object was provided, and we're currently borrowing its buffer.
+ * size indicates the total reserved buffer size allocated as part of
+ * unsharing, to avoid a potentially redundant allocation in the subsequent
+ * mutation.
+ */
+static int
+unshare(bytesio *self, size_t size)
+{
+    Py_ssize_t new_size = size;
+    Py_ssize_t copy_size = size;
+    char *new_buf;
+
+    /* Do nothing if buffer wasn't shared */
+    if (! (self->flags & IO_SHARED)) {
+        return 0;
+    }
+
+    /* If hint provided, adjust our new buffer size and truncate the amount of
+     * source buffer we copy as necessary. */
+    if (size > copy_size) {
+        copy_size = size;
+    }
+
+    /* Allocate or fail. */
+    new_buf = (char *)PyMem_Malloc(new_size);
+    if (new_buf == NULL) {
+        PyErr_NoMemory();
+        return -1;
+    }
+
+    /* Copy the (possibly now truncated) source string to the new buffer, and
+     * forget any reference used to keep the source buffer alive. */
+    memcpy(new_buf, self->buf, copy_size);
+    PyBuffer_Release(&self->initvalue);
+    self->flags &= ~IO_SHARED;
+    self->buf = new_buf;
+    self->buf_size = new_size;
+    self->string_size = (Py_ssize_t) copy_size;
+    return 0;
+}
 
 /* Internal routine to get a line from the buffer of a BytesIO
    object. Returns the length between the current position to the
@@ -125,11 +176,18 @@
 static Py_ssize_t
 write_bytes(bytesio *self, const char *bytes, Py_ssize_t len)
 {
+    size_t desired;
+
     assert(self->buf != NULL);
     assert(self->pos >= 0);
     assert(len >= 0);
 
-    if ((size_t)self->pos + len > self->buf_size) {
+    desired = (size_t)self->pos + len;
+    if (unshare(self, desired)) {
+        return -1;
+    }
+
+    if (desired > self->buf_size) {
         if (resize_buffer(self, (size_t)self->pos + len) < 0)
             return -1;
     }
@@ -502,6 +560,10 @@
         return NULL;
     }
 
+    if (unshare(self, size)) {
+        return NULL;
+    }
+
     if (size < self->string_size) {
         self->string_size = size;
         if (resize_buffer(self, size) < 0)
@@ -655,10 +717,13 @@
 static PyObject *
 bytesio_close(bytesio *self)
 {
-    if (self->buf != NULL) {
+    if (self->flags & IO_SHARED) {
+        PyBuffer_Release(&self->initvalue);
+        self->flags &= ~IO_SHARED;
+    } else if (self->buf != NULL) {
         PyMem_Free(self->buf);
-        self->buf = NULL;
     }
+    self->buf = NULL;
     Py_RETURN_NONE;
 }
 
@@ -788,10 +853,17 

Re: [Python-Dev] cStringIO vs io.BytesIO

2014-07-16 Thread dw+python-dev
It's worth noting that a natural extension of this is to do something very
similar on the write side: instead of generating a temporary private
heap allocation, generate (and freely resize) a private PyBytes object
until it is exposed to the user, at which point _getvalue() returns it
and converts it into an IO_SHARED buffer.

That way another copy is avoided in the common case of building a
string, calling getvalue() once, then discarding the IO object.


David

On Wed, Jul 16, 2014 at 11:07:54PM +, [email protected] wrote:
> On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote:
> 
> > So making code 3.x compatible by ditching cStringIO can cause a serious
> > performance/memory  regressions. One can change the code to build the data
> > using BytesIO (without creating bytes objects in the first place), but that is
> > not always possible or convenient.
> > 
> > I believe this problem affects tornado (https://github.com/tornadoweb/tornado/
> > Do you know if there a workaround? Maybe there is some stdlib part that I'm
> > missing, or a module on PyPI? It is not that hard to write an own wrapper that
> > won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an
> > existing solution or plans to fix it in Python itself - this BytesIO use case
> > looks quite important.
> 
> Regarding a fix, the problem seems mostly that the StringI/StringO
> specializations were removed, and the new implementation is basically
> just a StringO.
> 
> At a small cost to memory, it is easy to add a Py_buffer source and
> flags variable to the bytesio struct, with the buffers initially setup
> for reading, and if a mutation method is called, check for a
> copy-on-write flag, duplicate the source object into private memory,
> then continue operating as it does now.
> 
> Attached is a (rough) patch implementing this, feel free to try it with
> hg tip.
> 
> [23:03:44 k2!124 cpython] cat i.py
> import io
> buf = b'x' * (1048576 * 16)
> def x():
>     io.BytesIO(buf)
> 
> [23:03:51 k2!125 cpython] ./python -m timeit  -s 'import i' 'i.x()'
> 100 loops, best of 3: 2.9 msec per loop
> 
> [23:03:57 k2!126 cpython] ./python-cow -m timeit  -s 'import i' 'i.x()'
> 100 loops, best of 3: 0.364 usec per loop
> 
> 
> David

Re: [Python-Dev] cStringIO vs io.BytesIO

2014-07-16 Thread Nick Coghlan
On 16 Jul 2014 20:00,  wrote:
> On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote:
> > I believe this problem affects tornado (https://github.com/tornadoweb/tornado/
> > Do you know if there a workaround? Maybe there is some stdlib part that I'm
> > missing, or a module on PyPI? It is not that hard to write an own wrapper that
> > won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an
> > existing solution or plans to fix it in Python itself - this BytesIO use case
> > looks quite important.
>
> Regarding a fix, the problem seems mostly that the StringI/StringO
> specializations were removed, and the new implementation is basically
> just a StringO.

Right, I don't think there's a major philosophy change here, just a missing
optimisation that could be restored in 3.5.

Cheers,
Nick.


Re: [Python-Dev] cStringIO vs io.BytesIO

2014-07-16 Thread Antoine Pitrou



Hi,

On 16/07/2014 19:07, [email protected] wrote:

> Attached is a (rough) patch implementing this, feel free to try it with
> hg tip.


Thanks for your work. Please post any patch to http://bugs.python.org

Regards

Antoine.

