Re: [Python-Dev] Another case for frozendict
On Sun, Jul 13, 2014 at 02:04:17PM +, Jason R. Coombs wrote:

> PEP-416 mentions a MappingProxyType, but that’s no help.

Well, it kind of is. By combining MappingProxyType and UserDict the desired
effect can be achieved concisely:

    import collections
    import types

    class frozendict(collections.UserDict):
        def __init__(self, d, **kw):
            if d:
                d = d.copy()
                d.update(kw)
            else:
                d = kw
            self.data = types.MappingProxyType(d)

        _h = None

        def __hash__(self):
            if self._h is None:
                self._h = sum(map(hash, self.data.items()))
            return self._h

        def __repr__(self):
            return repr(dict(self))

> Although hashability is mentioned in the PEP under constraints, there are
> many use-cases that fall out of the ability to hash a dict, such as the one
> described above, which are not mentioned at all in use-cases for the PEP.
> If there’s ever any interest in reviving that PEP, I’m in favor of its
> implementation.

In its previous form, the PEP seemed more focused on some false optimization
capabilities of a read-only type, rather than, as here, the far more
interesting hashability properties. It might warrant a fresh PEP to more
thoroughly investigate this angle.

David

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Another case for frozendict
On Sun, Jul 13, 2014 at 06:43:28PM +, dw+python-...@hmmz.org wrote:

> if d:
>     d = d.copy()

To cope with iterables, "d = d.copy()" should have read "d = dict(d)".

David
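Putting the two mails together, the corrected sketch and the hashability
property it enables look like this (the `d=None` default and the
demonstration at the end are illustrative additions, not part of the
original mails):

```python
import collections
import types

class frozendict(collections.UserDict):
    """Read-only, hashable mapping over MappingProxyType (sketch)."""
    def __init__(self, d=None, **kw):
        if d:
            d = dict(d)   # corrected: accepts any mapping or iterable of pairs
            d.update(kw)
        else:
            d = kw
        # MappingProxyType refuses writes, so inherited mutators all fail.
        self.data = types.MappingProxyType(d)

    _h = None

    def __hash__(self):
        if self._h is None:
            self._h = sum(map(hash, self.data.items()))
        return self._h

    def __repr__(self):
        return repr(dict(self))

# Hashable, so usable as a dict key or set member:
a = frozendict(x=1, y=2)
b = frozendict({'x': 1}, y=2)
cache = {a: 'hit'}
print(cache[b])   # 'hit': equal contents hash (and compare) equally
```

Equality comes for free from the `Mapping` ABC, which is what makes the
dict-key lookup through `b` succeed.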
Re: [Python-Dev] Another case for frozendict
On Wed, Jul 16, 2014 at 09:47:59AM -0400, R. David Murray wrote:

> It would be nice to be able to return a frozendict instead of having the
> overhead of building a new dict on each call.

There already is an in-between available both to Python and C:
PyDictProxy_New() / types.MappingProxyType. It's a one line change in each
case to return a temporary intermediary. In C, that means changing:

    Py_INCREF(self->dict);
    return self->dict;

to:

    return PyDictProxy_New(self->dict);

Or in Python, changing:

    return self.dct

to:

    return types.MappingProxyType(self.dct)

This is cheaper than a copy, and avoids having to audit every use of
self->dict to ensure the semantics required for a "frozendict" are
respected, i.e. that no mutation occurs after the dict becomes visible to
the user and potentially has __hash__ called.

> That by itself might not be enough reason. But, if the user wants to
> use the data in modified form elsewhere, they would then have to
> construct a new regular dict out of it, making the decision to vary
> the data from what matches the state of the object it came from an
> explicit one. That seems to fit the Python zen ("explicit is better
> than implicit").
>
> So I'm changing my mind, and do consider this a valid use case, even
> absent the crash.

Avoiding crashes seems a better use for a read-only proxy, rather than a
hashable immutable type.

David
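To make the proxy's semantics concrete (example mine, not from the original
mail): the proxy is a live read-only view, so writes through it fail while
changes to the underlying dict stay visible:

```python
import types

d = {'a': 1}
proxy = types.MappingProxyType(d)

print(proxy['a'])        # 1
try:
    proxy['a'] = 2       # mutation through the proxy is refused
except TypeError:
    print('read-only')

d['a'] = 2               # ...but the proxy tracks the underlying dict
print(proxy['a'])        # 2
```

This is exactly why the no-mutation audit mentioned above is still needed:
the proxy protects against the *user* mutating the dict, not the owner.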
Re: [Python-Dev] cStringIO vs io.BytesIO
On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote:

> So making code 3.x compatible by ditching cStringIO can cause a serious
> performance/memory regressions. One can change the code to build the data
> using BytesIO (without creating bytes objects in the first place), but that
> is not always possible or convenient.
>
> I believe this problem affects tornado (https://github.com/tornadoweb/tornado/
> Do you know if there a workaround? Maybe there is some stdlib part that I'm
> missing, or a module on PyPI? It is not that hard to write an own wrapper
> that won't do copies (or to port [c]StringIO to 3.x), but I wonder if there
> is an existing solution or plans to fix it in Python itself - this BytesIO
> use case looks quite important.

Regarding a fix, the problem seems mostly that the StringI/StringO
specializations were removed, and the new implementation is basically just a
StringO.

At a small cost to memory, it is easy to add a Py_buffer source and flags
variable to the bytesio struct, with the buffers initially set up for
reading, and if a mutation method is called, check for a copy-on-write flag,
duplicate the source object into private memory, then continue operating as
it does now.

Attached is a (rough) patch implementing this, feel free to try it with hg
tip.

    [23:03:44 k2!124 cpython] cat i.py
    import io
    buf = b'x' * (1048576 * 16)
    def x():
        io.BytesIO(buf)

    [23:03:51 k2!125 cpython] ./python -m timeit -s 'import i' 'i.x()'
    100 loops, best of 3: 2.9 msec per loop

    [23:03:57 k2!126 cpython] ./python-cow -m timeit -s 'import i' 'i.x()'
    100 loops, best of 3: 0.364 usec per loop

David

diff --git a/Modules/_io/bytesio.c b/Modules/_io/bytesio.c
--- a/Modules/_io/bytesio.c
+++ b/Modules/_io/bytesio.c
@@ -2,6 +2,12 @@
 #include "structmember.h" /* for offsetof() */
 #include "_iomodule.h"
 
+enum io_flags {
+    /* initvalue describes a borrowed buffer we cannot modify and must later
+     * release */
+    IO_SHARED = 1
+};
+
 typedef struct {
     PyObject_HEAD
     char *buf;
@@ -11,6 +17,10 @@
     PyObject *dict;
     PyObject *weakreflist;
     Py_ssize_t exports;
+    Py_buffer initvalue;
+    /* If IO_SHARED, indicates PyBuffer_Release(initvalue) required, and that
+     * we don't own buf. */
+    enum io_flags flags;
 } bytesio;
 
 typedef struct {
@@ -33,6 +43,47 @@
         return NULL; \
     }
 
+/* Unshare our buffer in preparation for writing, in the case that an
+ * initvalue object was provided, and we're currently borrowing its buffer.
+ * size indicates the total reserved buffer size allocated as part of
+ * unsharing, to avoid a potentially redundant allocation in the subsequent
+ * mutation.
+ */
+static int
+unshare(bytesio *self, size_t size)
+{
+    Py_ssize_t new_size = size;
+    Py_ssize_t copy_size = self->string_size;
+    char *new_buf;
+
+    /* Do nothing if buffer wasn't shared */
+    if (! (self->flags & IO_SHARED)) {
+        return 0;
+    }
+
+    /* If the new buffer is smaller than the source, truncate the amount of
+     * source buffer we copy as necessary. */
+    if (copy_size > new_size) {
+        copy_size = new_size;
+    }
+
+    /* Allocate or fail. */
+    new_buf = (char *)PyMem_Malloc(new_size);
+    if (new_buf == NULL) {
+        PyErr_NoMemory();
+        return -1;
+    }
+
+    /* Copy the (possibly now truncated) source string to the new buffer, and
+     * forget any reference used to keep the source buffer alive. */
+    memcpy(new_buf, self->buf, copy_size);
+    PyBuffer_Release(&self->initvalue);
+    self->flags &= ~IO_SHARED;
+    self->buf = new_buf;
+    self->buf_size = new_size;
+    self->string_size = (Py_ssize_t) copy_size;
+    return 0;
+}
 
 /* Internal routine to get a line from the buffer of a BytesIO
    object. Returns the length between the current position to the
@@ -125,11 +176,18 @@
 static Py_ssize_t
 write_bytes(bytesio *self, const char *bytes, Py_ssize_t len)
 {
+    size_t desired;
+
     assert(self->buf != NULL);
     assert(self->pos >= 0);
     assert(len >= 0);
 
-    if ((size_t)self->pos + len > self->buf_size) {
+    desired = (size_t)self->pos + len;
+    if (unshare(self, desired)) {
+        return -1;
+    }
+
+    if (desired > self->buf_size) {
         if (resize_buffer(self, (size_t)self->pos + len) < 0)
             return -1;
     }
@@ -502,6 +560,10 @@
         return NULL;
     }
 
+    if (unshare(self, size)) {
+        return NULL;
+    }
+
     if (size < self->string_size) {
         self->string_size = size;
         if (resize_buffer(self, size) < 0)
@@ -655,10 +717,13 @@
 static PyObject *
 bytesio_close(bytesio *self)
 {
-    if (self->buf != NULL) {
+    if (self->flags & IO_SHARED) {
+        PyBuffer_Release(&self->initvalue);
+        self->flags &= ~IO_SHARED;
+    } else if (self->buf != NULL) {
         PyMem_Free(self->buf);
-        self->buf = NULL;
     }
+    self->buf = NULL;
     Py_RETURN_NONE;
 }
 
@@ -788,10 +853,17
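Until something like this lands, the no-copy idea is easy to approximate in
pure Python (an illustrative sketch only; the class name and the reduced
read/seek-only API are mine, not part of the patch or the thread):

```python
import io

class ZeroCopyReader(io.RawIOBase):
    """Minimal read-only file-like object over an existing bytes/buffer.

    Unlike io.BytesIO(data), construction takes O(1) time and memory:
    reads return slices of a memoryview rather than copying up front.
    (Sketch only -- supports read/seek, not the full BytesIO API.)
    """
    def __init__(self, data):
        self._view = memoryview(data)
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def seek(self, pos, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self._pos = pos
        elif whence == io.SEEK_CUR:
            self._pos += pos
        else:  # io.SEEK_END
            self._pos = len(self._view) + pos
        return self._pos

    def read(self, size=-1):
        if size < 0:
            size = len(self._view) - self._pos
        chunk = self._view[self._pos:self._pos + size]
        self._pos += len(chunk)
        return bytes(chunk)

buf = b'x' * (1048576 * 16)
f = ZeroCopyReader(buf)        # no 16 MiB copy here
print(f.read(4))               # b'xxxx'
```

Construction cost is independent of buffer size; each read() copies only the
bytes actually requested.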
Re: [Python-Dev] cStringIO vs io.BytesIO
It's worth noting that a natural extension of this is to do something very
similar on the write side: instead of generating a temporary private heap
allocation, generate (and freely resize) a private PyBytes object until it
is exposed to the user, at which point _getvalue() returns it and converts
it into an IO_SHARED buffer.

That way another copy is avoided in the common case of building a string,
calling getvalue() once, then discarding the IO object.

David

On Wed, Jul 16, 2014 at 11:07:54PM +, dw+python-...@hmmz.org wrote:
> [snip]
Re: [Python-Dev] cpython: Issue #22003: When initialized from a bytes object, io.BytesIO() now
Hi Serhiy,

At least conceptually, issue 15381 seems the better approach, but getting a
correct implementation may take more iterations than the (IMHO) simpler
change in issue 22003.

For my tastes, the current 15381 implementation seems a little too magical
in relying on Py_REFCNT() as the sole indication that a PyBytes can be
mutated.

For the sake of haste, 22003 only addresses the specific regression
introduced in Python 3.x BytesIO, compared to 2.x StringI, where 3.x lacked
an equivalent no-copies specialization.

David
Re: [Python-Dev] python process creation overhead
On Mon, May 12, 2014 at 04:22:52PM -0700, Gregory Szorc wrote:

> Why can't Python start as quickly as Perl or Ruby?

On my heavily abused Core 2 MacBook with 9 .pth files, 2.7 drops from 81ms
startup to 20ms by specifying -S, which disables site.py. Obliterating the
.pth files immediately knocks 10ms off regular startup.

I guess the question isn't why Python is slower than Perl, but what aspects
of site.py could be cached, reimplemented, or stripped out entirely. I'd
personally love to see .pth support removed.

David
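The figures above are easy to reproduce (illustrative snippet, not from the
original mail; absolute numbers vary by machine, but -S should consistently
win or tie):

```python
import subprocess
import sys
import time

def startup_time(*extra_args, runs=3):
    """Average wall time to start the interpreter and exit immediately."""
    start = time.perf_counter()
    for _ in range(runs):
        # Spawn a fresh interpreter that does nothing but start up and exit.
        subprocess.check_call([sys.executable, *extra_args, '-c', 'pass'])
    return (time.perf_counter() - start) / runs

print('default: %.1f ms' % (1000 * startup_time()))
print('with -S: %.1f ms' % (1000 * startup_time('-S')))
```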
Re: [Python-Dev] Internal representation of strings and Micropython
On Wed, Jun 04, 2014 at 03:17:00PM +1000, Nick Coghlan wrote:

> There's a general expectation that indexing will be O(1) because all
> the builtin containers that support that syntax use it for O(1) lookup
> operations.

Depending on your definition of built in, there is at least one standard
library container that does not - collections.deque.

Given the specialized kinds of application this Python implementation is
targeted at, it seems UTF-8 is ideal, considering the huge memory savings
resulting from the compressed representation, and the reduced likelihood of
there being any real need for serious text processing on the device.

It is also unlikely to find software or libraries like Django or Werkzeug
running on a microcontroller; more likely all the Python code would be
custom, in which case replacing string indexing with iteration, or temporary
conversion to a list, is easily done.

In this context, while a fixed-width encoding may be the theoretically
correct choice, it would also likely be the practically wrong one.

David
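To make the last point concrete (example mine, not from the original mail),
replacing index-based loops with iteration is usually mechanical, and it
sidesteps any dependence on O(1) indexing:

```python
from collections import deque

data = deque('hello world')   # deque indexing is O(n), not O(1)

# Index-based loop: O(n**2) overall on a deque (or on a UTF-8 string type):
chars = [data[i] for i in range(len(data))]

# Equivalent iteration: a single O(n) pass, with no indexing at all:
chars2 = list(data)
assert chars == chars2

# When positions are needed, enumerate() keeps it a single pass too:
o_positions = [i for i, c in enumerate(data) if c == 'o']
print(o_positions)            # [4, 7]
```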
Re: [Python-Dev] Moving Python 3.5 on Windows to a new compiler
On Fri, Jun 06, 2014 at 03:41:22PM +, Steve Dower wrote:

> [snip]

Speaking as a third party who aims to provide binary distributions for
recent Python releases on Windows, every new compiler introduces a licensing
and configuration headache. So I guess the questions are:

* Does the ABI stability address some historical real world problem with
  Python binary builds? (I guess possibly)

* Is the existing solution of third parties building under e.g. MinGW as an
  option of last resort causing real world issues? It seems to work for a
  lot of people, although I personally avoid it.

* Have other compiler vendors indicated they will change their ABI
  environment to match VS under this new stability guarantee? If not, then
  as yet there is no real world benefit here.

* Has Python ever hit a showstopper release issue as a result of a bug in
  MSVC? (I guess probably not)

* Will VS 14 be golden prior to Python 3.5's release? It would suck to rely
  on a beta compiler.. :)

Sorry for dunking water on this, but I've recently spent a ton of time
getting a Microsoft build environment running, and it seems possible a new
compiler may not yet justify more effort if there is little tangible
benefit.

David
Re: [Python-Dev] Moving Python 3.5 on Windows to a new compiler
On Fri, Jun 06, 2014 at 10:49:24PM +0400, Brian Curtin wrote:

> None of the options are particularly good, but yes, I think that's an
> option we have to consider. We're supporting 2.7.x for 6 more years on
> a compiler that is already 6 years old.

Surely that is infinitely less desirable than simply bumping the minor
version?

David
Re: [Python-Dev] Moving Python 3.5 on Windows to a new compiler
On Sat, Jun 07, 2014 at 05:33:45AM +1000, Chris Angelico wrote:

> > Is it really any difference in maintenance if you just stop applying
> > updates to 2.7 and switch to 2.8? If 2.8 is really just 2.7 with a
> > new compiler then there should be no functional difference between
> > doing that and doing a 2.7.whatever except all of the tooling that
> > relies on the compiler not to change in micro releases won’t
> > suddenly break and freak out.
>
> If the only difference between 2.7 and 2.8 is the compiler used on
> Windows, what happens on Linux and other platforms? A Python 2.8 would
> have to be materially different from Python 2.7, not just binarily
> incompatible on one platform.

Grrmph, that's fair. Perhaps a final alternative is simply continuing the
2.7 series with a stale compiler, as a kind of carrot on a stick to
encourage users to upgrade? Gating 2.7's life on the natural decline of its
supported compiler/related ecosystem seems somehow quite a gradual and
natural demise.. :)

David
Re: [Python-Dev] namedtuple implementation grumble
On Sun, Jun 08, 2014 at 07:37:46PM +, dw+python-...@hmmz.org wrote:

> cls = tuple(name, (_NamedTuple,), {

Ugh, this should of course have been type().

David
Re: [Python-Dev] namedtuple implementation grumble
On Sun, Jun 08, 2014 at 03:13:55PM -0400, Eric V. Smith wrote:

> > The current implementation is also *really* easy to understand,
> > while writing out the dynamic type creation explicitly would likely
> > require much deeper knowledge of the type machinery to follow.
>
> As proof that it's harder to understand, here's an example of that
> dynamically creating functions and types:

Probably I'm missing something, but there's a much simpler non-exec
approach, something like:

    class _NamedTuple(...):
        ...

    def namedtuple(name, fields):
        cls = tuple(name, (_NamedTuple,), {
            '_fields': fields.split()
        })
        for i, field_name in enumerate(cls._fields):
            prop = property(functools.partial(_NamedTuple.__getitem__, i),
                            functools.partial(_NamedTuple.__setitem__, i))
            setattr(cls, field_name, prop)
        return cls

David
Re: [Python-Dev] namedtuple implementation grumble
On Sun, Jun 08, 2014 at 05:27:41PM -0400, Eric V. Smith wrote:

> How would you write _Namedtuple.__new__?

Knew something must be missing :) Obviously it's possible, but not nearly as
efficiently as reusing the argument parsing machinery as in the original
implementation. I guess especially the kwargs implementation below would
suck..

    _undef = object()

    class _NamedTuple(...):
        def __new__(cls, *a, **kw):
            if kw:
                a = list(a) + ([_undef] * (len(cls._fields) - len(a)))
                for k, v in kw.items():
                    i = cls._name_id_map[k]
                    if a[i] is not _undef:
                        raise TypeError(...)
                    a[i] = v
                if _undef not in a:
                    return tuple.__new__(cls, a)
                raise TypeError(...)
            else:
                if len(a) == len(cls._fields):
                    return tuple.__new__(cls, a)
                raise TypeError(...)

    def namedtuple(name, fields):
        fields = fields.split()
        cls = type(name, (_NamedTuple,), {
            '_fields': fields,
            '_name_id_map': {k: i for i, k in enumerate(fields)}
        })
        for i, field_name in enumerate(fields):
            getter = functools.partial(_NamedTuple.__getitem__, i)
            setattr(cls, field_name, property(getter))
        return cls

David
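For completeness, a runnable variant of the sketch above (details mine, not
from the thread): it swaps the functools.partial getters -- whose argument
order would pass the index where self is expected -- for
operator.itemgetter, and supports only a fraction of the real
collections.namedtuple API:

```python
import operator

class _NamedTuple(tuple):
    __slots__ = ()
    _fields = ()

    def __new__(cls, *args, **kw):
        values = list(args)
        # Fill remaining fields, in declaration order, from keyword args.
        for name in cls._fields[len(args):]:
            try:
                values.append(kw.pop(name))
            except KeyError:
                raise TypeError('missing argument: %r' % name)
        if kw:
            # Leftovers are either duplicates of positionals or unknown names.
            raise TypeError('unexpected arguments: %r' % list(kw))
        return tuple.__new__(cls, values)

    def __repr__(self):
        pairs = ', '.join('%s=%r' % (n, v) for n, v in zip(self._fields, self))
        return '%s(%s)' % (type(self).__name__, pairs)

def namedtuple(name, fields):
    fields = tuple(fields.split())
    cls = type(name, (_NamedTuple,), {'__slots__': (), '_fields': fields})
    for i, field_name in enumerate(fields):
        # itemgetter(i)(instance) == instance[i], so it works as a getter.
        setattr(cls, field_name, property(operator.itemgetter(i)))
    return cls

Point = namedtuple('Point', 'x y')
p = Point(1, y=2)
print(p)          # Point(x=1, y=2)
print(p.x + p.y)  # 3
```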