Re: [Python-Dev] Another case for frozendict

2014-07-13 Thread dw+python-dev
On Sun, Jul 13, 2014 at 02:04:17PM +, Jason R. Coombs wrote:

> PEP-416 mentions a MappingProxyType, but that’s no help.

Well, it kind of is. By combining MappingProxyType and UserDict, the
desired effect can be achieved concisely:

import collections
import types

class frozendict(collections.UserDict):
    def __init__(self, d, **kw):
        if d:
            d = d.copy()
            d.update(kw)
        else:
            d = kw
        self.data = types.MappingProxyType(d)

    _h = None
    def __hash__(self):
        if self._h is None:
            self._h = sum(map(hash, self.data.items()))
        return self._h

    def __repr__(self):
        return repr(dict(self))


> Although hashability is mentioned in the PEP under constraints, there are many
> use-cases that fall out of the ability to hash a dict, such as the one
> described above, which are not mentioned at all in use-cases for the PEP.

> If there’s ever any interest in reviving that PEP, I’m in favor of its
> implementation.

In its previous form, the PEP seemed more focused on some false
optimization capabilities of a read-only type, rather than as here, the
far more interesting hashability properties. It might warrant a fresh
PEP to more thoroughly investigate this angle.


David
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Another case for frozendict

2014-07-13 Thread dw+python-dev
On Sun, Jul 13, 2014 at 06:43:28PM +, dw+python-...@hmmz.org wrote:

> if d:
>     d = d.copy()

To cope with iterables, "d = d.copy()" should have read "d = dict(d)".
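For reference, a self-contained version with the correction folded in, plus a quick demonstration of the hashability that motivates it (the `d=None` default is an addition here so the keyword-only form also works):

```python
import collections
import types

class frozendict(collections.UserDict):
    def __init__(self, d=None, **kw):
        d = dict(d) if d else {}   # dict() also copes with iterables of pairs
        d.update(kw)
        self.data = types.MappingProxyType(d)

    _h = None
    def __hash__(self):
        if self._h is None:
            self._h = sum(map(hash, self.data.items()))
        return self._h

    def __repr__(self):
        return repr(dict(self))

fd = frozendict([('a', 1)], b=2)
cache = {fd: 'value'}   # hashable, so usable as a dict key
```

Mutation attempts (e.g. `fd['a'] = 5`) fail with TypeError, since UserDict's `__setitem__` hits the read-only mapping proxy.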


David


Re: [Python-Dev] Another case for frozendict

2014-07-16 Thread dw+python-dev
On Wed, Jul 16, 2014 at 09:47:59AM -0400, R. David Murray wrote:

> It would be nice to be able to return a frozendict instead of having the
> overhead of building a new dict on each call.

There already is an in-between available to both Python and C:
PyDictProxy_New() / types.MappingProxyType. It's a one-line change in
each case to return a temporary intermediary. In C, something like:

    Py_INCREF(self->dict);
    return self->dict;

becomes:

    return PyDictProxy_New(self->dict);

Or in Python:

    return self.dct

becomes:

    return types.MappingProxyType(self.dct)

This is cheaper than a copy, and avoids having to audit every use of
self->dict to ensure the semantics required for a "frozendict" are
respected, i.e. that no mutation occurs after the dict becomes visible
to the user and potentially has __hash__ called.
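A pure-Python sketch of the pattern (the `Config` class and `_opts` attribute are illustrative names, not from the original):

```python
import types

class Config:
    def __init__(self):
        self._opts = {'debug': False}

    @property
    def options(self):
        # Hand out a read-only live view instead of a defensive copy.
        return types.MappingProxyType(self._opts)

c = Config()
view = c.options
c._opts['debug'] = True   # internal mutation stays visible through the view
```

External callers cannot write through the view: `view['debug'] = False` raises TypeError, while internal changes to `_opts` remain visible, which is exactly the "read-only proxy, not a frozen copy" semantics.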


> That by itself might not be enough reason.  But, if the user wants to
> use the data in modified form elsewhere, they would then have to
> construct a new regular dict out of it, making the decision to vary
> the data from what matches the state of the object it came from an
> explicit one.  That seems to fit the Python zen ("explicit is better
> than implicit").
> 
> So I'm changing my mind, and do consider this a valid use case, even
> absent the crash.

Avoiding crashes seems a better use for a read-only proxy, rather than a
hashable immutable type.


David


Re: [Python-Dev] cStringIO vs io.BytesIO

2014-07-16 Thread dw+python-dev
On Thu, Jul 17, 2014 at 03:44:23AM +0600, Mikhail Korobov wrote:

> So making code 3.x compatible by ditching cStringIO can cause a serious
> performance/memory  regressions. One can change the code to build the data
> using BytesIO (without creating bytes objects in the first place), but that is
> not always possible or convenient.
> 
> I believe this problem affects tornado (https://github.com/tornadoweb/tornado/
> Do you know if there a workaround? Maybe there is some stdlib part that I'm
> missing, or a module on PyPI? It is not that hard to write an own wrapper that
> won't do copies (or to port [c]StringIO to 3.x), but I wonder if there is an
> existing solution or plans to fix it in Python itself - this BytesIO use case
> looks quite important.

Regarding a fix, the problem seems mostly to be that the StringI/StringO
specializations were removed, and the new implementation is basically
just a StringO.

At a small cost in memory, it is easy to add a Py_buffer source and a
flags variable to the bytesio struct, with the buffer initially set up
for reading. When a mutation method is called, it checks the
copy-on-write flag, duplicates the source object into private memory if
needed, then continues operating as it does now.

Attached is a (rough) patch implementing this, feel free to try it with
hg tip.

[23:03:44 k2!124 cpython] cat i.py
import io
buf = b'x' * (1048576 * 16)
def x():
    io.BytesIO(buf)

[23:03:51 k2!125 cpython] ./python -m timeit  -s 'import i' 'i.x()'
100 loops, best of 3: 2.9 msec per loop

[23:03:57 k2!126 cpython] ./python-cow -m timeit  -s 'import i' 'i.x()'
100 loops, best of 3: 0.364 usec per loop


David



diff --git a/Modules/_io/bytesio.c b/Modules/_io/bytesio.c
--- a/Modules/_io/bytesio.c
+++ b/Modules/_io/bytesio.c
@@ -2,6 +2,12 @@
 #include "structmember.h"   /* for offsetof() */
 #include "_iomodule.h"
 
+enum io_flags {
+    /* initvalue describes a borrowed buffer we cannot modify and must later
+     * release */
+    IO_SHARED = 1
+};
+
 typedef struct {
     PyObject_HEAD
     char *buf;
@@ -11,6 +17,10 @@
     PyObject *dict;
     PyObject *weakreflist;
     Py_ssize_t exports;
+    Py_buffer initvalue;
+    /* If IO_SHARED, indicates PyBuffer_Release(initvalue) required, and that
+     * we don't own buf. */
+    enum io_flags flags;
 } bytesio;
 
 typedef struct {
@@ -33,6 +43,47 @@
         return NULL; \
     }
 
+/* Unshare our buffer in preparation for writing, in the case that an
+ * initvalue object was provided and we are currently borrowing its buffer.
+ * size hints the total buffer size to reserve while unsharing, to avoid a
+ * potentially redundant allocation in the subsequent mutation.
+ */
+static int
+unshare(bytesio *self, size_t size)
+{
+    Py_ssize_t new_size = (Py_ssize_t)size;
+    Py_ssize_t copy_size;
+    char *new_buf;
+
+    /* Do nothing if the buffer wasn't shared. */
+    if (!(self->flags & IO_SHARED)) {
+        return 0;
+    }
+
+    /* Preserve all existing data, and reserve at least the hinted size;
+     * only copy as much of the source buffer as is actually in use. */
+    copy_size = self->string_size;
+    if (new_size < copy_size) {
+        new_size = copy_size;
+    }
+
+    /* Allocate or fail. */
+    new_buf = (char *)PyMem_Malloc(new_size);
+    if (new_buf == NULL) {
+        PyErr_NoMemory();
+        return -1;
+    }
+
+    /* Copy the source data to the new private buffer, and release the
+     * reference that kept the source buffer alive. */
+    memcpy(new_buf, self->buf, copy_size);
+    PyBuffer_Release(&self->initvalue);
+    self->flags &= ~IO_SHARED;
+    self->buf = new_buf;
+    self->buf_size = new_size;
+    self->string_size = copy_size;
+    return 0;
+}
 
 /* Internal routine to get a line from the buffer of a BytesIO
    object. Returns the length between the current position to the
@@ -125,11 +176,18 @@
 static Py_ssize_t
 write_bytes(bytesio *self, const char *bytes, Py_ssize_t len)
 {
+    size_t desired;
+
     assert(self->buf != NULL);
     assert(self->pos >= 0);
     assert(len >= 0);
 
-    if ((size_t)self->pos + len > self->buf_size) {
+    desired = (size_t)self->pos + len;
+    if (unshare(self, desired)) {
+        return -1;
+    }
+
+    if (desired > self->buf_size) {
         if (resize_buffer(self, (size_t)self->pos + len) < 0)
             return -1;
     }
@@ -502,6 +560,10 @@
         return NULL;
     }
 
+    if (unshare(self, size)) {
+        return NULL;
+    }
+
     if (size < self->string_size) {
         self->string_size = size;
         if (resize_buffer(self, size) < 0)
@@ -655,10 +717,13 @@
 static PyObject *
 bytesio_close(bytesio *self)
 {
-    if (self->buf != NULL) {
+    if (self->flags & IO_SHARED) {
+        PyBuffer_Release(&self->initvalue);
+        self->flags &= ~IO_SHARED;
+    } else if (self->buf != NULL) {
         PyMem_Free(self->buf);
-        self->buf = NULL;
     }
+    self->buf = NULL;
     Py_RETURN_NONE;
 }
 
@@ -788,10 +853,17 

Re: [Python-Dev] cStringIO vs io.BytesIO

2014-07-16 Thread dw+python-dev
It's worth noting that a natural extension of this is to do something
very similar on the write side: instead of generating a temporary
private heap allocation, generate (and freely resize) a private PyBytes
object until it is exposed to the user, at which point _getvalue()
returns it and converts it into an IO_SHARED buffer.

That way another copy is avoided in the common case of building a
string, calling getvalue() once, then discarding the IO object.
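The idea can be illustrated with a toy pure-Python analogue (illustrative only: the real change is in C, and the real getvalue() would avoid even the single copy this sketch makes when freezing the buffer):

```python
class COWBytesIO:
    """Toy sketch of the write-side idea: build into a private bytearray,
    freeze it to bytes on getvalue(), and only copy again if a later
    write needs to unshare the exposed object."""

    def __init__(self):
        self._buf = bytearray()
        self._shared = None   # bytes object exposed to the user, if any

    def write(self, data):
        if self._shared is not None:
            # Unshare: return to a private, growable buffer.
            self._buf = bytearray(self._shared)
            self._shared = None
        self._buf += data

    def getvalue(self):
        if self._shared is None:
            self._shared = bytes(self._buf)
        return self._shared   # repeated calls return the same object
```

In the common build-then-getvalue-once-then-discard pattern, the unshare path is never taken, and previously returned values stay immutable even if writing resumes.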


David

Re: [Python-Dev] cpython: Issue #22003: When initialized from a bytes object, io.BytesIO() now

2014-07-30 Thread dw+python-dev
Hi Serhiy,

At least conceptually, issue 15381 seems the better approach, but
getting a correct implementation may take more iterations than the
(IMHO) simpler change in issue 22003. For my tastes, the current 15381
implementation seems a little too magical in relying on Py_REFCNT() as
the sole indication that a PyBytes can be mutated.

For the sake of haste, 22003 only addresses the specific regression
introduced in Python 3.x BytesIO, compared to 2.x StringI, where 3.x
lacked an equivalent no-copies specialization.


David


Re: [Python-Dev] python process creation overhead

2014-05-12 Thread dw+python-dev
On Mon, May 12, 2014 at 04:22:52PM -0700, Gregory Szorc wrote:

> Why can't Python start as quickly as Perl or Ruby?

On my heavily abused Core 2 Macbook with 9 .pth files, 2.7 drops from
81ms startup to 20ms by specifying -S, which disables site.py.

Oblitering the .pth files immediately knocks 10ms off regular startup. I
guess the question isn't why Python is slower than perl, but what
aspects of site.py could be cached, reimplemented, or stripped out
entirely.  I'd personally love to see .pth support removed.
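The comparison is easy to reproduce with a rough sketch like the following (absolute numbers are machine-dependent; -S skips site.py and with it all .pth processing):

```python
import subprocess
import sys
import time

def startup(extra_args):
    # Average wall-clock cost of spawning a fresh interpreter that does nothing.
    n = 10
    t0 = time.perf_counter()
    for _ in range(n):
        subprocess.check_call([sys.executable, *extra_args, '-c', 'pass'])
    return (time.perf_counter() - t0) / n

with_site = startup([])
without_site = startup(['-S'])   # -S disables site.py
print('with site: %.1f ms, with -S: %.1f ms'
      % (with_site * 1e3, without_site * 1e3))
```

On an installation with many .pth files, the gap between the two figures is mostly site.py's doing.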


David


Re: [Python-Dev] Internal representation of strings and Micropython

2014-06-04 Thread dw+python-dev
On Wed, Jun 04, 2014 at 03:17:00PM +1000, Nick Coghlan wrote:

> There's a general expectation that indexing will be O(1) because all
> the builtin containers that support that syntax use it for O(1) lookup
> operations.

Depending on your definition of "builtin", there is at least one
standard library container that does not: collections.deque.

Given the specialized kinds of application this Python implementation is
targeted at, it seems UTF-8 is ideal, considering the huge memory
savings resulting from the compressed representation, and the reduced
likelihood of there being any real need for serious text processing on
the device.

It is also unlikely that software or libraries like Django or Werkzeug
would run on a microcontroller; more likely all the Python code would be
custom, in which case replacing string indexing with iteration, or
temporary conversion to a list, is easily done.
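For example, an index-based loop converts mechanically (a sketch; the names here are illustrative):

```python
s = 'héllo wörld'

# Index-based access: O(n) per lookup under a UTF-8 representation.
upper_indexed = ''.join(s[i].upper() for i in range(len(s)))

# Equivalent single-pass iteration: no indexing required at all.
upper_iterated = ''.join(ch.upper() for ch in s)

# When random access genuinely is needed, one O(n) conversion makes
# every subsequent chars[i] lookup O(1).
chars = list(s)
```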

In this context, then, while a fixed-width encoding may be the
technically "correct" choice, it would also likely be the wrong one.


David


Re: [Python-Dev] Moving Python 3.5 on Windows to a new compiler

2014-06-06 Thread dw+python-dev
On Fri, Jun 06, 2014 at 03:41:22PM +, Steve Dower wrote:

> [snip]

Speaking as a third party who aims to provide binary distributions for
recent Python releases on Windows: every new compiler introduces a
licensing and configuration headache. So I guess the questions are:

* Does the ABI stability address some historical real world problem with
  Python binary builds? (I guess possibly)

* Is the existing solution of third parties building under e.g. MinGW as
  an option of last resort causing real-world issues? It seems to work
  for a lot of people, although I personally avoid it.

* Have other compiler vendors indicated they will change their ABI
  environment to match VS under this new stability guarantee? If not,
  then as yet there is no real world benefit here.

* Has Python ever hit a showstopper release issue as a result of a bug
  in MSVC? (I guess probably not).

* Will VS 14 be golden prior to Python 3.5's release? It would suck to
  rely on a beta compiler.. :)


Sorry for pouring cold water on this, but I've recently spent a ton of
time getting a Microsoft build environment running, and it seems
possible a new compiler may not yet justify the effort if there is
little tangible benefit.


David


Re: [Python-Dev] Moving Python 3.5 on Windows to a new compiler

2014-06-06 Thread dw+python-dev
On Fri, Jun 06, 2014 at 10:49:24PM +0400, Brian Curtin wrote:

> None of the options are particularly good, but yes, I think that's an
> option we have to consider. We're supporting 2.7.x for 6 more years on
> a compiler that is already 6 years old.

Surely that is infinitely less desirable than simply bumping the minor
version?


David


Re: [Python-Dev] Moving Python 3.5 on Windows to a new compiler

2014-06-06 Thread dw+python-dev
On Sat, Jun 07, 2014 at 05:33:45AM +1000, Chris Angelico wrote:

> > Is it really any difference in maintenance if you just stop applying
> > updates to 2.7 and switch to 2.8? If 2.8 is really just 2.7 with a
> > new compiler then there should be no functional difference between
> > doing that and doing a 2.7.whatever except all of the tooling that
> > relies on the compiler not to change in micro releases won’t
> > suddenly break and freak out.

> If the only difference between 2.7 and 2.8 is the compiler used on
> Windows, what happens on Linux and other platforms? A Python 2.8 would
> have to be materially different from Python 2.7, not just binarily
> incompatible on one platform.

Grrmph, that's fair. Perhaps a final alternative is simply continuing
the 2.7 series with a stale compiler, as a kind of carrot on a stick to
encourage users to upgrade? Gating 2.7's lifetime on the natural decline
of its supported compiler and related ecosystem seems quite a gradual
and natural demise.. :)


David


Re: [Python-Dev] namedtuple implementation grumble

2014-06-08 Thread dw+python-dev
On Sun, Jun 08, 2014 at 07:37:46PM +, dw+python-...@hmmz.org wrote:

> cls = tuple(name, (_NamedTuple,), {

Ugh, this should of course have been type().


David


Re: [Python-Dev] namedtuple implementation grumble

2014-06-08 Thread dw+python-dev
On Sun, Jun 08, 2014 at 03:13:55PM -0400, Eric V. Smith wrote:

> > The current implementation is also *really* easy to understand,
> > while writing out the dynamic type creation explicitly would likely
> > require much deeper knowledge of the type machinery to follow.

> As proof that it's harder to understand, here's an example of that
> dynamically creating functions and types:

Probably I'm missing something, but there's a much simpler non-exec
approach, something like:

class _NamedTuple(...):
    ...

def namedtuple(name, fields):
    cls = tuple(name, (_NamedTuple,), {
        '_fields': fields.split()
    })
    for i, field_name in enumerate(cls._fields):
        prop = property(functools.partial(_NamedTuple.__getitem__, i),
                        functools.partial(_NamedTuple.__setitem__, i))
        setattr(cls, field_name, prop)
    return cls

David

David


Re: [Python-Dev] namedtuple implementation grumble

2014-06-08 Thread dw+python-dev
On Sun, Jun 08, 2014 at 05:27:41PM -0400, Eric V. Smith wrote:

> How would you write _Namedtuple.__new__?

Knew something must be missing :)  Obviously it's possible, but not
nearly as efficiently as reusing the argument parsing machinery as in
the original implementation.

I guess especially the kwargs implementation below would suck..

_undef = object()

class _NamedTuple(...):
    def __new__(cls, *a, **kw):
        if kw:
            a = list(a) + ([_undef] * (len(cls._fields) - len(a)))
            for k, v in kw.items():
                i = cls._name_id_map[k]
                if a[i] is not _undef:
                    raise TypeError(...)
                a[i] = v
            if _undef not in a:
                return tuple.__new__(cls, a)
            raise TypeError(...)
        else:
            if len(a) == len(cls._fields):
                return tuple.__new__(cls, a)
            raise TypeError(...)

def namedtuple(name, fields):
    fields = fields.split()
    cls = type(name, (_NamedTuple,), {
        '_fields': fields,
        '_name_id_map': {k: i for i, k in enumerate(fields)}
    })
    for i, field_name in enumerate(fields):
        getter = functools.partial(_NamedTuple.__getitem__, i)
        setattr(cls, field_name, property(getter))
    return cls
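Folded into one self-contained, runnable sketch. One adjustment here: the getters use operator.itemgetter(i), since property(fget) calls fget with the instance, and the partial-based wiring above would pass the arguments in the wrong order:

```python
import operator

_undef = object()

class _NamedTuple(tuple):
    __slots__ = ()

    def __new__(cls, *a, **kw):
        if kw:
            a = list(a) + [_undef] * (len(cls._fields) - len(a))
            for k, v in kw.items():
                i = cls._name_id_map[k]
                if a[i] is not _undef:
                    raise TypeError('duplicate value for field %r' % k)
                a[i] = v
        if len(a) != len(cls._fields) or _undef in a:
            raise TypeError('wrong number of arguments')
        return tuple.__new__(cls, a)

def namedtuple(name, fields):
    fields = fields.split()
    cls = type(name, (_NamedTuple,), {
        '__slots__': (),
        '_fields': fields,
        '_name_id_map': {k: i for i, k in enumerate(fields)},
    })
    for i, field_name in enumerate(fields):
        # property(fget) calls fget(instance), so itemgetter(i) yields self[i].
        setattr(cls, field_name, property(operator.itemgetter(i)))
    return cls

Point = namedtuple('Point', 'x y')
p = Point(1, y=2)
```

As noted, this hand-rolled argument parsing is clumsier (and slower) than reusing the interpreter's own machinery, which is what the exec-based implementation gets for free.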


David