Re: [Python-Dev] PEP XXX: Compact ordered dict

2016-06-22 Thread INADA Naoki
FYI, here is the calculated size of each dict by len(d):
https://docs.google.com/spreadsheets/d/1nN5y6IsiJGdNxD7L7KBXmhdUyXjuRAQR_WbrS8zf6mA/edit?usp=sharing

On Tue, Jun 21, 2016 at 12:17 PM, Oleg Broytman  wrote:
> Hi!
>
> On Tue, Jun 21, 2016 at 11:14:39AM +0900, INADA Naoki 
>  wrote:
>> Here is my draft, but I haven't
>> posted it yet since
>> my English is much worse than C.
>> https://www.dropbox.com/s/s85n9b2309k03cq/pep-compact-dict.txt?dl=0
>
> It's good enough for a start (if a PEP is needed at all). If you push
> it to GitHub, I'm sure people will send pull requests.
>
> Oleg.
> --
>  Oleg Broytman   http://phdru.name/   [email protected]
>Programmers don't die, they just GOSUB without RETURN.
> ___
> Python-Dev mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/songofacandy%40gmail.com



-- 
INADA Naoki  


[Python-Dev] Why are class dictionaries not accessible?

2016-06-22 Thread Random832
The documentation states: """Objects such as modules and instances have
an updateable __dict__ attribute; however, other objects may have write
restrictions on their __dict__ attributes (for example, classes use a
dictproxy to prevent direct dictionary updates)."""

However, it's not clear from that *why* direct dictionary updates are
undesirable. This not only prevents you from getting a reference to the
real class dict (which is the apparent goal), but is also the
fundamental reason why you can't use a metaclass to put, say, an
OrderedDict in its place - because the type constructor has to copy the
dict that was used in class preparation into a new dict rather than
using the one that was actually returned by __prepare__.

[Also, the name of the type used for this is mappingproxy, not
dictproxy]
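A minimal sketch of both observations, assuming CPython 3 (the metaclass `Meta`, the class `C` and the `captured` dict are hypothetical names): the class exposes a read-only `mappingproxy`, and the namespace returned by `__prepare__` is copied into a new dict instead of becoming the class dict.

```python
import types
from collections import OrderedDict

captured = {}  # hypothetical: remembers each namespace returned by __prepare__

class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The class body executes in this mapping.
        return OrderedDict()

    def __new__(mcls, name, bases, ns, **kwargs):
        captured[name] = ns
        return super().__new__(mcls, name, bases, ns, **kwargs)

class C(metaclass=Meta):
    x = 1

# The class exposes a read-only proxy, not the underlying dict.
print(isinstance(C.__dict__, types.MappingProxyType))  # True
print(type(captured["C"]).__name__)                     # OrderedDict

# type.__new__ copied the namespace, so later changes to it don't reach C.
captured["C"]["y"] = 2
print(hasattr(C, "y"))                                  # False
```

Because of that copy, whatever mapping type `__prepare__` returns only affects how the class body executes, not what the finished class stores.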


Re: [Python-Dev] When to use EOFError?

2016-06-22 Thread Random832
On Tue, Jun 21, 2016, at 16:48, Serhiy Storchaka wrote:
> There is a design question. If you read file in some format or with some 
> protocol, and the data is ended unexpectedly, when to use general 
> EOFError exception and when to use format/protocol specific exception?
> 
> For example when load truncated pickle data, an unpickler can raise 
> EOFError, UnpicklingError, ValueError or AttributeError. It is possible 
> to avoid ValueError or AttributeError, but what exception should be 
> raised instead, EOFError or UnpicklingError? Maybe convert all EOFError 
> to UnpicklingError?

I think this is the most appropriate. If the calling code needs to know
the original reason, it can find it in __cause__.
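A minimal sketch of that conversion, assuming a hypothetical `load_strict()` wrapper around `pickle.load()`; `raise ... from` keeps the original `EOFError` reachable through `__cause__`.

```python
import io
import pickle

def load_strict(fp):
    """Hypothetical wrapper: report an unexpected end of data as UnpicklingError."""
    try:
        return pickle.load(fp)
    except EOFError as exc:
        # `from exc` stores the original EOFError in __cause__.
        raise pickle.UnpicklingError("pickle data was truncated") from exc

caught = None
try:
    load_strict(io.BytesIO(b""))
except pickle.UnpicklingError as exc:
    caught = exc
print(type(caught.__cause__).__name__)  # EOFError
```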

My instinct, though (and I'm aware that others may not agree, but I
thought it was worth bringing up), is that loads should actually always
raise a ValueError, i.e. my mental model of loads is like:

from io import BytesIO
from pickle import UnpicklingError, load

def loads(s):
    f = BytesIO(s)
    try:
        return load(f)
    except UnpicklingError as e:
        raise ValueError from e


Re: [Python-Dev] Why are class dictionaries not accessible?

2016-06-22 Thread Guido van Rossum
On Wed, Jun 22, 2016 at 7:17 AM, Random832  wrote:

> The documentation states: """Objects such as modules and instances have
> an updateable __dict__ attribute; however, other objects may have write
> restrictions on their __dict__ attributes (for example, classes use a
> dictproxy to prevent direct dictionary updates)."""
>
> However, it's not clear from that *why* direct dictionary updates are
> undesirable. This not only prevents you from getting a reference to the
> real class dict (which is the apparent goal), but is also the
> fundamental reason why you can't use a metaclass to put, say, an
> OrderedDict in its place - because the type constructor has to copy the
> dict that was used in class preparation into a new dict rather than
> using the one that was actually returned by __prepare__.
>
> [Also, the name of the type used for this is mappingproxy, not
> dictproxy]
>

This is done in order to force all mutations of the class dict to go
through attribute assignments on the class. The latter takes care of
updating the class struct, e.g. if you were to add an `__add__` method
dynamically it would update tp_as_number->nb_add. If you could modify the
dict object directly it would be more difficult to arrange for this side
effect.
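A small illustration of that side effect, assuming CPython 3 (`Vec` is a hypothetical class): assigning `__add__` on the class makes `+` work immediately, while writing through the read-only `__dict__` proxy is refused.

```python
class Vec:
    def __init__(self, x):
        self.x = x

# type.__setattr__ stores the function and also refreshes the C-level slot.
Vec.__add__ = lambda self, other: Vec(self.x + other.x)
print((Vec(1) + Vec(2)).x)  # 3

# Bypassing attribute assignment is not possible: the proxy is read-only.
try:
    Vec.__dict__["__sub__"] = lambda self, other: Vec(self.x - other.x)
except TypeError:
    print("read-only")  # read-only
```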

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] When to use EOFError?

2016-06-22 Thread André Malo
* Serhiy Storchaka wrote:

> There is a design question. If you read file in some format or with some
> protocol, and the data is ended unexpectedly, when to use general
> EOFError exception and when to use format/protocol specific exception?
>
> For example when load truncated pickle data, an unpickler can raise
> EOFError, UnpicklingError, ValueError or AttributeError. It is possible
> to avoid ValueError or AttributeError, but what exception should be
> raised instead, EOFError or UnpicklingError? Maybe convert all EOFError
> to UnpicklingError? Or all UnpicklingError caused by unexpectedly ended
> input to EOFError? Or raise EOFError if the input is ended after
> completed opcode, and UnpicklingError if it contains truncated opcode?

I often concatenate multiple pickles into one file. When reading them, it 
works like this:

import pickle

def iter_pickles(fp):  # hypothetical name for the generator
    try:
        while True:
            yield pickle.load(fp)
    except EOFError:
        pass

In this case, the truncation is not really unexpected. Maybe the unpickler
should distinguish between truncated-in-the-middle and truncated-because-empty.

(Same goes for marshal)
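One way to make that distinction for seekable input, sketched under assumptions (the `iter_pickles_strict()` name is hypothetical, and the file must support `tell()` and `seek()`): probe for one byte before each pickle, so an empty remainder ends iteration cleanly, while running out of data inside a pickle becomes `UnpicklingError`.

```python
import io
import pickle

def iter_pickles_strict(fp):
    """Yield consecutive pickles from a seekable binary file (hypothetical helper).

    A clean end of input between pickles stops iteration; running out of
    data inside a pickle is reported as pickle.UnpicklingError.
    """
    while True:
        pos = fp.tell()
        if not fp.read(1):
            return  # truncated-because-empty: nothing follows the last pickle
        fp.seek(pos)  # put the probe byte back
        try:
            obj = pickle.load(fp)
        except EOFError as exc:
            # Truncated in the middle (the unpickler may also raise
            # UnpicklingError directly, e.g. inside an opcode argument).
            raise pickle.UnpicklingError("pickle data was truncated") from exc
        yield obj

stream = io.BytesIO(pickle.dumps(1) + pickle.dumps("a"))
print(list(iter_pickles_strict(stream)))  # [1, 'a']

error = None
damaged = io.BytesIO(pickle.dumps(1) + pickle.dumps([1, 2, 3], protocol=0)[:-1])
try:
    list(iter_pickles_strict(damaged))
except pickle.UnpicklingError as exc:
    error = exc
print(type(error).__name__)  # UnpicklingError
```

The probe only works for seekable files; a socket or pipe would need a buffered `peek()` instead.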

Cheers,
-- 
Real programmers confuse Christmas and Halloween because
DEC 25 = OCT 31.  -- Unknown

  (found in ssl_engine_mutex.c)


Re: [Python-Dev] PEP 487: Simpler customization of class creation

2016-06-22 Thread Eric Fahlgren
On Wed 2016-06-22 Eric Snow [mailto:[email protected]] wrote:
> The problem I have with this is that it still doesn't give any strong
> relationship with the class definition. Certainly in most cases it will
> amount to the same thing. However, there is no way to know if
> cls.__dict__ represents the class definition or not. You also lose
> knowing whether or not a class came from a definition (or acts as though
> it did). Finally, __definition_order__ makes the relationship with the
> definition order clear, whereas cls.__dict__ does not. Instead of being
> an obvious tool, with cls.__dict__ that relationship would be tucked
> away where only a few folks with deep knowledge of Python would be in a
> position to take advantage.

I see this as being grossly/loosely analogous to traversing __bases__ vs
calling mro(), so I feel the same rationale applies to adding
__definition_order__ as to mro().
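A sketch of the analogy (class names are hypothetical; it relies on dicts preserving insertion order, which is guaranteed since Python 3.7): `__bases__` lists only the direct parents while `mro()` gives the full linearization, and filtering `cls.__dict__` recovers the definition order only until an attribute is added later, which illustrates the concern quoted above.

```python
class A:
    pass

class B(A):
    pass

class C(A):
    pass

class D(B, C):
    first = 1
    second = 2

print([k.__name__ for k in D.__bases__])  # ['B', 'C']
print([k.__name__ for k in D.mro()])      # ['D', 'B', 'C', 'A', 'object']

def defined_names(cls):
    # Non-dunder names in the class namespace, in insertion order.
    return [k for k in cls.__dict__ if not (k.startswith("__") and k.endswith("__"))]

print(defined_names(D))  # ['first', 'second']
D.third = 3              # added after the class statement ran
print(defined_names(D))  # ['first', 'second', 'third']
```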

Eric



[Python-Dev] Idea: more compact, interned string key only dict for namespace.

2016-06-22 Thread INADA Naoki
As mentioned in my last email, the compact ordered dict can't preserve
the insertion order of key-sharing dicts (PEP 412).

I'm considering deprecating the key-sharing dict for now.

Instead, my new idea is to introduce a more compact dict
specialized for namespaces.

If the BDFL (or a BDFL delegate) likes this idea, I'll take another
week to implement it.


Background
----------

* Most keys of namespace dicts are strings.
* Getting the hash of a string is cheap (one memory access, thanks to the hash cached in the str object).
* Most keys are already interned.
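A quick check of the last point, assuming CPython (the `Point` class is hypothetical): an attribute name assigned in a method body ends up in the instance dict as the interned string, which is why a pointer comparison can usually replace a full string comparison.

```python
import sys

class Point:
    def __init__(self):
        self.alpha = 1  # "alpha" comes from the code object's names, which are interned

key = next(iter(vars(Point())))
built = "".join(["al", "pha"])      # equal value, but a fresh str object
print(key == built, key is built)   # True False
print(key is sys.intern(built))     # True: the dict key is the interned copy
```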


Design
------

Instead of the normal PyDictKeyEntry, use a PyInternedKeyEntry like this:

typedef struct {
    // no me_hash: str objects cache their own hash
    PyObject *me_key, *me_value;
} PyInternedKeyEntry;


insertdict() interns the key if it is a str; otherwise, it converts the dict
to a normal compact ordered dict.

lookdict_interned() compares only pointers (it doesn't call unicode_eq())
when the key being looked up is interned.

And add a new internal API to create an interned-key-only dict:

PyDictObject* _PyDict_NewForNamespace();
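To make the design concrete, here is a toy Python model; it is hypothetical and not CPython's C implementation. Entries are kept in insertion order without a hash field, like `PyInternedKeyEntry`; lookups compare keys by identity while all keys are interned, and inserting a non-str key switches to general equality, standing in for the conversion to a normal compact dict. Two simplifications are assumptions: Python code cannot ask whether a str is interned, so the model interns the probe key with `sys.intern()`, and it uses linear probing instead of CPython's perturbed probe sequence.

```python
import sys

class InternedKeyDict:
    """Toy model of an interned-key-only compact dict (hypothetical, not CPython code)."""

    def __init__(self, size=8):
        self._indices = [None] * size  # hash table: slot -> index into _entries
        self._entries = []             # (key, value) in insertion order; no stored hash
        self.interned_only = True      # becomes False once a non-str key is inserted

    def _probe(self, key, by_identity):
        # Linear probing for brevity; CPython uses a perturbed probe sequence.
        mask = len(self._indices) - 1
        i = hash(key) & mask           # CPython caches a str's hash in the object
        while True:
            ix = self._indices[i]
            if ix is None:
                return i, None
            k = self._entries[ix][0]
            # Pointer comparison only, unless we must fall back to ==.
            if k is key or (not by_identity and k == key):
                return i, ix
            i = (i + 1) & mask

    def __setitem__(self, key, value):
        if type(key) is str:
            key = sys.intern(key)
        else:
            self.interned_only = False  # models "convert to a normal compact dict"
        slot, ix = self._probe(key, self.interned_only)
        if ix is not None:
            self._entries[ix] = (key, value)
            return
        self._indices[slot] = len(self._entries)
        self._entries.append((key, value))
        if 3 * len(self._entries) >= 2 * len(self._indices):
            self._resize()

    def __getitem__(self, key):
        by_identity = self.interned_only and type(key) is str
        if by_identity:
            key = sys.intern(key)       # assumption: stands in for "probe key is interned"
        _, ix = self._probe(key, by_identity)
        if ix is None:
            raise KeyError(key)
        return self._entries[ix][1]

    def _resize(self):
        old_entries = self._entries
        self._indices = [None] * (2 * len(self._indices))
        self._entries = []
        for k, v in old_entries:        # keys are unique, so identity is enough here
            slot, _ = self._probe(k, True)
            self._indices[slot] = len(self._entries)
            self._entries.append((k, v))

    def keys(self):
        return [k for k, _ in self._entries]

d = InternedKeyDict()
for i in range(20):
    d["k" + str(i)] = i                  # fresh, non-interned str objects
print(d["k7"], d.interned_only)          # 7 True
d[1] = "one"                             # non-str key: switch to general comparison
print(d[1], d["k19"], d.interned_only)   # one 19 False
```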


Memory usage
------------

On amd64:

key-sharing dict:

* 96 bytes for ~3 items
* 128 bytes for 4~5 items.

compact dict:

* 224 bytes for ~5 items.

(232 bytes if key-sharing dicts remain supported)

interned-key-only dict:

* 184 bytes for ~5 items
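The 40-byte difference between the compact dict and the interned-key-only dict can be reproduced from the entry layouts, assuming (as in the compact dict design) that the entries array holds only the usable two-thirds of an 8-slot table, i.e. 5 entries. This is a back-of-the-envelope check, not a measurement.

```python
import ctypes

PTR = ctypes.sizeof(ctypes.c_void_p)    # pointer size: 8 on amd64
HASH = ctypes.sizeof(ctypes.c_ssize_t)  # Py_hash_t is Py_ssize_t-sized: 8 on amd64

dict_key_entry = HASH + 2 * PTR         # me_hash, me_key, me_value
interned_key_entry = 2 * PTR            # me_key, me_value
usable = (8 << 1) // 3                  # usable entries in an 8-slot table: 5

saving = usable * (dict_key_entry - interned_key_entry)
print(dict_key_entry, interned_key_entry, saving)  # 24 16 40 on amd64
print(224 - saving)                                # 184 on amd64
```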


Note
----

The interned-key-only dict is still larger than the key-sharing dict.

But it can serve more purposes. For example, it can be used for interning
strings, and for kwargs dicts when all keys are already interned.

If we expose _PyDict_NewForNamespace to extension modules, the json
decoder could have an option to use it, too.
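The json module can already approximate this from Python through `object_pairs_hook`, which receives each decoded object's `(key, value)` pairs. The `interned_object()` helper below is hypothetical and simply interns every key, so repeated keys across documents share one str object.

```python
import json
import sys

def interned_object(pairs):
    # object_pairs_hook receives the decoded (key, value) pairs of each JSON object.
    return {sys.intern(k): v for k, v in pairs}

a = json.loads('{"name": 1}', object_pairs_hook=interned_object)
b = json.loads('{"name": 2}', object_pairs_hook=interned_object)
print(next(iter(a)) is next(iter(b)))  # True: both keys are the same interned str
```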


-- 
INADA Naoki  


Re: [Python-Dev] Idea: more compact, interned string key only dict for namespace.

2016-06-22 Thread Mark Shannon

Hi all,

I think we need some more data before going any further with
reimplementing dicts.


What I would like to know is, across a set of Python programs (ideally a
representative set), what proportion of the dicts in memory at any one
time are:


a) instance dicts
b) other namespace dicts (classes and modules)
c) data dicts with all string keys
d) other data dicts
e) keyword argument dicts (I'm guessing this is vanishingly small)

I would expect that (a) far exceeds (b) and, depending on the application,
also considerably exceeds (c), but I would like some real data.
From that we can compute the (approximate) memory costs of the
competing designs.
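A rough starting point for gathering such data, assuming CPython; `dict_census()` is hypothetical and only approximates the categories above. It sees only dicts tracked by the cycle GC (dicts holding only atomic values such as ints and strs may be untracked), it cannot separate instance, class and data dicts that all have str keys, and on recent CPython versions (3.11+) many instance dicts are not materialized as separate objects at all.

```python
import gc
import sys
from collections import Counter

def dict_census():
    """Rough, hypothetical census of the dicts the cycle GC currently tracks."""
    module_ids = {id(m.__dict__) for m in list(sys.modules.values())
                  if hasattr(m, "__dict__")}
    counts = Counter()
    for obj in gc.get_objects():
        if type(obj) is not dict:
            continue
        if id(obj) in module_ids:
            counts["module namespace"] += 1
        elif all(type(k) is str for k in obj):
            counts["all-str keys"] += 1  # instance, class, or data dicts
        else:
            counts["other"] += 1
    return counts

census = dict_census()
print(census.most_common())
```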


As an aside, if anyone is really keen to save memory, then removing the 
cycle GC header is the thing to do.

That uses 24 bytes per object and *half* of all live objects have it.
And don't forget that any Python object is really two objects, the 
object and its dict, so that is 48 extra bytes every time you create a 
new object.
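The header overhead can be observed from Python: `sys.getsizeof()` adds the GC overhead for objects managed by the garbage collector, while `__sizeof__()` does not. The exact value depends on the CPython version and build; the `gc_overhead()` helper is hypothetical.

```python
import sys

def gc_overhead(obj):
    # sys.getsizeof() = __sizeof__() + GC header for GC-managed types.
    return sys.getsizeof(obj) - obj.__sizeof__()

print(gc_overhead(1))   # 0: ints are not tracked by the cycle GC
print(gc_overhead([]))  # GC header size, typically 16 on 64-bit CPython 3.8+
```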



Cheers,
Mark.

On 22/06/16 10:23, INADA Naoki wrote:



Re: [Python-Dev] Idea: more compact, interned string key only dict for namespace.

2016-06-22 Thread INADA Naoki
> Memory usage
> ------------
>
> on amd64 arch.
>
> key-sharing dict:
>
> * 96 bytes for ~3 items
> * 128 bytes for 4~5 items.

Note: there is also the shared keys object itself:

* 128 bytes for ~3 items
* 224 bytes for 4~5 items

So, letting S be the number of instances that share the keys, the
per-instance cost is:

* 96 + (128 / S) bytes for ~3 items
* 128 + (224 / S) bytes for 4~5 items
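A worked version of this estimate, reading the figures above as a per-instance part (96 bytes for ~3 items, 128 bytes for 4~5) plus a shared keys object (128 or 224 bytes) split across S instances; `key_sharing_cost()` is a hypothetical helper. With 4~5 items, the key-sharing dict matches the 184-byte interned-key-only dict at S = 4 and is smaller beyond that.

```python
from itertools import count

def key_sharing_cost(items, shared_by):
    """Bytes per instance on amd64, from the figures quoted in this thread (hypothetical helper).

    ~3 items:   96-byte per-instance part + 128-byte shared keys object
    4~5 items: 128-byte per-instance part + 224-byte shared keys object
    """
    if items <= 3:
        return 96 + 128 / shared_by
    return 128 + 224 / shared_by

INTERNED_ONLY_DICT = 184  # bytes for ~5 items, from the proposal

breakeven = next(s for s in count(1) if key_sharing_cost(5, s) <= INTERNED_ONLY_DICT)
print(breakeven)               # 4
print(key_sharing_cost(5, 1))  # 352.0
```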

>
> compact dict:
>
> * 224 bytes for ~5 items.
>
> (232 bytes when keep supporting key-shared dict)
>
> interned key only dict:
>
> * 184 bytes for ~5 items
>
>
> Note
> ----
>
> Interned key only dict is still larger than key-shared dict.
>
> But it can be used for more purpose.  It can be used for interning string
> for example.  It can be used to kwargs dict when all keys are interned 
> already.
>
> If we provide _PyDict_NewForNamespace to extension modules,
> json decoder can have option to use this, too.
>
>
> --
> INADA Naoki  



-- 
INADA Naoki  


Re: [Python-Dev] Idea: more compact, interned string key only dict for namespace.

2016-06-22 Thread INADA Naoki
Hi Mark, thank you for your reply.

On Thu, Jun 23, 2016 at 10:30 AM, Mark Shannon  wrote:
> Hi all,
>
> I think we need some more data before going any further reimplementing
> dicts.
>
> What I would like to know is, across a set of Python programs (ideally a
> representative set), what the proportion of dicts in memory at any one time
> are:
>
> a) instance dicts
> b) other namespace dicts (classes and modules)
> c) data dicts with all string keys
> d) other data dicts
> e) keyword argument dicts (I'm guessing this is vanishingly small)
>
> I would expect that (a) far exceeds (b) and depending on the application
> also considerably exceeds (c), but I would like some real data.
> From that we can compute the (approximate) memory costs of the competing
> designs.

I think you're right, but I don't have a clear idea of how to do it.
Is there any existing effort to collect statistics on dicts?

>
> As an aside, if anyone is really keen to save memory, then removing the
> cycle GC header is the thing to do.
> That uses 24 bytes per object and *half* of all live objects have it.
> And don't forget that any Python object is really two objects, the object
> and its dict, so that is 48 extra bytes every time you create a new object.
>

It's a great idea, but I can't do it before Python 3.6.

My main concern is not saving memory, but an ordered dict for **kwargs
without significant overhead.
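For reference, the behaviour being discussed is that a function receiving `**kwargs` sees the keywords in the order the caller wrote them; this became guaranteed in Python 3.6 (PEP 468). The `kwarg_names()` helper is hypothetical.

```python
def kwarg_names(**kwargs):
    # On Python 3.6+ (PEP 468), kwargs preserves the caller's keyword order.
    return list(kwargs)

print(kwarg_names(b=1, a=2, c=3))  # ['b', 'a', 'c']
```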

If "ordered, except for key-sharing dicts" is acceptable, there is no problem:
a key-sharing compact dict is smaller than the current key-sharing dict of
Python 3.5 in most cases.
https://docs.google.com/spreadsheets/d/1nN5y6IsiJGdNxD7L7KBXmhdUyXjuRAQR_WbrS8zf6mA/edit#gid=0

Regards,

--
INADA Naoki  