Re: [Python-Dev] PEP XXX: Compact ordered dict
FYI, here is the calculated size of each dict by len(d).
https://docs.google.com/spreadsheets/d/1nN5y6IsiJGdNxD7L7KBXmhdUyXjuRAQR_WbrS8zf6mA/edit?usp=sharing

On Tue, Jun 21, 2016 at 12:17 PM, Oleg Broytman wrote:
> Hi!
>
> On Tue, Jun 21, 2016 at 11:14:39AM +0900, INADA Naoki wrote:
>> Here is my draft, but I haven't posted it yet since
>> my English is much worse than C.
>> https://www.dropbox.com/s/s85n9b2309k03cq/pep-compact-dict.txt?dl=0
>
> It's good enough for a start (if a PEP is needed at all). If you push
> it to Github I'm sure they will come with pull requests.
>
> Oleg.
> --
> Oleg Broytman http://phdru.name/ [email protected]
> Programmers don't die, they just GOSUB without RETURN.

--
INADA Naoki
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] Why are class dictionaries not accessible?
The documentation states:

"""Objects such as modules and instances have an updateable __dict__
attribute; however, other objects may have write restrictions on their
__dict__ attributes (for example, classes use a dictproxy to prevent
direct dictionary updates)."""

However, it's not clear from that *why* direct dictionary updates are
undesirable. This not only prevents you from getting a reference to the
real class dict (which is the apparent goal), but is also the
fundamental reason why you can't use a metaclass to put, say, an
OrderedDict in its place - because the type constructor has to copy the
dict that was used in class preparation into a new dict rather than
using the one that was actually returned by __prepare__.

[Also, the name of the type used for this is mappingproxy, not
dictproxy]
Re: [Python-Dev] When to use EOFError?
On Tue, Jun 21, 2016, at 16:48, Serhiy Storchaka wrote:
> There is a design question. If you read file in some format or with some
> protocol, and the data is ended unexpectedly, when to use general
> EOFError exception and when to use format/protocol specific exception?
>
> For example when load truncated pickle data, an unpickler can raise
> EOFError, UnpicklingError, ValueError or AttributeError. It is possible
> to avoid ValueError or AttributeError, but what exception should be
> raised instead, EOFError or UnpicklingError? Maybe convert all EOFError
> to UnpicklingError?

I think this is the most appropriate. If the calling code needs to know
the original reason it can find it in __cause__.

My instinct, though, (and I'm aware that others may not agree, but I
thought it was worth bringing up) is that loads should actually always
raise a ValueError, i.e. my mental model of loads is like:

    def loads(s):
        f = BytesIO(s)
        try:
            return load(f)
        except UnpicklingError as e:
            raise ValueError from e
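A runnable version of that mental model might look like the sketch below. The name loads_strict is hypothetical, and catching EOFError alongside UnpicklingError is an assumption added here so that empty input is covered too; the message above only mentions UnpicklingError.

```python
import pickle
from io import BytesIO


def loads_strict(data):
    """Hypothetical wrapper: report any unpickling failure as ValueError."""
    try:
        return pickle.load(BytesIO(data))
    except (pickle.UnpicklingError, EOFError) as exc:
        # "raise ... from exc" keeps the original error in __cause__,
        # so callers that care about the precise reason can still find it.
        raise ValueError("invalid or truncated pickle data") from exc


def cause_of_failure(data):
    """Return the type of the chained original exception, or None on success."""
    try:
        loads_strict(data)
    except ValueError as exc:
        return type(exc.__cause__)
    return None
```

With this shape, calling code only needs an except ValueError clause, and the format-specific detail stays available through exception chaining.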
Re: [Python-Dev] Why are class dictionaries not accessible?
On Wed, Jun 22, 2016 at 7:17 AM, Random832 wrote:
> The documentation states: """Objects such as modules and instances have
> an updateable __dict__ attribute; however, other objects may have write
> restrictions on their __dict__ attributes (for example, classes use a
> dictproxy to prevent direct dictionary updates)."""
>
> However, it's not clear from that *why* direct dictionary updates are
> undesirable. This not only prevents you from getting a reference to the
> real class dict (which is the apparent goal), but is also the
> fundamental reason why you can't use a metaclass to put, say, an
> OrderedDict in its place - because the type constructor has to copy the
> dict that was used in class preparation into a new dict rather than
> using the one that was actually returned by __prepare__.
>
> [Also, the name of the type used for this is mappingproxy, not
> dictproxy]

This is done in order to force all mutations of the class dict to go
through attribute assignments on the class. The latter takes care of
updating the class struct, e.g. if you were to add an `__add__` method
dynamically it would update tp_as_number->nb_add. If you could modify
the dict object directly it would be more difficult to arrange for this
side effect.

--
--Guido van Rossum (python.org/~guido)
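The behaviour described in this reply can be observed from Python. The following is a small illustration only; the class Vec and the helper try_direct_update are made-up names, and the slot update itself (tp_as_number->nb_add) is not visible from Python, only its effect on the + operator.

```python
import types


class Vec:
    def __init__(self, x):
        self.x = x


# The class namespace is exposed only through a read-only mappingproxy.
proxy_type_ok = isinstance(Vec.__dict__, types.MappingProxyType)


def try_direct_update():
    """Attempt to write into the class dict directly; report whether it worked."""
    try:
        Vec.__dict__["__add__"] = lambda a, b: Vec(a.x + b.x)
    except TypeError:
        return False
    return True


direct_ok = try_direct_update()

# Attribute assignment goes through the type's setattr machinery,
# so the + operator starts working immediately afterwards.
Vec.__add__ = lambda a, b: Vec(a.x + b.x)
total = (Vec(1) + Vec(2)).x
```

The direct write is rejected, while the attribute assignment both stores the function and makes Vec(1) + Vec(2) work.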
Re: [Python-Dev] When to use EOFError?
* Serhiy Storchaka wrote:
> There is a design question. If you read file in some format or with some
> protocol, and the data is ended unexpectedly, when to use general
> EOFError exception and when to use format/protocol specific exception?
>
> For example when load truncated pickle data, an unpickler can raise
> EOFError, UnpicklingError, ValueError or AttributeError. It is possible
> to avoid ValueError or AttributeError, but what exception should be
> raised instead, EOFError or UnpicklingError? Maybe convert all EOFError
> to UnpicklingError? Or all UnpicklingError caused by unexpectedly ended
> input to EOFError? Or raise EOFError if the input is ended after
> completed opcode, and UnpicklingError if it contains truncated opcode?

I often concatenate multiple pickles into one file. When reading them,
it works like this:

    try:
        while True:
            yield pickle.load(fp)
    except EOFError:
        pass

In this case the truncation is not really unexpected. Maybe it should
distinguish between truncated-in-the-middle and
truncated-because-empty. (Same goes for marshal)

Cheers,
--
Real programmers confuse Christmas and Halloween because
DEC 25 = OCT 31.
-- Unknown (found in ssl_engine_mutex.c)
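One way a reader of concatenated pickles could make that distinction today, without any change to pickle itself, is to check for a clean end of file before each load. This is only a sketch: iter_pickles is a hypothetical helper and it assumes a seekable binary file object.

```python
import io
import pickle


def iter_pickles(fp):
    """Yield every pickle stored back to back in a seekable binary file."""
    while True:
        pos = fp.tell()
        if not fp.read(1):
            # Clean end: no further pickle starts here ("truncated-because-empty").
            return
        fp.seek(pos)
        # An error raised here means the data stopped in the middle of a pickle,
        # so it is allowed to propagate to the caller.
        yield pickle.load(fp)


def read_all(data):
    """Convenience wrapper for in-memory data."""
    return list(iter_pickles(io.BytesIO(data)))
```

Because the end-of-file check happens outside pickle.load(), an exception coming out of the generator always signals truncation in the middle of a pickle.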
Re: [Python-Dev] PEP 487: Simpler customization of class creation
On Wed 2016-06-22 Eric Snow [mailto:[email protected]] wrote:
> The problem I have with this is that it still doesn't give any strong
> relationship with the class definition. Certainly in most cases it will
> amount to the same thing. However, there is no way to know if
> cls.__dict__ represents the class definition or not. You also lose
> knowing whether or not a class came from a definition (or acts as
> though it did). Finally, __definition_order__ makes the relationship
> with the definition order clear, whereas cls.__dict__ does not.
> Instead of being an obvious tool, with cls.__dict__ that relationship
> would be tucked away where only a few folks with deep knowledge of
> Python would be in a position to take advantage.

I see this as being grossly/loosely analogous to traversing __bases__ vs
calling mro(), so I feel the same rationale applies to adding
__definition_order__ as mro.

Eric
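For readers unfamiliar with the analogy, the difference between walking __bases__ by hand and asking for mro() can be seen on a small diamond hierarchy. The classes and the walk_bases helper below are made up for illustration and say nothing about how __definition_order__ itself would be implemented.

```python
class A:
    pass


class B(A):
    pass


class C(A):
    pass


class D(B, C):
    pass


def walk_bases(cls):
    """Naive depth-first traversal of __bases__, duplicates included."""
    seen = []
    for base in cls.__bases__:
        seen.append(base)
        seen.extend(walk_bases(base))
    return seen


by_hand = walk_bases(D)   # raw material; ordering rules are left to the caller
linearized = D.mro()      # the ready-made answer
```

The hand traversal visits A and object twice and in an order that does not match method resolution, whereas mro() states the intended order directly.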
[Python-Dev] Idea: more compact, interned string key only dict for namespace.
As I said in my last email, a compact ordered dict can't preserve the
insertion order of a key-sharing dict (PEP 412).
I'm thinking about deprecating the key-sharing dict for now.
Instead, my new idea is to introduce a more compact dict
specialized for namespaces.
If the BDFL (or a BDFL delegate) likes this idea, I'll take another
week to implement it.
Background
--
* Most keys of a namespace dict are strings.
* Calculating the hash of a string is cheap (one memory access, thanks to the cached hash).
* And most keys are already interned.
Design
--
Instead of the normal PyDictKeyEntry, use PyInternedKeyEntry like this:
    typedef struct {
        // no me_hash
        PyObject *me_key, *me_value;
    } PyInternedKeyEntry;
insertdict() interns the key if it's unicode; otherwise it converts the dict to a
normal compact ordered dict.
lookdict_interned() compares only pointers (it doesn't call unicode_eq())
when the key being searched for is interned.
Also, add a new internal API to create an interned-key-only dict:
    PyDictObject* _PyDict_NewForNamespace();
Memory usage
--
On the amd64 architecture:
key-sharing dict:
* 96 bytes for ~3 items
* 128 bytes for 4~5 items
compact dict:
* 224 bytes for ~5 items
  (232 bytes if key-sharing dicts are still supported)
interned key only dict:
* 184 bytes for ~5 items
Note
--
The interned-key-only dict is still larger than the key-sharing dict.
But it can serve more purposes. It can be used for interning strings,
for example. It can be used for the kwargs dict when all keys are already interned.
If we provide _PyDict_NewForNamespace to extension modules,
the json decoder can have an option to use it, too.
--
INADA Naoki
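The design in this message can be modelled in pure Python to make the lookup rule concrete. What follows is a toy sketch and not the proposed C implementation: InternedKeyDict and its methods are hypothetical names, non-string keys are simply rejected instead of converting to a normal dict, the lookup key is interned up front rather than falling back to a string comparison, and deletion is left out.

```python
import sys


class InternedKeyDict:
    """Toy model: ordered entries without a stored hash, keys compared by identity."""

    def __init__(self):
        self._indices = [None] * 8   # sparse table of positions into _entries
        self._entries = []           # dense (key, value) pairs, no me_hash field

    def _probe(self, key):
        # Linear probing; str caches its hash, so hash(key) is cheap to recompute.
        mask = len(self._indices) - 1
        i = hash(key) & mask
        while True:
            pos = self._indices[i]
            if pos is None or self._entries[pos][0] is key:   # pointer comparison
                return i
            i = (i + 1) & mask

    def _grow(self):
        self._indices = [None] * (2 * len(self._indices))
        for pos, (key, _value) in enumerate(self._entries):
            self._indices[self._probe(key)] = pos

    def __setitem__(self, key, value):
        if type(key) is not str:
            raise TypeError("toy model accepts str keys only")
        key = sys.intern(key)
        slot = self._probe(key)
        pos = self._indices[slot]
        if pos is not None:
            self._entries[pos] = (key, value)    # overwrite keeps the position
            return
        if 3 * (len(self._entries) + 1) > 2 * len(self._indices):
            self._grow()                         # keep the table at most 2/3 full
            slot = self._probe(key)
        self._indices[slot] = len(self._entries)
        self._entries.append((key, value))

    def __getitem__(self, key):
        if type(key) is not str:
            raise KeyError(key)
        pos = self._indices[self._probe(sys.intern(key))]
        if pos is None:
            raise KeyError(key)
        return self._entries[pos][1]

    def keys(self):
        return [key for key, _value in self._entries]
```

Since every stored key is interned, two equal keys are always the same object, which is what lets the probe loop use "is" instead of an equality test.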
Re: [Python-Dev] Idea: more compact, interned string key only dict for namespace.
Hi all,
I think we need some more data before going any further reimplementing
dicts.
What I would like to know is, across a set of Python programs (ideally a
representative set), what the proportion of dicts in memory at any one
time are:
a) instance dicts
b) other namespace dicts (classes and modules)
c) data dicts with all string keys
d) other data dicts
e) keyword argument dicts (I'm guessing this is vanishingly small)
I would expect that (a) far exceeds (b) and depending on the application
also considerably exceeds (c), but I would like some real data.
From that we can compute the (approximate) memory costs of the
competing designs.
As an aside, if anyone is really keen to save memory, then removing the
cycle GC header is the thing to do.
That uses 24 bytes per object and *half* of all live objects have it.
And don't forget that any Python object is really two objects, the
object and its dict, so that is 48 extra bytes every time you create a
new object.
Cheers,
Mark.
On 22/06/16 10:23, INADA Naoki wrote:
> [snip]
Re: [Python-Dev] Idea: more compact, interned string key only dict for namespace.
> Memory usage
> --
>
> on amd64 arch.
>
> key-sharing dict:
>
> * 96 bytes for ~3 items
> * 128 bytes for 4~5 items.

Note: there is also the shared keys object:

* 128 bytes for ~3 items
* 224 bytes for 4~5 items

So, letting S be the number of instances that share the keys, the cost
per instance is:

* 96 + (128 / S) bytes for ~3 items
* 128 + (224 / S) bytes for 4~5 items

> compact dict:
>
> * 224 bytes for ~5 items.
>
> (232 bytes when keep supporting key-shared dict)
>
> interned key only dict:
>
> * 184 bytes for ~5 items
>
>
> Note
> --
>
> Interned key only dict is still larger than key-shared dict.
>
> But it can be used for more purpose. It can be used for interning string
> for example. It can be used to kwargs dict when all keys are interned
> already.
>
> If we provide _PyDict_NewForNamespace to extension modules,
> json decoder can have option to use this, too.
>
>
> --
> INADA Naoki

--
INADA Naoki
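The amortized cost can be worked through with a few lines of Python. This is a sketch of the arithmetic only: it takes the sizes listed in this thread (96 or 128 bytes per instance dict, 128 or 224 bytes for the shared keys object, 224 bytes for the compact dict, 184 bytes for the interned-key-only dict), and the function names are made up.

```python
def per_instance_bytes(own_dict, shared_keys, sharers):
    """Per-instance cost when `sharers` instances share one keys object."""
    return own_dict + shared_keys / sharers


def break_even_sharers(own_dict, shared_keys, rival):
    """Smallest S at which the key-sharing dict is no larger than `rival` bytes."""
    if own_dict >= rival:
        raise ValueError("key-sharing dict can never reach the rival size")
    sharers = 1
    while per_instance_bytes(own_dict, shared_keys, sharers) > rival:
        sharers += 1
    return sharers


COMPACT = 224    # compact dict, ~5 items
INTERNED = 184   # interned-key-only dict, ~5 items

# For 4~5 items: 128 bytes per instance plus a 224-byte shared keys object.
vs_compact = break_even_sharers(128, 224, COMPACT)
vs_interned = break_even_sharers(128, 224, INTERNED)
```

Under these figures, a class with 4~5 attributes needs at least 3 live instances before key sharing beats the compact dict, and at least 4 before it beats the interned-key-only dict.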
Re: [Python-Dev] Idea: more compact, interned string key only dict for namespace.
Hi, Mark. Thank you for the reply.

On Thu, Jun 23, 2016 at 10:30 AM, Mark Shannon wrote:
> Hi all,
>
> I think we need some more data before going any further reimplementing
> dicts.
>
> What I would like to know is, across a set of Python programs (ideally a
> representative set), what the proportion of dicts in memory at any one time
> are:
>
> a) instance dicts
> b) other namespace dicts (classes and modules)
> c) data dicts with all string keys
> d) other data dicts
> e) keyword argument dicts (I'm guessing this is vanishingly small)
>
> I would expect that (a) far exceeds (b) and depending on the application
> also considerably exceeds (c), but I would like some real data.
> From that we can compute the (approximate) memory costs of the competing
> designs.

I think you're right. But I don't have a clear idea of how to do it.
Is there any existing effort to collect dict statistics?

> As an aside, if anyone is really keen to save memory, then removing the
> cycle GC header is the thing to do.
> That uses 24 bytes per object and *half* of all live objects have it.
> And don't forget that any Python object is really two objects, the object
> and its dict, so that is 48 extra bytes every time you create a new object.

It's a great idea. But I can't do it before Python 3.6.

My main concern is not saving memory, but having an ordered dict for
**kwargs without significant overhead.
If "ordered, except for key-sharing dicts" is acceptable, there is no problem.
The key-sharing compact dict is smaller than the current key-sharing dict of
Python 3.5 in most cases.

https://docs.google.com/spreadsheets/d/1nN5y6IsiJGdNxD7L7KBXmhdUyXjuRAQR_WbrS8zf6mA/edit#gid=0

Regards,
--
INADA Naoki
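As a starting point for the statistics asked about in this exchange, a rough count can be taken from inside a running program with the gc module. This is a sketch under several assumptions, not an existing tool: classify_live_dicts is a hypothetical helper, it only sees dicts the cycle collector currently tracks, it cannot tell keyword-argument dicts apart from other data dicts, and reading an instance's __dict__ may itself create the dict on newer CPython versions, which perturbs the measurement.

```python
import gc
import types
from collections import Counter


def classify_live_dicts():
    """Count GC-tracked dicts by rough category (a sketch, not a profiler)."""
    owners = {}      # id(dict) -> category, for dicts serving as a namespace
    keep_alive = []  # hold references so that recorded ids cannot be reused
    for obj in gc.get_objects():
        if isinstance(obj, type):
            # A class only exposes a mappingproxy; its real dict is a GC referent.
            for ref in gc.get_referents(obj):
                if type(ref) is dict:
                    owners[id(ref)] = "class/module namespace"
            continue
        try:
            namespace = vars(obj)
        except Exception:
            continue  # no __dict__, or an object that refuses to show it
        if type(namespace) is dict:
            keep_alive.append(namespace)
            is_module = isinstance(obj, types.ModuleType)
            owners[id(namespace)] = (
                "class/module namespace" if is_module else "instance"
            )

    counts = Counter()
    for obj in gc.get_objects():
        if type(obj) is not dict:
            continue
        category = owners.get(id(obj))
        if category is None:
            all_str = all(type(key) is str for key in list(obj))
            category = "data, all str keys" if all_str else "data, other"
        counts[category] += 1
    return counts
```

Calling classify_live_dicts() at a few points in a representative workload would give approximate proportions for categories (a) through (d); multiplying those by the per-dict sizes quoted in the thread gives a first estimate of each design's memory cost.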
