Antoine Pitrou added the comment:
So, there are several aspects to this request:
- add a __slots__: this is Serhiy's patch
- add some weak interning of IP addresses: I believe it is ok to add it as an
optimization, provided it doesn't otherwise change the API or the classes'
behaviour
Seth:
Serhiy Storchaka added the comment:
Here is a patch which makes IPv4Address and IPv6Address more lightweight (for
other classes the size is not so important). It makes _version the class
attribute and adds __slots__.
--
keywords: +patch
resolution: wont fix -
stage: - patch review
Eric Snow added the comment:
@sbromberger, there's no need for your own package. Just use something like
this:
_addr_cache = {}
def ipaddr(addr):
try:
return _addr_cache[addr]
except KeyError:
_addr_cache[addr] = ipaddress.ipaddress(addr)
return
Josh Rosenberg added the comment:
Serhiy: I believe you need to add a bunch of __slots__ = () to various base
classes in the module, even though they lack member variables. Every link in
the inheritance chain must declare __slots__, or the child class will have
__dict__ and __weakref__.
New submission from Seth Bromberger:
ipaddress.ip_address instances should be flyweight. There's really no reason to
make them mutable:
a = ipaddress.ip_address(10.1.2.3)
b = ipaddress.ip_address(10.1.2.3)
id(a)
140066533772368
id(b)
140066504622544
Especially with IPv6 and large numbers
Seth Bromberger added the comment:
Confirmed on 3.4.0; likely exists in previous versions as well.
--
versions: +Python 3.4 -Python 3.6
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23103
Changes by Berker Peksag berker.pek...@gmail.com:
--
nosy: +ncoghlan, pmoody
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23103
___
___
Josh Rosenberg added the comment:
What is your proposal? WeakValueDictionary mapping raw bytes object to single
instance of ipaddress that is queried from ipaddress's __new__? No built-in has
quite that extensive a level of caching and aggressive deduplication to my
knowledge.
--
Seth Bromberger added the comment:
What is your proposal? WeakValueDictionary mapping raw bytes object to single
instance of ipaddress that is queried from ipaddress's __new__? No built-in
has quite that extensive a level of caching and aggressive deduplication to my
knowledge.
I was
Serhiy Storchaka added the comment:
IP address instances already are immutable and flyweight. There are no mutating
methods. And a number of address attributes (packed, is_loopback, etc) are
calculated on fly from integer representation.
But IP address objects can be lighter.
1) Use
Changes by R. David Murray rdmur...@bitdance.com:
--
nosy: +r.david.murray
___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue23103
___
___
Seth Bromberger added the comment:
1) However:
b = ipaddress.IPv4Address('1.2.3.4')
a = ipaddress.IPv4Address('1.2.3.4')
id(a)
4428380928
id(b)
4428381768
a == b
True
b._ip += 6
id(b)
4428381768
b
IPv4Address('1.2.3.10')
2) Isn’t _version already a class attribute? It’s set in
R. David Murray added the comment:
If ipaddress objects are already immutable, it seems to me that having a
registry that makes it so there is only one instance per ip value isn't a crazy
idea. I've written such code in other contexts and it's not complicated. We
have caching of identifier
Seth Bromberger added the comment:
The opposite argument is that it may be better left up to the application that
has to handle lots of ips to do the caching according to what it knows to be
an optimum pattern.
I'd agree with you wholeheartedly if ipaddress wasn't part of stdlib, but the
Raymond Hettinger added the comment:
Adding __slots__ is reasonable.
Adding caching to fold together repeated addresses is dubious (it comes with
its own space and time cost but won't benefit all users) and, as David said,
the decision on how to handle this is up to the user. Also, it is
Josh Rosenberg added the comment:
So, just to be clear, checking the implementation (as of 3.4):
1. All the ipaddress classes leave __slots__ unset. So the overhead is actually
higher than 56 bytes per instance; the __dict__ for an IPv4Address (ignoring
the actual keys and values in the dict,
Marc-Andre Lemburg added the comment:
Josh, you might want to have a look at the lightning talk on this page and the
associated slides...
http://www.egenix.com/library/presentations/PyCon-UK-2014-When-performance-matters/
After having done the tests, using __slots__ is what I consider the
Seth Bromberger added the comment:
As a test, I tried the following (taken mostly from
http://codesnipers.com/?q=python-flyweights):
class Foo(object):
_Foo = weakref.WeakValueDictionary()
def __new__(cls, addr):
obj = Foo._Foo.get(addr, None)
if obj is None:
Seth Bromberger added the comment:
Sorry for the additional followup, but I re-ran the test with approximately
real-world parameters. In this second test, I created 10 million Foo()
instances with random values from 1-2000 (using random.randint). This
corresponds to 2000 hosts generating 10
Josh Rosenberg added the comment:
Marc-Andre: Oh, I know. I proselytize to coworkers on the virtues of using
__slots__ for classes that will have many instances created (particularly since
entirely too much of our stuff is still Py2.7, so we don't have the free
savings from shared key
Seth Bromberger added the comment:
I'm just pointing out that if he thinks 56 bytes per object is too large, he's
going to be disappointed with what Python has to offer. General purpose
user-defined Python objects don't optimize for low memory usage, and even
__slots__ only gets you so far.
21 matches
Mail list logo