It's pretty wasteful to use a dynamic storage dictionary to hold the data
of a "struct-like data container".

You can currently add `__slots__` manually to a `@dataclass` class, but
then you can no longer use default values, and typing out every field name
a second time gets very tedious.
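A small illustration of the default-value problem, assuming a plain CPython 3.7+ interpreter (the `Point` and `SlottedPoint` classes are made up for this example). Giving a slotted field a class-level default makes class creation itself fail with a `ValueError`, before `@dataclass` even runs; dropping the defaults works, but every caller must then pass all values.

```python
from dataclasses import dataclass

conflict = None
try:
    @dataclass
    class Point:
        __slots__ = ("x", "y")
        x: int = 0  # class-level default collides with the "x" slot
        y: int = 0
except ValueError as exc:
    # Raised by type() while building the class, not by @dataclass.
    conflict = exc

print(conflict)

# Without defaults, manual slots work, but all arguments are required.
@dataclass
class SlottedPoint:
    __slots__ = ("x", "y")
    x: int
    y: int

p = SlottedPoint(1, 2)
print(p)
```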

I compared the RAM usage of the popular attrs library against dataclass,
benchmarked attribute access for both, and found the following: slots win
heavily on memory usage, regardless of whether you use dataclass or attrs.
A dataclass with manually written slots even uses 8 bytes less than
attrs-with-slots (a constant difference that does not change with the
number of fields). But dataclass loses on features: it has no default
values when slots are used, and the slots have to be written out manually,
which is tedious (see class "D").

Here are the per-instance sizes in bytes, as measured with pympler's `asizeof`:
```
attrs size 512
attrs-with-slots size 200
dataclass size 512
dataclass-with-slots size 192
```
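The same effect can be seen with the standard library alone. This is a rough sketch, not a reproduction of the table: `sys.getsizeof` is shallow (it does not count the field values the way pympler's `asizeof` does), so the absolute numbers differ and also vary between Python versions. `shallow_size` is a hypothetical helper that adds the instance's `__dict__`, when there is one, to the instance itself.

```python
import sys
from dataclasses import dataclass

@dataclass
class WithDict:
    a: int = 0
    b: int = 4
    c: int = 2
    d: int = 8

@dataclass
class WithSlots:
    __slots__ = ("a", "b", "c", "d")
    a: int
    b: int
    c: int
    d: int

def shallow_size(obj):
    """Size of obj plus its __dict__ (if any); field values are not counted."""
    size = sys.getsizeof(obj)
    if hasattr(obj, "__dict__"):
        size += sys.getsizeof(obj.__dict__)
    return size

dict_inst = WithDict()
slots_inst = WithSlots(0, 4, 2, 8)
print("dataclass", shallow_size(dict_inst))
print("dataclass-with-slots", shallow_size(slots_inst))
```

The slotted instance has no per-instance dictionary at all; its fields live in fixed pointer-sized slots inside the object.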

As for the data access benchmarks: the results varied too much between runs
to draw any firm conclusions, except that slots were slightly faster than
dictionary-based storage, and that there is no real difference in access
speed between dataclass and attrs.
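Since the raw `perf_counter` loops were noisy, one option is the standard `timeit` module, reporting the fastest of several repeats. This is a sketch with stdlib-only classes (attrs is left out so it runs without extra packages); `best_read_time` is a hypothetical helper, and the iteration counts are arbitrary choices.

```python
import timeit
from dataclasses import dataclass

@dataclass
class WithDict:
    a: int = 0

@dataclass
class WithSlots:
    __slots__ = ("a",)
    a: int

def best_read_time(obj, number=1_000_000, repeat=5):
    """Fastest of `repeat` runs, each timing `number` reads of obj.a (seconds)."""
    timer = timeit.Timer("obj.a", globals={"obj": obj})
    return min(timer.repeat(repeat=repeat, number=number))

print("dataclass:", best_read_time(WithDict()))
print("dataclass-with-slots:", best_read_time(WithSlots(0)))
```

Taking the minimum rather than a single run filters out some of the interference from other processes, which is usually what makes one-shot timings jump around.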

Here is the full benchmark code:

```
import attr
from dataclasses import dataclass
from pympler import asizeof
import time

# every additional field adds 88 bytes
@attr.s
class A:
    a = attr.ib(type=int, default=0)
    b = attr.ib(type=int, default=4)
    c = attr.ib(type=int, default=2)
    d = attr.ib(type=int, default=8)

# every additional field adds 40 bytes
@attr.s(slots=True)
class B:
    a = attr.ib(type=int, default=0)
    b = attr.ib(type=int, default=4)
    c = attr.ib(type=int, default=2)
    d = attr.ib(type=int, default=8)

# every additional field adds 88 bytes
@dataclass
class C:
    a: int = 0
    b: int = 4
    c: int = 2
    d: int = 8

# every additional field adds 40 bytes
# (no default values here: a class-level default would conflict
# with the slot of the same name)
@dataclass
class D:
    __slots__ = ("a", "b", "c", "d")
    a: int
    b: int
    c: int
    d: int

Ainst = A()
Binst = B()
Cinst = C()
Dinst = D(0,4,2,8)

print("attrs size", asizeof.asizeof(Ainst)) # 512 bytes

print("attrs-with-slots size", asizeof.asizeof(Binst)) # 200 bytes

print("dataclass size", asizeof.asizeof(Cinst)) # 512 bytes

print("dataclass-with-slots size", asizeof.asizeof(Dinst)) # 192 bytes

s = time.perf_counter()
for i in range(250_000_000):
    x = Ainst.a
elapsed = time.perf_counter() - s
print("elapsed attrs:", (elapsed*1000), "milliseconds")

s = time.perf_counter()
for i in range(250_000_000):
    x = Binst.a
elapsed = time.perf_counter() - s
print("elapsed attrs-with-slots:", (elapsed*1000), "milliseconds")

s = time.perf_counter()
for i in range(250_000_000):
    x = Cinst.a
elapsed = time.perf_counter() - s
print("elapsed dataclass:", (elapsed*1000), "milliseconds")

s = time.perf_counter()
for i in range(250_000_000):
    x = Dinst.a
elapsed = time.perf_counter() - s
print("elapsed dataclass-with-slots:", (elapsed*1000), "milliseconds")
```

Also note that it IS possible to declare attrs classes with PEP 526
annotations (i.e. `a: int = 0` with `@attr.s(auto_attribs=True)`, instead
of `a = attr.ib(type=int, default=0)`), but that plain form loses a bunch
of attrs' extra features, which are specified as keyword arguments to
`attr.ib()` (such as validators, kw_only, etc.).

Anyway, the gist of it all is: slots heavily beat dictionaries, cutting
per-instance RAM usage to less than half of the current dataclass
implementation (192 vs. 512 bytes in the measurements above).

My proposal: implement `@dataclass(slots=True)`, which would do the same
thing attrs does: replace the class with a modified class that has
`__slots__` instead of a `__dict__`, with full support for default values.
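To make the idea concrete, here is a minimal sketch of how a class-replacing decorator could work. `add_slots` is a hypothetical helper, not part of `dataclasses`, and it does not handle `frozen=True`, base classes that already have a `__dict__`, or weak references. The key point is that the generated `__init__` has already captured the default values, so the class-level defaults can be dropped before building a new class with `__slots__`.

```python
import dataclasses

def add_slots(cls):
    """Hypothetical helper: rebuild a finished dataclass so it uses __slots__."""
    if "__slots__" in cls.__dict__:
        raise TypeError(f"{cls.__name__} already defines __slots__")
    cls_dict = dict(cls.__dict__)
    field_names = tuple(f.name for f in dataclasses.fields(cls))
    cls_dict["__slots__"] = field_names
    for name in field_names:
        # Drop the class-level default; it would conflict with the slot.
        # The generated __init__ already holds the default values.
        cls_dict.pop(name, None)
    # These descriptors belong to the old, dict-based class.
    cls_dict.pop("__dict__", None)
    cls_dict.pop("__weakref__", None)
    new_cls = type(cls)(cls.__name__, cls.__bases__, cls_dict)
    new_cls.__qualname__ = cls.__qualname__
    return new_cls

@add_slots
@dataclasses.dataclass
class D:
    a: int = 0
    b: int = 4
    c: int = 2
    d: int = 8

inst = D()
print(inst, hasattr(inst, "__dict__"))
```

The decorator order matters in this sketch: `@dataclasses.dataclass` must run first so that `__init__`, `__repr__` and `__eq__` exist before the class is rebuilt.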
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/U6IFJNMPNJMOICMI3OSVRCRSZDMZ3V4M/