Hello All,
Back to the thread after a long time.
I have been working on this problem and my findings are below.
1. Dictionaries seem to be slow, have a large memory footprint and
   slow down read in multiple places:
   1. Instance creation at dictfetchX in the cursor implementation
   2. Pickling & compression at the socket
   3. On the network (as nicoe pointed out, the keys of the
      dictionary are redundant)
2. Namedtuple [1] is an alternative but has certain limitations:
   1. Being a subclass of tuple, it lacks commonly needed
      functionality (in the read method) such as append. The
      _replace method in fact creates a new instance of the
      namedtuple rather than modifying the existing one
   2. Pickling needs a custom pickler, as indicated by Bechamel on
      IRC today [2]
   3. Field names may not begin with an _ (Python 2.7 adds a
      rename option that replaces such names with positional ones)
3. Hence, a custom data structure with the following properties
   was needed:
   1. Speed comparable to or better than namedtuple
   2. A smaller footprint than a dictionary
   3. Easy to pickle and unpickle
   4. An easier API for accessing attributes than what
      namedtuple provides
Based on this I set out to develop the custom data structure [3] and I
request your review of it.
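
The namedtuple limitations listed above can be checked directly. This is only a small sketch written for a current Python interpreter; the Row type and its field names are hypothetical and simply reuse the a..e fields from the benchmark, and it does not touch the pickling point or the custom structure in [3].

```python
from collections import namedtuple

# Hypothetical record type reusing the benchmark's field names.
Row = namedtuple("Row", ["a", "b", "c", "d", "e"])
row = Row("apple", "ball", "cat", "dog", "english")

# Tuples are immutable, so list-style helpers such as append do not exist.
has_append = hasattr(row, "append")

# _replace leaves the original untouched and returns a new instance.
changed = row._replace(a="avocado")
same_object = changed is row

# Field names that start with an underscore are rejected by default.
try:
    namedtuple("Bad", ["_id", "name"])
    underscore_allowed = True
except ValueError:
    underscore_allowed = False

# With rename=True (available from Python 2.7) invalid names are replaced
# by positional names instead of raising.
Renamed = namedtuple("Renamed", ["_id", "name"], rename=True)
renamed_fields = Renamed._fields

if __name__ == "__main__":
    print("has append:", has_append)
    print("_replace returned the same object:", same_object)
    print("underscore field allowed:", underscore_allowed)
    print("fields after rename:", renamed_fields)
```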
The benchmarks are below (these tests [4] are also on code review, so you
can run them yourself):
Size of list: 300000 records, each representing a:apple, b:ball, c:cat,
d:dog, e:english
1. With List of dictionary (Current implementation):
real 0m3.422s
user 0m3.212s
sys 0m0.152s
Data size: 9899891
2. With List of namedtuple:
real 0m4.111s (Don't be surprised, namedtuple sucks in
performance when pickled)
user 0m3.858s
sys 0m0.121s
Data size: 5999915
3. With List of Unnamed Structure:
real 0m1.454s
user 0m1.296s
sys 0m0.108s
Data size: 5399950
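
For anyone who wants to try a comparison of this kind locally, here is a rough sketch of a pickle size/time measurement. It is not the test script from the code review: the helper names (as_dicts, as_columnar, measure) are made up for illustration, and the second layout is just "field names once plus plain tuples", standing in for any structure that avoids repeating the keys. Absolute numbers will differ by machine, Python version and pickle protocol.

```python
import pickle
import time

FIELDS = ("a", "b", "c", "d", "e")
VALUES = ("apple", "ball", "cat", "dog", "english")


def as_dicts(n):
    """One dictionary per record, so the keys are repeated n times."""
    return [dict(zip(FIELDS, VALUES)) for _ in range(n)]


def as_columnar(n):
    """Field names stored once, followed by one plain tuple per record."""
    # A fresh tuple per row, so pickle cannot collapse rows into one object.
    return {"fields": FIELDS, "rows": [tuple(v for v in VALUES) for _ in range(n)]}


def measure(payload):
    """Return (pickled size in bytes, seconds for dumps + loads, restored copy)."""
    start = time.perf_counter()
    data = pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(data)
    elapsed = time.perf_counter() - start
    return len(data), elapsed, restored


if __name__ == "__main__":
    n = 300000
    for label, build in (("dicts", as_dicts), ("columnar", as_columnar)):
        size, elapsed, _ = measure(build(n))
        print(label, "size:", size, "seconds:", round(elapsed, 3))
```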
To summarize:
The UnnamedDatastructure is 58% faster than the list of dictionaries and
65% faster than the list of namedtuples.
The UnnamedDatastructure is 45% lighter than the list of dictionaries and
10% lighter than the list of namedtuples.
The API is also cleaner and more friendly as can be seen in the doc
tests [4] of the data structure [3].
Hence the best performance is achieved with this custom data structure
implementation. Requesting your review of code [5] once again.
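
The percentages in the summary can be recomputed from the "real" timings and data sizes quoted in the benchmark. The sketch below reads "faster" as the relative reduction in wall-clock time and "lighter" as the relative reduction in data size; percent_reduction is just a helper name chosen here.

```python
def percent_reduction(baseline, value):
    """Relative reduction of value against baseline, in percent."""
    return (baseline - value) / baseline * 100.0


# "real" timings (seconds) and data sizes as quoted in the benchmark.
timings = {"dict": 3.422, "namedtuple": 4.111, "custom": 1.454}
sizes = {"dict": 9899891, "namedtuple": 5999915, "custom": 5399950}

time_vs_dict = percent_reduction(timings["dict"], timings["custom"])
time_vs_namedtuple = percent_reduction(timings["namedtuple"], timings["custom"])
size_vs_dict = percent_reduction(sizes["dict"], sizes["custom"])
size_vs_namedtuple = percent_reduction(sizes["namedtuple"], sizes["custom"])

if __name__ == "__main__":
    print("time saved vs dict:       %.1f%%" % time_vs_dict)
    print("time saved vs namedtuple: %.1f%%" % time_vs_namedtuple)
    print("size saved vs dict:       %.1f%%" % size_vs_dict)
    print("size saved vs namedtuple: %.1f%%" % size_vs_namedtuple)
```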
[1] http://docs.python.org/release/2.6.6/library/collections.html#collections.namedtuple
[2] http://docs.activestate.com/activepython/3.1/python/library/pickle.html#persistence-of-external-objects
[3] http://codereview.appspot.com/2193049/patch/1/3
[4] http://docs.python.org/library/doctest.html
[5] http://codereview.appspot.com/2193049/
Thanks,
Sharoon Thomas
Openlabs Technologies & Consulting (P) LTD
http://openlabs.co.in
On 11/08/2010 17:42, Sharoon Thomas wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hello All,
I just tried altering the database cursor for postgres [1] alone, adding a new
method which returns the data as a list of namedtuples instead of a list of
dictionaries.
Tests were done on party.party with all fields being fetched for 100,000
records. The size was calculated separately using pympler [2].
1. List of Dictionary
===============
Time to read:
real 0m3.222s
user 0m2.373s
sys 0m0.295s
Memory footprint: 66804704
2. List of namedtuples
=================
Time to read:
real 0m2.237s
user 0m1.491s
sys 0m0.224s
Memory footprint: 8412336 (wow, just 13% of the list of dicts :)
IMHO it makes sense to implement named tuples for read and I am starting the
work on it. Hope to complete it soon.
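
A quick way to get a feel for the per-record difference without pympler is the standard library's sys.getsizeof. Note that this is only a sketch and not the measurement used above: getsizeof is shallow (it counts the container, not the values it refers to), and the Row type with its a..e fields is a stand-in, not the party.party model.

```python
import sys
from collections import namedtuple

FIELDS = ("a", "b", "c", "d", "e")
VALUES = ("apple", "ball", "cat", "dog", "english")

# Hypothetical record type; real rows would carry the model's own fields.
Row = namedtuple("Row", FIELDS)

as_dict = dict(zip(FIELDS, VALUES))
as_tuple = tuple(v for v in VALUES)
as_row = Row(*VALUES)

# Shallow sizes in bytes of the three containers holding the same values.
dict_size = sys.getsizeof(as_dict)
tuple_size = sys.getsizeof(as_tuple)
row_size = sys.getsizeof(as_row)

if __name__ == "__main__":
    print("dict:      ", dict_size)
    print("tuple:     ", tuple_size)
    print("namedtuple:", row_size)
```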
[1] trytond/backend/postgres/database.py
[2] http://code.google.com/p/pympler/
Thanks,
Sharoon Thomas
On 6 Aug 2010, at 18:04, Cédric Krier wrote:
On 06/08/10 17:44 +0100, Sharoon Thomas wrote:
Hi all,
I vote for the idea and I think it would be better to return a list of named
tuples (not sure if it can be used over xml rpc or other rpc).
Named tuples have the same memory footprint as normal tuples and give
class-attribute-like functionality.
I had already thought of that, but I did not check in which version it became available. It
seems it is 2.4 so it is ok.
For xml-rpc and json-rpc, they should be converted to simple list and for
netrpc we could add it as allowed objects.
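
As an illustration of the "convert to simple lists" idea for json-rpc, here is a minimal sketch. The to_wire/from_wire helpers and the Party fields are hypothetical and not part of trytond; sending the field names once alongside the rows is one possible layout that lets the receiver rebuild named access.

```python
import json
from collections import namedtuple

# Hypothetical record type; the field names are made up for the example.
Party = namedtuple("Party", ["id", "name", "code"])


def to_wire(rows):
    """Flatten namedtuple rows into plain lists plus one shared header."""
    if not rows:
        return {"fields": [], "rows": []}
    return {"fields": list(rows[0]._fields), "rows": [list(r) for r in rows]}


def from_wire(payload):
    """Rebuild namedtuple rows on the receiving side from the header."""
    Row = namedtuple("Row", payload["fields"])
    return [Row(*values) for values in payload["rows"]]


if __name__ == "__main__":
    rows = [Party(1, "Alice", "P001"), Party(2, "Bob", "P002")]
    encoded = json.dumps(to_wire(rows))
    decoded = from_wire(json.loads(encoded))
    print(encoded)
    print(decoded[0].name)
```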
--
Cédric Krier
B2CK SPRL
Rue de Rotterdam, 4
4000 Liège
Belgium
Tel: +32 472 54 46 59
Email/Jabber: [email protected]
Website: http://www.b2ck.com/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)
iEYEARECAAYFAkxi0wAACgkQaiEY2z2HLxqYPwCeNJjGVofc45SgoGAa0wFQmTxI
SxEAni8/h8tFLhkUkNQ6D5wUeG4o8Uxd
=6KfR
-----END PGP SIGNATURE-----