Hello All,

Back to the thread after a long time.

I have been working on this problem and my findings are below.

  1. Dictionaries seem to be slow, have a large memory footprint and
     slow down reads in multiple places:
        1. Instance creation at dictfetchX in the cursor implementation
        2. Pickling and compression at the socket
        3. On the network (as nicoe pointed out, the keys of the
           dictionary are redundant)
  2. Namedtuple [1] is an alternative but has certain limitations:
        1. Being a subclass of tuple, it lacks functionality commonly
           needed in the read method, such as append. The _replace
           method in fact creates a new instance of the namedtuple
           instead of changing the existing one
        2. Pickling needs a custom pickler as indicated by Bechamel in
           IRC today [2]
        3. Field names may not begin with an underscore (Python 2.7
           adds a rename option that replaces such names automatically)
  3. Hence, a custom data structure with the following properties
     was needed:
        1. Comparable or better speed than namedtuple
        2. Smaller footprint than a dictionary
        3. Must be easily pickled and unpickled
        4. Must have an easier API to access attributes than what
           namedtuple provides
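
The points above about dictionaries and namedtuples can be checked with a small sketch. The Row type and the sample values are made up for illustration, and the sizes depend on the pickle protocol, so treat this as a rough demonstration rather than a benchmark:

```python
import pickle
from collections import namedtuple

# Hypothetical record type, used only for this illustration.
Row = namedtuple("Row", ["a", "b"])
row = Row(a="apple", b="ball")

# _replace returns a new instance; the original row is left unchanged.
changed = row._replace(b="bat")

# Tuples have no append, and attributes cannot be assigned in place.
has_append = hasattr(row, "append")
try:
    row.b = "bat"
    assignment_allowed = True
except AttributeError:
    assignment_allowed = False

# Each dictionary carries its keys again; plain tuples carry only values.
dict_rows = [{"a": str(i), "b": str(i)} for i in range(1000)]
tuple_rows = [(str(i), str(i)) for i in range(1000)]
dict_size = len(pickle.dumps(dict_rows, protocol=2))
tuple_size = len(pickle.dumps(tuple_rows, protocol=2))
```

Here dict_size comes out larger than tuple_size because every dictionary adds a reference to each key on top of the values.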

Based on this I set out to develop the custom data structure [3] and I request your review of it.
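
To make the requirements concrete, here is a minimal sketch of one way such a structure could look. It is not the code under review in [3]: the RecordSet class and its method names are hypothetical, and the only idea it shows is storing the field names once and the rows as plain lists, so that pickling needs nothing beyond builtin types.

```python
import pickle


class RecordSet(object):
    """Hypothetical container: field names stored once, rows as lists."""

    def __init__(self, fields, rows):
        self.fields = tuple(fields)
        self.rows = [list(row) for row in rows]
        # Map each field name to its position in a row.
        self._index = dict((name, i) for i, name in enumerate(self.fields))

    def get(self, row_number, name):
        return self.rows[row_number][self._index[name]]

    def set(self, row_number, name, value):
        # Rows are lists, so values can be changed in place.
        self.rows[row_number][self._index[name]] = value

    def dumps(self):
        # Only builtin types are pickled, so no custom pickler is needed.
        return pickle.dumps((self.fields, self.rows), protocol=2)

    @classmethod
    def loads(cls, data):
        fields, rows = pickle.loads(data)
        return cls(fields, rows)


records = RecordSet(("a", "b"), [("apple", "ball"), ("ant", "bat")])
records.set(0, "b", "bell")
copy = RecordSet.loads(records.dumps())
```

After the round trip, copy.get(0, "b") returns the updated value and copy.fields still holds the field names exactly once.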

The benchmarks are below (these tests [4] are also on code review, so you can run them yourself):

Size of list: 300000 records, each representing a:apple, b:ball, c:cat, d:dog, e:english

1. With List of dictionary (Current implementation):

   real    0m3.422s
   user    0m3.212s
   sys    0m0.152s

   Data size: 9899891

2. With List of namedtuple:

   real    0m4.111s
   user    0m3.858s
   sys    0m0.121s

   (Don't be surprised: namedtuple performs badly when pickled.)

   Data size: 5999915

3. With List of Unnamed Structure:

   real    0m1.454s
   user    0m1.296s
   sys    0m0.108s

   Data size: 5399950


To summarize:

The UnnamedDatastructure takes about 58% less time than the list of dictionaries and 65% less time than the list of namedtuples. Its data size is about 45% smaller than the list of dictionaries and 10% smaller than the list of namedtuples.
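
The percentages can be recomputed from the "real" timings and data sizes reported above. This is only the arithmetic; the helper name percent_saved is made up for the example. Note that the size saving against the list of dictionaries computes to about 45%.

```python
# "real" timings in seconds and data sizes, copied from the benchmarks above.
timings = {"dict": 3.422, "namedtuple": 4.111, "custom": 1.454}
sizes = {"dict": 9899891, "namedtuple": 5999915, "custom": 5399950}


def percent_saved(baseline, value):
    """Return how much smaller value is than baseline, in whole percent."""
    return int(round(100.0 * (baseline - value) / baseline))


time_vs_dict = percent_saved(timings["dict"], timings["custom"])
time_vs_namedtuple = percent_saved(timings["namedtuple"], timings["custom"])
size_vs_dict = percent_saved(sizes["dict"], sizes["custom"])
size_vs_namedtuple = percent_saved(sizes["namedtuple"], sizes["custom"])
```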

The API is also cleaner and friendlier, as can be seen in the doctests [4] of the data structure [3].

Hence the best performance is achieved with this custom data structure implementation. Once again, I request your review of the code [5].

[1] http://docs.python.org/release/2.6.6/library/collections.html#collections.namedtuple
[2] http://docs.activestate.com/activepython/3.1/python/library/pickle.html#persistence-of-external-objects
[3] http://codereview.appspot.com/2193049/patch/1/3
[4] http://docs.python.org/library/doctest.html
[5] http://codereview.appspot.com/2193049/

Thanks,

Sharoon Thomas
Openlabs Technologies & Consulting (P) LTD
http://openlabs.co.in

On 11/08/2010 17:42, Sharoon Thomas wrote:
Hello All,

I just tried to alter the database cursor for postgres [1] alone to add a new
method which returns the data as a list of namedtuples instead of a list of
dictionaries.

Tests were done on party.party with all fields being fetched for 100,000
records. The size was calculated separately using pympler [2].

1. List of Dictionary
===============
Time to read:
real    0m3.222s
user    0m2.373s
sys     0m0.295s

Memory footprint: 66804704

2. List of namedtuples
=================
Time to read:
real    0m2.237s
user    0m1.491s
sys     0m0.224s

Memory footprint: 8412336 (wow, just 13% of the list of dicts :)

IMHO it makes sense to implement named tuples for read and I am starting the 
work on it. Hope to complete it soon.
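
A cursor method of this kind could look roughly like the sketch below. It is not the change made in trytond: it uses an in-memory sqlite3 database as a stand-in for postgres so that it runs on its own, and the fetch_namedtuples helper, the Row type and the sample table are made up for the example. It relies only on cursor.description, which DB-API cursors provide.

```python
import sqlite3
from collections import namedtuple


def fetch_namedtuples(cursor):
    """Return all remaining rows of an executed cursor as namedtuples."""
    # The first item of each description entry is the column name.
    names = [column[0] for column in cursor.description]
    Row = namedtuple("Row", names)
    return [Row(*values) for values in cursor.fetchall()]


# sqlite3 stands in for postgres here; the table layout is an assumption.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE party (id INTEGER, name TEXT, code TEXT)")
cursor.executemany(
    "INSERT INTO party VALUES (?, ?, ?)",
    [(1, "Alice", "P1"), (2, "Bob", "P2")],
)
cursor.execute("SELECT id, name, code FROM party ORDER BY id")
parties = fetch_namedtuples(cursor)
connection.close()
```

Each element of parties then gives attribute access, for example parties[0].name, while keeping the footprint of a tuple.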

[1] trytond/backend/postgres/database.py
[2] http://code.google.com/p/pympler/

Thanks,

Sharoon Thomas

On 6 Aug 2010, at 18:04, Cédric Krier wrote:

On 06/08/10 17:44 +0100, Sharoon Thomas wrote:
Hi all,

I vote for the idea and I think it would be better to return a list of named
tuples (not sure if it can be used over xml rpc or other rpc).

Named tuples have the same memory footprint as normal tuples and give
class-attribute-like functionality.

I had already thought of it, but I had not checked in which version it became
available. It seems it is 2.4, so it is ok.
For xml-rpc and json-rpc, they should be converted to simple lists, and for
netrpc we could add them as allowed objects.
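
The conversion to simple lists for json-rpc could be sketched as follows. The Party type and the to_plain_lists helper are made up for the example, and the standard json module is used only to show the resulting payload:

```python
import json
from collections import namedtuple

# Hypothetical record type, used only for this illustration.
Party = namedtuple("Party", ["id", "name"])
parties = [Party(1, "Alice"), Party(2, "Bob")]


def to_plain_lists(rows):
    """Convert namedtuples to plain lists before sending them over RPC."""
    return [list(row) for row in rows]


plain = to_plain_lists(parties)
payload = json.dumps(plain)
```

The field names are dropped in this form, so the receiving side has to know the column order.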

--
Cédric Krier

B2CK SPRL
Rue de Rotterdam, 4
4000 Liège
Belgium
Tel: +32 472 54 46 59
Email/Jabber: [email protected]
Website: http://www.b2ck.com/