On 10/29/11 11:44, Gelonida N wrote:
I would like to save many dicts with a fixed (and known) amount of keys
in a memory efficient manner (no random, but only sequential access is
required) to a file (which can later be sent over a slow expensive
network to other machines)

Example:
Every dict will have the keys 'timestamp', 'floatvalue', 'intvalue',
'message1', 'message2'
'timestamp' is an integer
'floatvalue' is a float
'intvalue' an int
'message1' is a string with a length of max 2000 characters, but can
often be very short
'message2' the same as message1

so a typical dict will look like
{ 'timestamp' : 12, 'floatvalue': 3.14159, 'intvalue': 42,
  'message1' : '', 'message2' : '=' * 1999 }



What do you call "many"? Fifty? A thousand? A thousand million? How many
items in each dict? Ten? A million?

File size can be between 100kb and over 100Mb per file. Files will be
accumulated over months.

If Steven's pickle-protocol2 solution doesn't quite do what you need, you can do something like the code below. Gzip is pretty good at addressing...

Or have you considered simply compressing the files?
Compression makes sense but the initial file format should be
already rather 'compact'

...by compressing out a lot of the duplicate aspects, which also mitigates some of the verbosity of CSV.

The code below serializes the data to a gzipped CSV file, then unserializes it. Just point it at the appropriate data-source and adjust the column-names and data-types.

-tkc

import csv
import gzip

# column-names paired with the type used to convert each field back
# when reading; adjust for your real column-names/data-types
COLUMNS = (
    ('timestamp', int),
    ('floatvalue', float),
    ('intvalue', int),
    ('message1', str),
    ('message2', str),
    )

data = [ # use your real data here
    {
    'timestamp': 12,
    'floatvalue': 3.14159,
    'intvalue': 42,
    'message1': 'hello world',
    'message2': '=' * 1999,
    },
    ] * 10000

# On Python 3 the csv module needs a text-mode file, so open the
# gzip stream with 'wt'/'rt' and newline='' rather than 'wb'
with gzip.open('data.gz', 'wt', newline='', encoding='utf-8') as f:
    w = csv.writer(f)
    for row in data:
        w.writerow([row[name] for name, _ in COLUMNS])

# read the rows back, converting each field to its column's type
output = []
with gzip.open('data.gz', 'rt', newline='', encoding='utf-8') as f:
    for row in csv.reader(f):
        d = dict(
            (name, convert(field))
            for (name, convert), field in zip(COLUMNS, row)
            )
        output.append(d)

# or

with gzip.open('data.gz', 'rt', newline='', encoding='utf-8') as f:
    output = [
        dict(
            (name, convert(field))
            for (name, convert), field in zip(COLUMNS, row)
            )
        for row in csv.reader(f)
        ]
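
Since only sequential access is required and the files may run past 100Mb, it may be preferable not to build the whole list in memory. What follows is only a rough sketch of the same gzipped-CSV idea using generators, assuming Python 3. The names write_rows, iter_rows and make_rows are made up for illustration, and the sample rows are placeholders rather than real data.

```python
import csv
import gzip
import os
import tempfile

# (name, converter) pairs; assumed to match the five keys discussed above
COLUMNS = (
    ('timestamp', int),
    ('floatvalue', float),
    ('intvalue', int),
    ('message1', str),
    ('message2', str),
)


def write_rows(path, rows):
    """Write an iterable of dicts to a gzipped CSV, one row at a time."""
    with gzip.open(path, 'wt', newline='', encoding='utf-8') as f:
        w = csv.writer(f)
        for row in rows:
            w.writerow([row[name] for name, _ in COLUMNS])


def iter_rows(path):
    """Yield one dict per CSV row instead of building a list."""
    with gzip.open(path, 'rt', newline='', encoding='utf-8') as f:
        for fields in csv.reader(f):
            yield {
                name: convert(field)
                for (name, convert), field in zip(COLUMNS, fields)
            }


def make_rows(count):
    """Lazily generate hypothetical sample rows (placeholder data)."""
    for i in range(count):
        yield {
            'timestamp': i,
            'floatvalue': 3.14159,
            'intvalue': 42,
            'message1': '',
            'message2': '=' * 1999,
        }


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'data.csv.gz')
    write_rows(path, make_rows(1000))
    # Consume the rows sequentially; only one dict is alive at a time
    count = sum(1 for _ in iter_rows(path))
    print(count, 'rows,', os.path.getsize(path), 'bytes compressed')
```

Both the writer and the reader touch one row at a time, so memory use should stay roughly flat regardless of file size. Printing os.path.getsize() for your own data is also a quick way to see how much gzip actually buys you before settling on a format.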