Re: save tuple of simple data types to disk (low memory foot print)

Tim Chase Sat, 29 Oct 2011 11:26:21 -0700

On 10/29/11 11:44, Gelonida N wrote:

I would like to save many dicts with a fixed (and known) amount of keys
in a memory efficient manner (no random, but only sequential access is
required) to a file (which can later be sent over a slow expensive
network to other machines)


Example:
Every dict will have the keys 'timestamp', 'floatvalue', 'intvalue',
'message1', 'message2'
'timestamp' is an integer
'floatvalue' is a float
'intvalue' an int
'message1' is a string with a length of max 2000 characters, but can
often be very short
'message2' the same as message1

so a typical dict will look like
{ 'timestamp' : 12, 'floatvalue': 3.14159, 'intvalue': 42,
  'message1' : '', 'message2' : '=' * 1999 }


What do you call "many"? Fifty? A thousand? A thousand million? How many
items in each dict? Ten? A million?


File size can be between 100 kB and over 100 MB per file. Files will be
accumulated over months.

If Steven's pickle-protocol2 solution doesn't quite do what you need, you can do something like the code below. Gzip is pretty good at addressing...

Or have you considered simply compressing the files?

Compression makes sense but the initial file format should be
already rather 'compact'

...by compressing out a lot of the duplicate aspects, which also mitigates some of the verbosity of CSV.
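
To get a rough feel for how much gzip squeezes out of repetitive CSV, here is a minimal sketch that builds the CSV in memory and compares sizes. The sample rows are made up to match the record shape discussed in this thread, and real data with less repetition will compress less well, so treat the numbers it prints as illustrative only.

```python
import csv
import gzip
import io

# Hypothetical sample rows shaped like the records in this thread.
rows = [(12, 3.14159, 42, 'hello world', '=' * 1999)] * 1000

# Serialize to CSV text in memory, then encode to bytes.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
raw = buf.getvalue().encode('utf-8')

# Compress the CSV bytes with gzip.
packed = gzip.compress(raw)

print(len(raw), 'bytes of plain CSV')
print(len(packed), 'bytes after gzip')
```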

It serializes the data to a gzipped CSV file then unserializes it. Just point it at the appropriate data-source, adjust the column-names and data-types.


-tkc

from gzip import GzipFile
from csv import writer, reader

data = [ # use your real data here
    {
    'timestamp': 12,
    'floatvalue': 3.14159,
    'intvalue': 42,
    'message1': 'hello world',
    'message2': '=' * 1999,
    },
    ] * 10000


f = GzipFile('data.gz', 'wb')
try:
    w = writer(f)
    for row in data:
        w.writerow([
            row[name] for name in (
            # use your real col-names here
            'timestamp',
            'floatvalue',
            'intvalue',
            'message1',
            'message2',
            )])
finally:
    f.close()

output = []
for row in reader(GzipFile('data.gz')):
    d = dict((
        (name, f(row[i]))
        for i, (f,name) in enumerate((
            # adjust for your column-names/data-types
            (int, 'timestamp'),
            (float, 'floatvalue'),
            (int, 'intvalue'),
            (str, 'message1'),
            (str, 'message2'),
            ))))
    output.append(d)

# or

output = [
    dict((
        (name, f(row[i]))
        for i, (f,name) in enumerate((
            # adjust for your column-names/data-types
            (int, 'timestamp'),
            (float, 'floatvalue'),
            (int, 'intvalue'),
            (str, 'message1'),
            (str, 'message2'),
            ))))
    for row in reader(GzipFile('data.gz'))
    ]
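
The script above targets the Python 2 of its day, where the csv module worked on byte streams. Under Python 3 the csv module expects text, so the gzip stream has to be opened in text mode. What follows is a sketch of the same round trip under that assumption. The names `COLUMNS`, `save_rows`, and `load_rows` are made up for illustration and do not come from the original post. The reader is a generator, which fits the sequential-access requirement because only one row is held in memory at a time.

```python
import csv
import gzip

# Column names paired with the converter that restores each type.
# Adjust these for your own data.
COLUMNS = (
    ('timestamp', int),
    ('floatvalue', float),
    ('intvalue', int),
    ('message1', str),
    ('message2', str),
)

def save_rows(path, dicts):
    """Write an iterable of dicts to a gzipped CSV file, one row each."""
    # 'wt' opens the gzip stream in text mode; newline='' is what csv expects.
    with gzip.open(path, 'wt', newline='', encoding='utf-8') as fh:
        w = csv.writer(fh)
        for d in dicts:
            w.writerow([d[name] for name, _ in COLUMNS])

def load_rows(path):
    """Yield dicts one at a time instead of building the whole list."""
    with gzip.open(path, 'rt', newline='', encoding='utf-8') as fh:
        for row in csv.reader(fh):
            yield {name: conv(cell)
                   for (name, conv), cell in zip(COLUMNS, row)}
```

Calling `save_rows('data.csv.gz', data)` and then iterating over `load_rows('data.csv.gz')` should give back dicts equal to the originals, as long as every value survives a trip through its string form.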