Hi all,

I have a slight problem. While running some data analysis, I decided to
cache some hash tables to disk. I started by using FASL, but found that it
was fragile between versions of Racket: I have v6 in one place and v6.1.1
in another, and I can't afford to have these cached files become
unreadable, inasmuch as I am running the same scripts across desktop Macs
and Linux servers whose Racket versions are slightly out of sync.
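
For reference, the caching code looked roughly like this (a sketch from
memory; `cache-table!`, `load-table`, and the path handling are
illustrative, not my exact code):

    (require racket/fasl)

    ;; Write the table to disk in fasl format. (The fasl encoding is
    ;; what turned out to be fragile across Racket versions.)
    (define (cache-table! path tbl)
      (call-with-output-file path
        (lambda (out) (s-exp->fasl tbl out))
        #:exists 'replace))

    ;; Read it back; returns the decoded hash table.
    (define (load-table path)
      (call-with-input-file path fasl->s-exp))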

So, I switched to straight serialize. This was OK for reasonable amounts
of data, but I forgot that I might be dealing with unreasonable amounts of
data. I am now dealing with a file that is 1.2GB in size and contains a
hash with many keys, each of which holds a fair bit of data in the form
of additional hashes, lists, and strings. My goal, ultimately, is to read
this file back in and run my analysis.
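
The serialize-based version is essentially the obvious thing (again a
sketch; the names are made up):

    (require racket/serialize)

    ;; Serialize the whole table as a single datum.
    (define (save-cache! path tbl)
      (call-with-output-file path
        (lambda (out) (write (serialize tbl) out))
        #:exists 'replace))

    ;; Reading it back forces the entire 1.2GB structure into memory
    ;; at once, which is where things go wrong below.
    (define (load-cache path)
      (call-with-input-file path
        (lambda (in) (deserialize (read in)))))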

As I attempt to run an analysis now (which involves loading the serialized
data as-is), I get the following:

Couldn't allocate memory: (os/kern) no space available
Abort trap: 6

As far as I know (which isn't very far), this is coming from the OS, not
Racket. I'm running an OS X 10.9 machine with 16GB of RAM that is,
generally, doing a bunch of desktop-machine kinds of things (as opposed
to being a stripped-down Linux server).

My thoughts at this point:

1. Hope Matthew Flatt has a magic incantation up his sleeve.

2a. Fire up an Amazon VM with 60+GB of RAM, and see if that lets me run my
analysis.

2b. Generally find access to a machine with a lot more RAM than I have in
any of my machines at the moment.

3. Wish I had stored my data differently.

I can, if need be, modify the storage format and do something more
RAM-friendly, allowing for a sequential, element-by-element
load-and-analyze pass over the cached local data; a sketch of what I mean
follows below. The database I'm working against is large enough that my
initial query can take a while (days) to run, and I'm not excited about
going back to that stage for this particular query. I will if necessary.
And, of course, my SQL-fu may be weak, but I'm dealing with tables that
are larger than my typical experience (tens to hundreds of millions of
rows), and I have to do one join and a bit of filtering, which slows
things down even when indexed.
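
Concretely, I'm imagining something like this: one serialized key/value
pair per datum, so the file can be folded over without ever holding the
whole table in memory (a sketch; `save-cache/incremental!` and
`for-each-record` are hypothetical names):

    (require racket/serialize)

    ;; Write one serialized (key . value) pair per datum, instead of
    ;; one giant serialized hash.
    (define (save-cache/incremental! path tbl)
      (call-with-output-file path
        (lambda (out)
          (for ([(k v) (in-hash tbl)])
            (write (serialize (cons k v)) out)
            (newline out)))
        #:exists 'replace))

    ;; Stream the records back, calling proc on each key/value pair;
    ;; only one record is live at a time.
    (define (for-each-record path proc)
      (call-with-input-file path
        (lambda (in)
          (for ([datum (in-port read in)])
            (define pair (deserialize datum))
            (proc (car pair) (cdr pair))))))

That would let the analysis run in roughly constant memory, at the cost
of regenerating the cache files.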

Racket seemed to be happy to build the hash table in the first place (see
#3), but loading it back is currently my stumbling block.

Ah, adventures with somewhat large data.

Cheers,
Matt
