On Tuesday 17 June 2008, Assaf Gordon wrote:
> Hello all,
>
> I'm having problems loading big files into memory - maybe you could help
> me solve them.
>
> My data file is a big (~250MB) text file, with eight tab-separated
> fields. I want to load the entire file into a list.
Perl has a lot of overhead on its data structures, so it may run out of
memory with data this large. I suggest you use some kind of database
instead:

* http://www.postgresql.org/ - a client/server SQL database (MySQL is not
  recommended, due to http://www.shlomifish.org/open-source/anti/mysql/ ).

* http://www.sqlite.org/ - a file-based SQL database. Public Domain.

* http://www.oracle.com/technology/products/berkeley-db/index.html -
  Berkeley DB - a simple key/value-based database (GPL-like licence).

* http://freshmeat.net/projects/tokyocabinet/ - an LGPLed database that
  seems similar to BDB. (I have not tested it yet.)

>
> I've narrowed down the code into this:
> -------------
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Data::Dumper;
> use Devel::Size qw (size total_size);
>
> my @probes;
> while (<>) {
>     my @fields = split(/\s+/);
>     push @probes, \@fields;
> }
>
> print "size = ", size(\@probes),"\n";
> print "total size= ", total_size(\@probes),"\n";
> print "data size = ", total_size(\@probes)- size(\@probes),"\n";
> print Dumper(\@probes),"\n";
> ------------
> (Can't get any simpler than that, right?)
>
> But when I run the program, the perl process consumes 2.5GB of memory,
> prints "out of memory" and stops.

That is expected.

>
> I know that perl isn't the most efficient memory consumer, but surely
> there's a way to do it...

You can try using perltie games - http://perldoc.perl.org/perltie.html ,
but I would recommend against it. Just use a database, or possibly use a C
extension with hand-crafted memory allocation. Either that, or get a 64-bit
machine with lots of available memory.
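For illustration only, here is a minimal sketch of what the perltie route could look like. It assumes the records are purely tab-separated (as in the data file described above), and the class name CompactRecords and the helper load_records() are hypothetical, made up for this example. Each record is kept as a single tab-joined string instead of an array of eight scalars, and is split back into fields whenever it is fetched, trading CPU time for memory. It relies only on the core Tie::Array module; how much memory it saves on the real 250MB file has not been measured here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tied-array class (the name CompactRecords is made up for
# this sketch). It keeps every record as ONE tab-joined string instead of
# an array of separate scalars, and rebuilds the fields on access.
package CompactRecords;
use Tie::Array;
our @ISA = ('Tie::Array');    # Tie::Array supplies PUSH, POP, SPLICE, etc.
                              # in terms of FETCH, STORE, FETCHSIZE and
                              # STORESIZE, which are defined below.

sub TIEARRAY  { my ($class) = @_; return bless [], $class; }
sub FETCHSIZE { my ($self) = @_; return scalar @$self; }
sub STORESIZE { my ($self, $n) = @_; $#$self = $n - 1; return; }

# Store an array reference of fields as a single string.
sub STORE {
    my ($self, $i, $fields) = @_;
    $self->[$i] = join("\t", @$fields);
    return;
}

# Return a fresh array reference; edits to it are NOT written back.
sub FETCH {
    my ($self, $i) = @_;
    return undef unless defined $self->[$i];
    return [ split /\t/, $self->[$i], -1 ];
}

package main;

# Hypothetical helper: read tab-separated lines from $fh into a tied
# array and return a reference to it (the tie stays attached).
sub load_records {
    my ($fh) = @_;
    tie my @probes, 'CompactRecords';
    while (my $line = <$fh>) {
        chomp $line;
        push @probes, [ split /\t/, $line, -1 ];
    }
    return \@probes;
}

# Only do real work when a file name is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $probes = load_records($fh);
    print "records = ", scalar(@$probes), "\n";
    print "first   = @{ $probes->[0] }\n" if @$probes;
}
```

Run it with the data file as its argument (e.g. `perl compact.pl data.txt`, where the script name is arbitrary). Because FETCH hands back a freshly split copy, assigning to `$probes->[0][1]` does not modify the stored record; any change has to go through a full `$probes->[0] = [...]` assignment.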
;-)

Regards,
    Shlomi Fish

>
> If you care to test it yourselves, here's a simple script that creates a
> dummy text file, similar to my own data file:
> -----
> #!/usr/bin/perl
> foreach (1..2100000) { print join("\t", "LONG-TEXT-FIELD", 11111,
> 222222, 3333333, 44444444, 5555555, 6666666,
> "VERY-VERY-VERY-VERY-VERY-VERY-VERY-VERY-VERY-LONG-TEXT-FIELD" ),"\n" ; }
> -----
>
>
> Thanks in advance for your help!
>     Assaf.
>
> _______________________________________________
> Perl mailing list
> [email protected]
> http://perl.org.il/mailman/listinfo/perl

-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
My Aphorisms - http://www.shlomifish.org/humour.html

The bad thing about hardware is that it sometimes works and sometimes doesn't.
The good thing about software is that it's consistent: it always does not
work, and it always does not work in exactly the same way.
