[Pharo-users] running out of memory while processing a 220MB csv file with NeoCSVReader - tips?

Paul DeBruicker Fri, 14 Nov 2014 10:10:25 -0800

Hi -

I'm processing 9 GB of CSV files (the biggest file is 220MB or so).  I'm not 
sure if it's because of the size of the files or the code I've written to keep 
track of the domain objects I'm interested in, but I'm getting out-of-memory 
errors & crashes in Pharo 3 on Mac with the latest VM.  I haven't checked other 
VMs.


I'm going to profile my own code and attempt to split the files manually for 
now to see what else it could be. 


Right now I'm doing something similar to

        |file reader|
        file:= '/path/to/file/myfile.csv' asFileReference readStream.
        reader := NeoCSVReader on: file.

        reader
                recordClass: MyClass; 
                skipHeader;
                addField: #myField:;
                ....
        

        reader do: [:eachRecord | self seeIfRecordIsInterestingAndIfSoKeepIt: eachRecord].
        file close.



Is there a facility in NeoCSVReader to read a file in batches (e.g. 1000 lines 
at a time), or an easy way to do that?




Thanks

Paul
