I repeated your test with a much larger size, dat1=: 1e6#dat, and the memory usage is 36x the byte size of the csv. I think this is reasonable for J, because it uses several integer arrays of the same length as the csv character data. Each integer is 8 bytes long, so the total byte size of 4 such integer arrays is already 32x the byte size of the csv.
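
Here is a rough sketch of that test (using the dat from your message below); the names time and space are mine, and exact figures will vary by J version and machine:

   load 'tables/csv'
   dat=: (34;'45';'hello';_5.34),: 12;'32';'goodbye';1.23
   dat1=: 1e6 # dat                 NB. copy each row 1e6 times
   d=: makecsv dat1                 NB. tens of MB of csv text
   'time space'=: timespacex 'fixcsv d'
   space % #d                       NB. ~36, i.e. 36x memory growth
   4 * 8                            NB. 4 integer arrays x 8 bytes/atom = 32x

The last line is just the arithmetic above: four 8-byte integer arrays with one atom per csv byte already account for 32 of the 36x on their own.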
I don't think this is a bug in J. If you are concerned about memory efficiency, you should do it in C. Put it the other way: if efficient csv handling could be done in J script, then the special csv code in Jd would not be needed.

On Tue, May 5, 2020 at 11:55 AM Aaron Ash <[email protected]> wrote:
> Hi,
>
> I've noticed that the tables/dsv addon seems to have an extremely high
> memory growth factor when processing csv data:
>
>    load 'tables/csv'
>    dat=: (34;'45';'hello';_5.34),: 12;'32';'goodbye';1.23
>    d=: makecsv dat
>    # d                       NB. 45 chars long
>    timespacex 'fixcsv d'     NB. 2.28e_5 4864
>    4864 % 45                 NB. 108.089 factor of memory growth
>
> This makes loading many datasets effectively impossible even on
> reasonably specced machines.
>
> A 1GB csv file would require 108GB of memory to load, which seems
> fairly extreme to the point where I would consider this a bug.
>
> Someone on irc mentioned that generally larger datasets should be
> loaded into jd and that's fair enough, but I still would expect to be
> able to load csv data reasonably quickly and memory efficiently.
>
> Is this a bug? Is there a better library to use for csv data?
>
> Cheers,
>
> Aaron.

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
