Scott, I'd be interested in hearing the results and any timings for your final approach. I've worked in this area too, after finding the addons to be slower than I wanted. I had some success using memory-mapped files on CSVs by making every line a fixed width: each line is padded with trailing spaces to the length of the longest line found.
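A minimal sketch of the padding step, assuming LF-terminated lines and data small enough to pad in memory. The names t, m, f, w and rec are made up for illustration, and the sample data is hypothetical. It only shows why a fixed width helps: record i starts at byte i*w, so it can be fetched without scanning for line ends.

```j
NB. hypothetical sample data: LF-terminated lines of unequal length
t=: '1,2,3',LF,'40,50,60',LF,'7,8,9',LF

NB. cut on the trailing LF; short rows are padded with spaces
NB. to the length of the longest line
m=: ];._2 t

NB. append LF to every row and ravel back into fixed-width text
f=: , m ,. LF

NB. bytes per record, including the LF
w=: >: {: $ m

NB. record y (0-based) occupies bytes y*w through (y*w)+w-2
rec=: 3 : '((y * w) + i. <: w) { f'

NB. parse one record; the trailing pad spaces do not affect execution
". rec 0
```

The same index arithmetic should apply to a mapped character noun in place of f, since the mapped file then looks like an ordinary character list.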
I've also had excellent performance when using memory-mapped files (JINT) on 5M+ int64s. Basically instantaneous.

On Mon, Nov 11, 2013 at 9:02 PM, bill lam <[email protected]> wrote:

> If you are sure they are well formed and numeric only and no missing items,
> then you do not need that addons, eg
>
>    a=: 0 : 0
> 1,2,3
> 4,5,6
> )
>    a
> 1,2,3
> 4,5,6
>
>    ".;._2 a
> 1 2 3
> 4 5 6
>
> beware if it contains negative numbers, you might need to replace
> the - with _ first.
>
> Mon, 11 Nov 2013, Scott Locklin wrote:
> > Pascal wrote:
> >
> > >Can you be more specific about the code?
> > >I assume that you looked into cut ;. ?
> > >an alternative to boxing might be to strip out the commas, and then run
> > >0&". on the string. Not sure that is faster though.
> >
> > I'm sorry for being unclear. I did something like this:
> >
> > loadd 'tables/csv'
> > datloc=: '/path/to/csvs/'
> >
> > ip=: ".> readcsv datloc,'chunk1.csv'
> > ip=: ip, ".> readcsv datloc,'chunk2.csv'
> > (etc)
> >
> > ip is fairly small, but the boxed array read in by readcsv is bloody
> > enormous. readcsv is also pretty slow. I solved the problem with chunking
> > the csvs, but waiting around for several minutes seemed a very un J-like
> > experience. By comparison, the binsearch I needed to do took a fraction of
> > a second (it took almost a half hour in R, which has no native binsearch).
> >
> > -Scott, who will definitely be taking Eric up on his kind offer
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
>
> --
> regards,
> ====================================================
> GPG key 1024D/4434BAB3 2008-08-24
> gpg --keyserver subkeys.pgp.net --recv-keys 4434BAB3
> gpg --keyserver subkeys.pgp.net --armor --export 4434BAB3
