I just remembered this talk I gave last year on using J with "Big Data":
http://datarave.github.io/20150324/20150324_Devon_McCormick_Working_with_large_correlation_matrixes_using_j/20150324_Devon_McCormick_Working_with_large_correlation_matrixes_using_j.pdf
It includes an example of using my "doSomething" adverb to calculate
correlations for periods based on a somewhat large file of daily returns.

On Sat, Jan 2, 2016 at 10:19 PM, Ryan Eckbo <[email protected]> wrote:

> Thanks for the links, I didn't use them in this particular case but I've
> bookmarked them for the future.
>
> I used 'taketo' and 'dropto' on a 2.5G file and it's surprising to me how
> fast they are.
>
> On 26 Dec 2015, at 9:56, Devon McCormick wrote:
>
>> I've developed an adverb for working with large files:
>> http://code.jsoftware.com/wiki/NYCJUG/2014-05-13#Streaming_Through_Large_Files
>>
>> An example of using this code can be found here:
>> http://code.jsoftware.com/wiki/User:Devon_McCormick/Code/largeFileVet
>> and the updated version of the code here:
>> http://code.jsoftware.com/wiki/User:Devon_McCormick/Code/WorkOnLargeFiles
>>
>> A good place to start might be with an example of using a simple version
>> of an adverb that makes minimal assumptions about the logical structure
>> of the file:
>> http://code.jsoftware.com/wiki/User:Devon_McCormick/Code/WorkOnLargeFiles/SimpleFile
>>
>> The complete code is here:
>> http://code.jsoftware.com/wiki/User:Devon_McCormick/Code/workOnLargeFile.ijs
>>
>> On Fri, Dec 25, 2015 at 7:37 AM, Raul Miller <[email protected]> wrote:
>>
>>> When you want to see how J will proceed, you can set up an experiment,
>>> and use echo to show what is happening when.
>>>
>>> That said, your "0 verb will operate a box at a time (or a pair of
>>> boxes at a time, since it's dyadic - the "0/ verb thus operating a
>>> pair of boxes at a time but being monadic...).
>>>
>>> So... you'll be reading in a pair of files at a time, and accumulating
>>> the results of your myverb in your J session.
>>>
>>> I hope this helps,
>>>
>>> --
>>> Raul
>>>
>>> On Fri, Dec 25, 2015 at 4:51 AM, Ryan Eckbo <[email protected]> wrote:
>>>
>>>> I'm processing some big files on the order of 2G, extracting >= 250M
>>>> of data from each. I have to memory map them to get the data:
>>>>
>>>> readbigfile=: 3 : 0
>>>> JCHAR map_jmf_ 'f';y
>>>> NB. get data from f
>>>> unmap_jmf_'f'
>>>> )
>>>>
>>>> I have about 150 of these files together with matching smaller ones,
>>>> and I need to do something like this:
>>>>
>>>> (fread@[ myverb readbigfile@])"0/ SmallFiles,.Bigfiles
>>>>
>>>> My question is how is the J runtime going to execute this: is it going
>>>> to proceed line by line or try and read all the big files at once? If
>>>> the former, is the memory freed right after execution? In general I
>>>> don't know how to deal with huge arrays.
>>>>
>>>> Thanks for any help,
>>>> Ryan
>>>>
>>>> ----------------------------------------------------------------------
>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
>> --
>> Devon McCormick, CFA
>> Quantitative Consultant

--
Devon McCormick, CFA
Quantitative Consultant
