I just remembered this talk I gave last year on using J with "Big Data":
http://datarave.github.io/20150324/20150324_Devon_McCormick_Working_with_large_correlation_matrixes_using_j/20150324_Devon_McCormick_Working_with_large_correlation_matrixes_using_j.pdf

It includes an example of using my "doSomething" adverb to calculate
correlations for periods based on a somewhat large file of daily returns.

On Sat, Jan 2, 2016 at 10:19 PM, Ryan Eckbo <[email protected]> wrote:

> Thanks for the links. I didn't use them in this particular case, but I've
> bookmarked them for the future.
>
> I used 'taketo' and 'dropto' on a 2.5G file and it's surprising to me how
> fast they are.
>
>
> On 26 Dec 2015, at 9:56, Devon McCormick wrote:
>
>> I've developed an adverb for working with large files:
>>
>> http://code.jsoftware.com/wiki/NYCJUG/2014-05-13#Streaming_Through_Large_Files
>>
>> An example of using this code can be found here:
>> http://code.jsoftware.com/wiki/User:Devon_McCormick/Code/largeFileVet
>> and the updated version of the code is here:
>> http://code.jsoftware.com/wiki/User:Devon_McCormick/Code/WorkOnLargeFiles
>>
>> A good place to start might be with an example of using a simple version
>> of an adverb that makes minimal assumptions about the logical structure of
>> the file:
>>
>> http://code.jsoftware.com/wiki/User:Devon_McCormick/Code/WorkOnLargeFiles/SimpleFile
>>
>> The complete code is here:
>>
>> http://code.jsoftware.com/wiki/User:Devon_McCormick/Code/workOnLargeFile.ijs
>>
>> On Fri, Dec 25, 2015 at 7:37 AM, Raul Miller <[email protected]> wrote:
>>
>>> When you want to see how J will proceed, you can set up an experiment,
>>> and use echo to show what is happening when.
>>>
>>> That said, your "0 verb will operate a box at a time (or a pair of
>>> boxes at a time, since it's dyadic - the "0/ verb thus operating a
>>> pair of boxes at a time but being monadic...).
>>>
>>> So... you'll be reading in a pair of files at a time, and accumulating
>>> the results of your myverb in your J session.
>>>
>>> I hope this helps,
>>>
>>> --
>>> Raul
>>>
>>>
>>> On Fri, Dec 25, 2015 at 4:51 AM, Ryan Eckbo <[email protected]> wrote:
>>>
>>>> I'm processing some big files on the order of 2G, extracting >= 250M of
>>>> data from each. I have to memory map them to get the data:
>>>>
>>>> readbigfile=: 3 : 0
>>>> JCHAR map_jmf_ 'f';y
>>>> NB. get data from f
>>>> unmap_jmf_'f'
>>>> )
>>>>
>>>> I have about 150 of these files together with matching smaller ones, and
>>>> I need to do something like this:
>>>>
>>>> (fread@[ myverb readbigfile@])"0/ SmallFiles,.Bigfiles
>>>>
>>>> My question is how the J runtime is going to execute this: is it going
>>>> to proceed line by line, or try to read all the big files at once? If the
>>>> former, is the memory freed right after execution? In general I don't
>>>> know how to deal with huge arrays.
>>>>
>>>> Thanks for any help,
>>>> Ryan
>>>>
>>>>
>>>>
>>>> ----------------------------------------------------------------------
>>>> For information about J forums see http://www.jsoftware.com/forums.htm
>>>>
>>>
>>>
>>
>>
>> --
>>
>> Devon McCormick, CFA
>>
>> Quantitative Consultant
>>
>



-- 

Devon McCormick, CFA

Quantitative Consultant
