If I may, and having heard no reply to my defense of component files, I'd
suggest jfiles as a potential solution. Each component can hold whatever
span of data you are comfortable with: a month, a year, or a decade. Or am
I missing something (as usual)?
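
A minimal sketch of what I have in mind, using the jcreate/jappend/jread
verbs from the standard jfiles script (the file name is made up, and the
jfiles documentation is the place to check exact argument and result
formats):

   require 'jfiles'      NB. component files: jcreate, jappend, jread, ...

   jcreate 'prices'      NB. create an empty component file

   NB. each jappend adds one component and returns its index; a
   NB. component could be one row, a month of rows, or a year of rows
   month =. 'AA,2017-06-27,31.6,32.5,31.49,31.63,5463485.0',LF
   month jappend 'prices'

   jread 'prices';0      NB. read the first component back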

> On Mar 31, 2020, at 8:05 PM, Raul Miller <[email protected]> wrote:
> 
> If you have enough memory for the intermediate results, you would have
> no problems with a file that large. You need an order of magnitude
> more memory for intermediate results than the raw data, though.
> 
> Me, if I were working with something that big, I'd probably break it
> into pieces first, textually, before trying to process it numerically.
> For example, use the first column as a file name (discarding that
> column from the intermediate files -- or, better, replacing each
> ticker symbol with an index value, and also removing the '-'
> characters from the date column, perhaps with a big string replace
> based on the month and year... something like ,2017-01- becomes
> ,201701 and so on). Basically: put each intermediate file in some
> target directory.
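
An untested sketch of that split, for what it's worth, using fread and
fwrite from the standard 'files' library and assuming the whole master
file fits in memory in one read (file names are made up):

   require 'files'                            NB. fread, fwrite

   splitcsv =: 3 : 0
    lines =. }. <;._2 fread y                 NB. boxed lines, header dropped
    tick  =. ({.~ i.&',')&.> lines            NB. text before the first comma
    rest  =. (}.~ >:@i.&',')&.> lines         NB. text after it
    for_t. ~. tick do.
     rows =. (tick = t) # rest                NB. all rows for one ticker
     (; ,&LF&.> rows) fwrite (>t) , '.csv'    NB. one file per ticker
    end.
    # ~. tick                                 NB. number of files written
   )

   splitcsv 'master.csv'
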
> 
> If you didn't have even a gigabyte of memory on your machine, you
> could use indexed reads. For example, 1!:11 -- see
> https://www.jsoftware.com/help/dictionary/dx001.htm -- with a starting
> offset of 0 and a length of 1e7, then find how many characters you'd
> have to drop to get back to the last line feed in the chunk
> ((#txt) - 1 + txt i: LF), drop those extras, and advance the next
> offset by the number of bytes you kept. Then iterate...
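
Here is a rough version of that loop as I read it (untested; the file
name is made up, and it assumes the file ends with a line feed):

   chunked =: 3 : 0
    sz  =. 1!:4 <y                            NB. file size in bytes
    off =. 0
    while. off < sz do.
     txt =. 1!:11 y ; off , 1e7 <. sz - off   NB. indexed read: start,length
     txt =. txt {.~ >: txt i: LF              NB. keep through the last LF
     NB. ... process txt here, e.g. append rows to per-ticker files ...
     off =. off + # txt                       NB. resume just past that LF
    end.
   )

   chunked 'master.csv'
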
> 
> That said, EC2 instances go up to 3904 GB of RAM, which would be more
> than adequate to plow through that much data, if you wanted to throw
> money at Amazon. A 64 MB machine should be big enough for the chunked
> approach, though, I expect.
> 
> Thanks,
> 
> -- 
> Raul
> 
>> On Tue, Mar 31, 2020 at 12:58 AM HH PackRat <[email protected]> wrote:
>> 
>> Finishing up with function #4......
>> 
>> I have a very large file consisting of multiple sets of historical
>> stock prices that I would like to split into individual files for each
>> stock.  (I'll probably first have to write out all the files to a USB
>> flash drive [I have limited hard drive space, but it might work as a
>> very tight fit] and then, when finished, burn them to a DVD-ROM for
>> more permanent storage.)  Since I thought that J was capable of
>> handling very large files, I figured that this might be a challenge to
>> try.
>> 
>> Unfortunately, I don't know how to handle file reading where you might
>> only be able to read a part of the file at a time.  (I don't know how
>> large a file J can read--maybe it can read the whole file.)  This file
>> has 14,937,606 lines and is 1.63 GB (1,759,801,721 bytes) in size.
>> 
>> Additionally (and probably most importantly), I don't know how to
>> collect a subset of the contents of a file to output to a file, and
>> then resume where J left off and collect the next subset of data to
>> output, and so on.
>> 
>> I'm going to need a LOT of help with this J programming!
>> 
>> Below is a sample of the data--5 days' worth of data for 5 different
>> stocks.  The master file is a CSV file, and the individual outputs (5
>> in this case) should also be CSV files.  (Obviously, row 0, the
>> header, needs to be ignored.)  The output files should use the ticker
>> symbol as the name for each file (e.g., AA.csv).  The ticker symbol
>> (column 0) should be stripped from each line of data, with only the
>> remainder of each row (date onward) being accumulated for output.
>> 
>> Please correct me if I'm wrong, but my assumption is that if code
>> works for these 25 lines of data, the code ought to work as well for
>> 14,937,606 lines!
>> 
>> DATA SET D:
>> __________________________________________________
>> 
>> ticker,date,open,high,low,close,volume
>> AA,2017-06-27,31.6,32.5,31.49,31.63,5463485.0
>> AA,2017-06-28,32.1,33.0,31.93,32.95,3764296.0
>> AA,2017-06-29,33.11,33.34,32.61,33.18,3730077.0
>> AA,2017-06-30,33.16,33.45,32.535,32.65,3014777.0
>> AA,2017-07-03,32.94,34.3,32.915,34.02,3112086.0
>> AAPL,2017-06-28,144.49,146.11,143.1601,145.83,21915939.0
>> AAPL,2017-06-29,144.71,145.13,142.28,143.68,31116980.0
>> AAPL,2017-06-30,144.45,144.96,143.78,144.02,22328979.0
>> AAPL,2017-07-03,144.88,145.3001,143.1,143.5,14276812.0
>> AAPL,2017-07-05,143.69,144.79,142.7237,144.09,20758795.0
>> GE,2017-06-28,27.26,27.4,27.05,27.08,30759065.0
>> GE,2017-06-29,27.16,27.41,26.79,27.02,36443559.0
>> GE,2017-06-30,27.09,27.19,26.91,27.01,25849199.0
>> GE,2017-07-03,27.16,27.59,27.06,27.45,20664966.0
>> GE,2017-07-05,27.54,27.56,27.23,27.35,21082332.0
>> IBM,2017-06-28,155.15,155.55,154.78,155.32,2203062.0
>> IBM,2017-06-29,155.35,155.74,153.62,154.13,3245649.0
>> IBM,2017-06-30,154.28,154.5,153.14,153.83,3501395.0
>> IBM,2017-07-03,153.58,156.025,153.52,155.58,2822499.0
>> IBM,2017-07-05,155.77,155.89,153.63,153.67,3558639.0
>> T,2017-06-28,37.88,38.065,37.78,37.94,20312146.0
>> T,2017-06-29,37.87,37.98,37.62,37.62,23508452.0
>> T,2017-06-30,37.73,37.87,37.54,37.73,22303282.0
>> T,2017-07-03,37.84,38.13,37.785,38.11,11123146.0
>> T,2017-07-05,38.11,38.21,37.85,38.12,19644726.0
>> __________________________________________________
>> 
>> SUPER thanks in advance for any and all help with this one!
>> 
>> Harvey

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
