On Mon, 11 Aug 2008 15:11:56 -0400, Brock Kalef wrote:

> I'm looking to read 800+ MB web log files and process the log prior to running it through an
> analysis tool. I'm running into "Out of Memory" errors and the odd Rebol crash in attempting to
> do this.
>
> I started out simply reading the data directly into a word and looping through the data. This
> worked great for the sample data set of 45 MB. This then failed on a 430+ MB file, i.e.
> data: read/lines %file-name.log
>
> I then changed the direct read to use a port, i.e. data-port: open/lines %file-name.log. This
> worked for the 430+ MB file, but then I started getting the errors again for the 800+ MB files.
>
> It's now obvious that I will need to read in portions of the file at a time. However, I am
> unsure how to do this while also ensuring I get all the data. As you can see from my earlier
> example code, I'm interested in reading a line at a time for simplicity in processing the records,
> as they are not fixed width (they vary in length). My fear is that I will not be able to properly
> handle the records that are truncated due to the size of the data block I retrieve from the file.
> Or at least not be able to do this easily. Are there any suggestions?
>
> My guess is that I will need to:
> - pull in a fixed length block of data
> - read through the data until I reach the first occurrence of a newline
> - track the index of the location of the newline
> - continue reading the data until I reach the end of the data block
> - once reaching the end of the data retrieved, calculate where the last record processed ended
> - read the next data block from that point
> - continue until reaching the end of file
>
> Any other suggestions?
>
> Regards,
> Brock Kalef
Sounds like a plan to me. Just ran this on a 1.9 GB file and it was surprisingly fast (kept my HD busy for sure):

    port: open/seek %/c/apache.log
    chunksize: 1'048'576 ; 1 MB chunks
    forskip port chunksize [
        chunk: copy/part port chunksize
    ]
    close port

Do you really need to process it line by line though? That would really slow it down. Sure you cannot operate on the chunks in their entirety somehow?

Cheers,
Kai
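If line-by-line processing does turn out to be necessary, one way to handle records that straddle a chunk boundary is to carry the unfinished tail of each chunk over to the next one. The following is only a rough sketch built on the open/seek and forskip loop above. The names foreach-line, handler and carry are made up for illustration, and it assumes that copy/part on a seek port behaves as in that loop (it returns the data at the current position without advancing it). It has not been run against a large file.

```rebol
REBOL [Title: "Chunked line reader (sketch)"]

; Hypothetical helper: calls `handler` once for every complete line in `file`,
; reading `chunksize` bytes at a time so the whole file is never in memory.
foreach-line: func [
    file [file!] chunksize [integer!] handler [any-function!]
    /local port carry data pos line
][
    port: open/seek file
    carry: copy ""                      ; partial line left over from the previous chunk
    forskip port chunksize [
        ; to-string covers the case where the port hands back binary data
        data: join carry to-string copy/part port chunksize
        while [pos: find data newline] [
            line: copy/part data pos    ; everything up to, not including, the newline
            ; drop a trailing CR in case the log uses CRLF line endings
            if all [not empty? line  #"^M" = last line] [remove back tail line]
            handler line
            data: next pos              ; continue after the newline
        ]
        carry: copy data                ; incomplete record, finished by the next chunk
    ]
    close port
    ; a final line without a terminating newline is still in carry
    if not empty? carry [handler carry]
]
```

A possible use, counting lines with 1 MB chunks:

```rebol
count: 0
foreach-line %/c/apache.log 1'048'576 func [line] [count: count + 1]
print count
```

Memory use stays at roughly one chunk plus the longest line, at the cost of the per-line overhead mentioned above.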
