Tim,

Thanks for your reply. Yes, I had been looking at Carl's Large Files examples and used OPEN, but it wouldn't work on the really large files unless I used the /seek refinement. With /seek I am forced to retrieve a block of content at a time, so it seems that's the way I'll have to work with this file.
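[Editor's note: the block-at-a-time reading Brock describes raises the truncated-record problem he discusses below. One way to handle it is to carry the partial trailing line over to the next block. A rough, untested sketch against REBOL 2, using the OPEN/DIRECT port style Tim suggests later in this thread; process-line and the 1 MB chunk-size are hypothetical placeholders, not from the thread:]

    process-line: func [line] [print line]   ; hypothetical per-line handler

    chunk-size: 1000000                      ; 1 MB per fetch; tune as needed
    inf: open/direct %file-name.log          ; unbuffered string port
    rest: copy ""                            ; partial line carried between chunks
    while [data: copy/part inf chunk-size] [ ; COPY/PART returns none at EOF
        insert data rest
        either pos: find/last data newline [
            rest: copy next pos              ; truncated trailing record
            foreach line parse/all copy/part data pos "^/" [
                process-line line
            ]
        ][
            rest: data                       ; no newline in this whole chunk
        ]
    ]
    if not empty? rest [process-line rest]   ; final line may lack a newline
    close inf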
Thanks again.
Brock

-----Original Message-----
From: [email protected] [mailto:[EMAIL PROTECTED] On Behalf Of Tim Johnson
Sent: August 11, 2008 4:36 PM
To: [EMAIL PROTECTED]
Subject: [REBOL] Re: Working with large files

Hi Brock:

Have you tried using 'open instead of read? I use open with the /direct
refinement on large files. Example:

    inf: open/direct/lines file
    while [L: pick inf 1] [
        ;; do things with L
    ]
    close inf

>> help read
USAGE:
    READ source /binary /string /direct /no-wait /lines /part size
        /with end-of-line /mode args /custom params /skip length

DESCRIPTION:
     Reads from a file, url, or port-spec (block or object).
     READ is a native value.

ARGUMENTS:
     source -- (Type: file url object block)

REFINEMENTS:
     /binary -- Preserves contents exactly.
     /string -- Translates all line terminators.
     /direct -- Opens the port without buffering.
     /no-wait -- Returns immediately without waiting if no data.
     /lines -- Handles data as lines.
     /part -- Reads a specified amount of data.
         size -- (Type: number)
     /with -- Specifies alternate line termination.
         end-of-line -- (Type: char string)
     /mode -- Block of above refinements.
         args -- (Type: block)
     /custom -- Allows special refinements.
         params -- (Type: block)
     /skip -- Skips a number of bytes.
         length -- (Type: number)

HTH
Tim

On Monday 11 August 2008, Brock Kalef wrote:
> I'm looking to read 800+ MB web log files and process each log prior to
> running it through an analysis tool. I'm running into "Out of Memory"
> errors and the odd REBOL crash in attempting to do this.
>
> I started out by simply reading the data directly into a word and
> looping through it. This worked great for the sample data set of 45 MB,
> but then failed on a 430+ MB file, i.e.:
>
>     data: read/lines %file-name.log
>
> I then changed the direct read to use a port, i.e.:
>
>     data-port: open/lines %file-name.log
>
> This worked for the 430+ MB file, but then I started getting the errors
> again for the 800+ MB files.
> It's now obvious that I will need to read in portions of the file at a
> time. However, I am unsure how to do this while also ensuring I get all
> the data. As you can see from my earlier example code, I'm interested in
> reading a line at a time, for simplicity in processing the records,
> since they are not fixed-width (they vary in length). My fear is that I
> will not be able to properly handle records that are truncated by the
> size of the data block I retrieve from the file, or at least not be able
> to do this easily. Are there any suggestions?
>
> My guess is that I will need to:
> - pull in a fixed-length block of data
> - read through the data until I reach the first occurrence of a newline
> - track the index of the newline's location
> - continue reading until I reach the end of the data block
> - once at the end of the retrieved data, calculate where the last
>   processed record ended
> - read the next data block from that point
> - continue until reaching the end of the file
>
> Any other suggestions?
>
> Regards,
> Brock Kalef

-- 
To unsubscribe from the list, just send an email to lists at rebol.com
with unsubscribe as the subject.
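[Editor's note: Brock's step list maps fairly directly onto READ's /part and /skip refinements, both shown in Tim's help output above. A hedged, untested REBOL 2 sketch; process-line and the 1 MB block-size are hypothetical. The key point is that the offset advances only past the last complete line, so the truncated tail is simply re-read at the start of the next block:]

    process-line: func [line] [print line]       ; hypothetical per-line handler

    file: %file-name.log
    block-size: 1000000                          ; fixed-length block of data
    file-size: size? file
    offset: 0
    while [offset < file-size] [
        text: to-string read/binary/part/skip file block-size offset
        either pos: find/last text newline [     ; end of last complete record
            foreach line parse/all copy/part text pos "^/" [
                process-line line
            ]
            ; advance only past that last newline; the truncated tail
            ; gets re-read at the start of the next block
            offset: offset + (index? next pos) - 1
        ][
            process-line text                    ; block with no newline at all
            offset: offset + length? text
        ]
    ]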
