Hi Brock:
Have you tried using 'open instead of 'read?
I use open with the /direct refinement on large files:
Example:
inf: open/direct/lines file
while [line: pick inf 1] [
    ; do things with line
]
close inf
> help read
USAGE:
READ source /binary /string /direct /no-wait /lines /part size /with end-of-line /mode args /custom params /skip length
DESCRIPTION:
Reads from a file, url, or port-spec (block or object).
READ is a native value.
ARGUMENTS:
source -- (Type: file url object block)
REFINEMENTS:
/binary -- Preserves contents exactly.
/string -- Translates all line terminators.
/direct -- Opens the port without buffering.
/no-wait -- Returns immediately without waiting if no data.
/lines -- Handles data as lines.
/part -- Reads a specified amount of data.
size -- (Type: number)
/with -- Specifies alternate line termination.
end-of-line -- (Type: char string)
/mode -- Block of above refinements.
args -- (Type: block)
/custom -- Allows special refinements.
params -- (Type: block)
/skip -- Skips a number of bytes.
length -- (Type: number)
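If you would rather implement the block-at-a-time scheme yourself, the /part and /skip refinements above can be combined roughly like this. This is an untested sketch: the 1 MB buffer size is arbitrary, and I'm assuming READ returns an empty result once the offset passes the end of the file (check that on your interpreter; you may need to catch an error there instead):

    buffer-size: 1000000          ; bytes per read; tune to available memory
    offset: 0
    leftover: ""                  ; carries a truncated final line into the next pass
    while [not empty? chunk: read/binary/part/skip %file-name.log buffer-size offset] [
        text: join leftover to-string chunk
        lines: parse/all text "^/"
        leftover: last lines      ; may be a partial line; completed on the next pass
        remove back tail lines
        foreach line lines [
            ; process one complete line here
        ]
        offset: offset + buffer-size
    ]
    if not empty? leftover [
        ; process the final line (file without a trailing newline)
    ]

The leftover string does the bookkeeping you were worried about: any record cut off at the end of a buffer is simply prepended to the next one, so no line is ever processed in two pieces.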
HTH
Tim
On Monday 11 August 2008, Brock Kalef wrote:
> I'm looking to read 800+ MB web log files and process the log prior to
> running through an analysis tool. I'm running into "Out of Memory"
> errors and the odd Rebol Crash in attempting to do this.
>
> I started out simply reading the data directly into a word and looping
> through the data. This worked great for the sample data set of 45 MB.
> This then failed on a 430+ MB file, i.e. data: read/lines
> %file-name.log
>
> I then changed the direct read to use a port, i.e. data-port:
> open/lines %file-name.log. This worked for the 430+ MB file but then I
> started getting the errors again for the 800+ MB files.
>
> It's now obvious that I will need to read in portions of the file at a
> time. However, I am unsure how to do this while also ensuring I get all
> the data. As you can see from my earlier example code, I'm interested
> in reading a line at a time for simplicity in processing the records as
> they are not fixed width (vary in length). My fear is that I will not
> be able to properly handle the records that are truncated due to the
> size of the data block I retrieve from the file. Or at least not be able
> to do this easily. Are there any suggestions?
>
> My guess is that I will need to:
> - pull in a fixed-length block of data
> - scan the data until I reach the first occurrence of a newline
> - track the index of the newline's location
> - continue scanning until I reach the end of the data block
> - once at the end of the retrieved data, calculate where the last
> complete record ended
> - read the next data block from that point
> - continue until reaching the end of the file
>
> Any other suggestions?
>
> Regards,
> Brock Kalef
--
To unsubscribe from the list, just send an email to
lists at rebol.com with unsubscribe as the subject.