Hi Brock:
Have you tried using 'open instead of 'read?
I use open with the /direct refinement on large files:
Example:
inf: open/direct/lines file
while [line: pick inf 1] [
    ; do things with line
]
close inf
> help read
USAGE:
READ source /binary /string /direct /no-wait /lines /part size /with end-of-line /mode args /custom params /skip length
DESCRIPTION:
Reads from a file, url, or port-spec (block or object).
READ is a native value.
ARGUMENTS:
source -- (Type: file url object block)
REFINEMENTS:
/binary -- Preserves contents exactly.
/string -- Translates all line terminators.
/direct -- Opens the port without buffering.
/no-wait -- Returns immediately without waiting if no data.
/lines -- Handles data as lines.
/part -- Reads a specified amount of data.
size -- (Type: number)
/with -- Specifies alternate line termination.
end-of-line -- (Type: char string)
/mode -- Block of above refinements.
args -- (Type: block)
/custom -- Allows special refinements.
params -- (Type: block)
/skip -- Skips a number of bytes.
length -- (Type: number)
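If you would rather implement the block-at-a-time scheme yourself, the /part and /skip refinements above can be combined roughly like this. This is an untested sketch: the 1 MB buffer size is arbitrary, and I'm assuming READ returns an empty result once the offset passes the end of the file (check that on your interpreter; you may need to catch an error there instead):

    buffer-size: 1000000          ; bytes per read; tune to available memory
    offset: 0
    leftover: ""                  ; carries a truncated final line into the next pass
    while [not empty? chunk: read/binary/part/skip %file-name.log buffer-size offset] [
        text: join leftover to-string chunk
        lines: parse/all text "^/"
        leftover: last lines      ; may be a partial line; completed on the next pass
        remove back tail lines
        foreach line lines [
            ; process one complete line here
        ]
        offset: offset + buffer-size
    ]
    if not empty? leftover [
        ; process the final line (file without a trailing newline)
    ]

The leftover string does the bookkeeping you were worried about: any record cut off at the end of a buffer is simply prepended to the next one, so no line is ever processed in two pieces.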
HTH
Tim
On Monday 11 August 2008, Brock Kalef wrote:
> I'm looking to read 800+ MB web log files and process the log prior to
> running through an analysis tool. I'm running into "Out of Memory"
> errors and the odd Rebol Crash in attempting to do this.
>
> I started out simply reading the data directly into a word and looping
> through the data. This worked great for the sample data set of 45 MB.
> This then failed on a 430+ MB file, i.e. data: read/lines
> %file-name.log
>
> I then changed the direct read to use a port, i.e. data-port:
> open/lines %file-name.log. This worked for the 430+ MB file but then I
> started getting the errors again for the 800+ MB files.
>
> It's now obvious that I will need to read in portions of the file at a
> time. However, I am unsure how to do this while also ensuring I get all
> the data. As you can see from my earlier example code, I'm interested
> in reading a line at a time for simplicity in processing the records as
> they are not fixed width (vary in length). My fear is that I will not
> be able to properly handle the records that are truncated due to the
> size of the data block I retrieve from the file. Or at least not be able
> to do this easily. Are there any suggestions?
>
> My guess is that I will need to:
> - pull in a fixed-length block of data
> - scan the data until I reach the first occurrence of a newline
> - track the index of the newline's location
> - continue scanning until I reach the end of the data block
> - once at the end of the retrieved data, calculate where the last
> complete record ended
> - read the next data block from that point
> - continue until reaching the end of the file
>
> Any other suggestions?
>
> Regards,
> Brock Kalef
--
To unsubscribe from the list, just send an email to
lists at rebol.com with unsubscribe as the subject.