Tim,
Thanks for your reply.  Yes, I had been looking at Carl's Large Files
examples and used open, but it wouldn't work on the really large files
unless I used the /seek option.  Using this, I am then forced to
retrieve a block of content at a time.  It seems like it's going to be
the way I have to work with this file.
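A minimal sketch of that /seek, block-at-a-time pattern (file name, offset, and block size are illustrative; /seek availability depends on your REBOL version):

```rebol
; Sketch only: /seek opens the file as a series-like port,
; so skip + copy/part read one block at an arbitrary offset
; without loading the whole file into memory.
file: open/seek %file-name.log
block: copy/part skip file 500000 100000  ; 100 KB starting at byte 500000
close file
```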

Thanks again.

Brock

-----Original Message-----
From: [email protected] [mailto:[EMAIL PROTECTED] On Behalf
Of Tim Johnson
Sent: August 11, 2008 4:36 PM
To: [EMAIL PROTECTED]
Subject: [REBOL] Re: Working with large files


Hi Brock:
Have you tried using 'open instead of read?
I use open with the direct refinement on large files:
Example:
    inf: open/direct/lines file
    while [L: pick inf 1] [
        ;; do things with L
    ]
    close inf
> help read
USAGE:
    READ source /binary /string /direct /no-wait /lines /part size /with
end-of-line /mode args /custom params /skip length

DESCRIPTION:
     Reads from a file, url, or port-spec (block or object).
     READ is a native value.

ARGUMENTS:
     source -- (Type: file url object block)

REFINEMENTS:
     /binary -- Preserves contents exactly.
     /string -- Translates all line terminators.
     /direct -- Opens the port without buffering.
     /no-wait -- Returns immediately without waiting if no data.
     /lines -- Handles data as lines.
     /part -- Reads a specified amount of data.
         size -- (Type: number)
     /with -- Specifies alternate line termination.
         end-of-line -- (Type: char string)
     /mode -- Block of above refinements.
         args -- (Type: block)
     /custom -- Allows special refinements.
         params -- (Type: block)
     /skip -- Skips a number of bytes.
         length -- (Type: number)
HTH
Tim
On Monday 11 August 2008, Brock Kalef wrote:
> I'm looking to read 800+ MB web log files and process the log prior to
> running through an analysis tool.  I'm running into "Out of Memory"
> errors and the odd Rebol Crash in attempting to do this.
>
> I started out simply reading the data directly into a word and looping
> through the data.  This worked great for the sample data set of 45 MB.
> It then failed on a 430+ MB file, i.e.  data: read/lines
> %file-name.log
>
> I then changed the direct read to use a port, i.e.  data-port:
> open/lines %file-name.log.  This worked for the 430+ MB file but then
> I started getting the errors again for the 800+ MB files.
>
> It's now obvious that I will need to read in portions of the file at a
> time.  However, I am unsure how to do this while also ensuring I get
> all the data.  As you can see from my earlier example code, I'm
> interested in reading a line at a time for simplicity in processing
> the records as they are not fixed width (vary in length).  My fear is
> that I will not be able to properly handle the records that are
> truncated due to the size of the data block I retrieve from the file.
> Or at least not be able to do this easily.  Are there any suggestions?
>
> My guess is that I will need to:
> -  pull in a fixed-length block of data
> -  read through the data until I reach the first occurrence of a newline
> -  track the index of the location of the newline
> -  continue reading the data until I reach the end of the data block
> -  once reaching the end of the data retrieved, calculate where the
> last record processed ended
> -  read the next data block from that point
> -  continue until reaching the end of the file
>
> Any other suggestions?
>
> Regards,
> Brock Kalef
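The chunked approach outlined in the list above might look roughly like this in REBOL. This is a hypothetical sketch, not tested code: the chunk size, file name, and process-line helper are illustrative, and open/seek/binary assumes a REBOL build with seek-port support. Rather than tracking newline indexes by hand, it carries the partial trailing line of each chunk over to the next one, which amounts to the same thing with less bookkeeping:

```rebol
REBOL []

; Hypothetical sketch: read a large log in fixed-size chunks,
; carrying any partial trailing line over to the next chunk.
chunk-size: 1000000                    ; bytes per read (adjust to taste)
file: open/seek/binary %file-name.log  ; seek port: series-like access
offset: 0
leftover: copy ""                      ; partial line carried between chunks

process-line: func [line][
    ; per-record work goes here (illustrative)
    print line
]

forever [
    chunk: copy/part skip file offset chunk-size
    if any [none? chunk empty? chunk] [break]
    offset: offset + length? chunk
    text: append leftover to-string chunk
    lines: parse/all text "^/"         ; split on newline; last item may be partial
    leftover: copy last lines
    foreach line copy/part lines (length? lines) - 1 [process-line line]
]
if not empty? leftover [process-line leftover]  ; final unterminated line
close file
```

Note that with a binary read, CRLF-terminated logs will leave a trailing CR on each line, which process-line would need to trim.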


--
To unsubscribe from the list, just send an email to lists at rebol.com
with unsubscribe as the subject.
