William Attwood wrote:
> this takes in each line from STDIN, adds it to an array, and when the array
> hits 8000 (my memory limit at the time) it sends the array to a PHP function
> that will process and input it into the DB I am using.
> 
> Just in case anyone needs to process large files, stream them in
> 
> # more file.log | php process.php

I'm a little confused as to why you don't just process the file one line
at a time with little or no memory consumption (file reads are normally
buffered anyway, so reading until a line break is not a bottleneck).
Why the big buffer?  I don't see any speed increases coming from that.

Also sounds like Perl or Python would be a much better fit for your
little problem.  PHP seems like a kludge in this particular case.  In
Python, it's a matter of:

for line in open(filename):    # reads lazily, one buffered line at a time
    dosomething_with(line)
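If the point of the 8000-line array was batched DB inserts, you can keep
that benefit with bounded memory by flushing a small batch as you stream.
A sketch (`insert_batch`, `process`, and the batch size are my own
hypothetical names, not from the original script):

```python
import sys

BATCH_SIZE = 8000  # same cap the original script used

def insert_batch(rows):
    # hypothetical stand-in for the real DB insert call
    pass

def process(lines, flush=insert_batch, batch_size=BATCH_SIZE):
    batch = []
    for line in lines:
        batch.append(line.rstrip('\n'))
        if len(batch) >= batch_size:
            flush(batch)
            batch = []      # memory never exceeds batch_size lines
    if batch:
        flush(batch)        # flush the partial batch left at EOF

if __name__ == '__main__':
    process(sys.stdin)      # e.g.  cat file.log | python process.py
```

Memory stays flat no matter how big the log is, and the DB still sees
inserts in bulk rather than one row per statement.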

Maybe off-topic now, but if you need to do operations you'd normally do
in Bash with lots of pipes, you can use generators:
http://www.dabeaz.com/generators/

import re

expression = r'ERROR'    # whatever regex you're after

def my_grep(input_generator):
    for line in input_generator:
        if re.search(expression, line):   # search, not match: grep hits anywhere in the line
            yield line


for line in my_grep(open(filename)):
    print 'pattern found in %s' % line

If you string generators together intelligently you can probably match
the speed of Bash and friends, and it certainly is simple.
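To make the pipe analogy concrete, here's a sketch of chaining three
generators the way you'd chain cat | grep | wc -l (the stage names are
mine, and it's shown in modern Python 3 syntax):

```python
import io
import re

def read_lines(f):
    # 'cat': yield lines one at a time from an open file
    for line in f:
        yield line

def grep(pattern, lines):
    # 'grep': pass through only the lines matching the pattern
    for line in lines:
        if re.search(pattern, line):
            yield line

def count(lines):
    # 'wc -l': consume the pipeline and count what survives
    return sum(1 for _ in lines)

# Equivalent of: cat file.log | grep ERROR | wc -l
# Each stage pulls one line at a time from the stage before it,
# so nothing is ever held in memory all at once.
log = io.StringIO("ok\nERROR: disk full\nok\nERROR: timeout\n")
print(count(grep("ERROR", read_lines(log))))   # prints 2
```

Because every stage is lazy, you can bolt more filters into the middle of
the chain without changing the memory profile.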




/*
PLUG: http://plug.org, #utah on irc.freenode.net
Unsubscribe: http://plug.org/mailman/options/plug
Don't fear the penguin.
*/
