[PHP] Parsing a large file

2006-01-13 Thread Wolf
I have large log files from a web server (about a gig in size) and need
to parse each line looking for a string and, when it's encountered, push
that line to a new file.  I was thinking I could have PHP read in the
whole file, but I'm thinking it could be a major pain since I have about
20 log files to read through.

Anyone have some suggestions?

Thanks,
Robert

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP] Parsing a large file

2006-01-13 Thread Albert
Wolf wrote:
> I have large log files from a web server (about a gig in size) and need
> to parse each line looking for a string and, when it's encountered, push
> that line to a new file.  I was thinking I could have PHP read in the
> whole file, but I'm thinking it could be a major pain since I have about
> 20 log files to read through.
>
> Anyone have some suggestions?

Is this on a Linux server?

Why don't you use grep?

cat filename | grep string > newfile

See man grep for details on grep (it uses regular expressions).
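For instance, a quick sanity check of the approach (the file names and the search string here are made up for illustration):

```shell
# Create a tiny sample log and pull out only the lines containing "404".
printf 'GET /a 200\nGET /b 404\nGET /c 404\n' > sample.log
grep '404' sample.log > errors.log
# errors.log now holds just the two matching lines.
```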

Albert




Re: [PHP] Parsing a large file

2006-01-13 Thread Wolf
Windows server, though I may dump it to Linux to get my smaller file;
however, not sure my admin would like that.  :)

Albert wrote:
> Wolf wrote:
>> I have large log files from a web server (about a gig in size) and need
>> to parse each line looking for a string and, when it's encountered, push
>> that line to a new file.  I was thinking I could have PHP read in the
>> whole file, but I'm thinking it could be a major pain since I have about
>> 20 log files to read through.
>>
>> Anyone have some suggestions?
>
> Is this on a Linux server?
>
> Why don't you use grep?
>
> cat filename | grep string > newfile
>
> See man grep for details on grep (it uses regular expressions).
>
> Albert
 




Re: [PHP] Parsing a large file

2006-01-13 Thread Stut

Wolf wrote:
> Windows server, though I may dump it to Linux to get my smaller file;
> however, not sure my admin would like that.  :)

Get a Windows build of grep (and other useful stuff) here: 
http://unxutils.sourceforge.net/

Albert wrote:
> cat filename | grep string > newfile

Why cat? Sorry, but this is one of my pet hates! The following does the 
same but in one process instead of two:

    grep [string] [filename] > [newfile]

-Stut




Re: [PHP] Parsing a large file

2006-01-13 Thread Richard Correia
The best way, I think, is:

nohup grep -i string log1 log2 log3 ... logx > newfile &

This will run the command in the background so you can work on other
things meanwhile.
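For the ~20 files mentioned earlier, a shell loop works too; a minimal sketch, where the "access_log.*" pattern and the search string are placeholders:

```shell
# Scan each log file in turn and collect matching lines into one output
# file. -i makes the match case-insensitive, as in the command above.
for f in access_log.*; do
    grep -i "string" "$f" >> matches.log
done
```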

Thanks
Richard


On 1/13/06, Wolf [EMAIL PROTECTED] wrote:

> I have large log files from a web server (about a gig in size) and need
> to parse each line looking for a string and, when it's encountered, push
> that line to a new file.  I was thinking I could have PHP read in the
> whole file, but I'm thinking it could be a major pain since I have about
> 20 log files to read through.
>
> Anyone have some suggestions?
>
> Thanks,
> Robert





Re: [PHP] Parsing a large file

2006-01-13 Thread Jay Paulson
> I have large log files from a web server (about a gig in size) and need
> to parse each line looking for a string and, when it's encountered, push
> that line to a new file.  I was thinking I could have PHP read in the
> whole file, but I'm thinking it could be a major pain since I have about
> 20 log files to read through.
>
> Anyone have some suggestions?
>
> Thanks,
> Robert

I'm actually in the process of doing the exact same thing!  If you search
the list you'll see some of my emails.  But to help you out, here's what
I've got so far. :)

Since you are dealing with such huge files, you'll want to read them a
little at a time so as not to use too much system memory all at once.
fgets() reads a file line by line.  So you read a few lines, process
those lines, and then move on. :)

Hope this helps get you started!

// open log file for reading
if (!$fhandle = fopen($path.$log_file_name, "r")) {
    echo "couldn't open $log_file_name for reading!";
    die;
}

$i = 0;
$buf = "";
while (!feof($fhandle)) {
    $buf[] = fgets($fhandle);
    if ($i++ % 10 == 0) {
        // process $buf here: do all the regex and whatnot and get
        // the lines for the new text file to be loaded into the
        // database (haven't written this part yet)

        // write to a file in the directory this runs in; that file
        // will be used to load data into a MySQL database to run
        // queries on

        // empty $buf to release it from system memory
        unset($buf);
    }
}

fclose($fhandle);




RE: [PHP] Parsing a large file

2006-01-13 Thread Richard Lynch
On Fri, January 13, 2006 8:37 am, Albert wrote:
> Wolf wrote:
>> I have large log files from a web server (about a gig in size) and need
>> to parse each line looking for a string and, when it's encountered, push
>> that line to a new file.  I was thinking I could have PHP read in the
>> whole file, but I'm thinking it could be a major pain since I have about
>> 20 log files to read through.
>>
>> Anyone have some suggestions?
>
> Is this on a Linux server?
>
> Why don't you use grep?
>
> cat filename | grep string > newfile

If you DO use grep, don't cat the whole file out to grep it...

grep __filename__ > __newfile__

cat on a 1 GIG file is probably a bit wasteful, I suspect...
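If you want to verify the overhead on your own box, the two forms can be timed side by side (the log name here is a stand-in):

```shell
# Same result either way; the cat version just adds a second process
# and an extra pipe copy of the whole file.
time cat big.log | grep string > /dev/null
time grep string big.log > /dev/null
```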

-- 
Like Music?
http://l-i-e.com/artists.htm




Re: [PHP] Parsing a large file

2006-01-13 Thread Richard Lynch
On Fri, January 13, 2006 3:33 pm, Jay Paulson wrote:
> $buf = "";

Probably better to initialize it to an empty array();...

> while (!feof($fhandle)) {
>     $buf[] = fgets($fhandle);

... since you are going to initialize it to an array here anyway.

> if ($i++ % 10 == 0) {

Buffering 10 lines of text in PHP is probably not going to make a
significant difference...

You'll have to test on your hardware to confirm, but between:

1. Low-level disk IDE buffer
2. Operating System disk cache buffers
3. C code of PHP source disk cache buffers

your PHP 10-line buffer in an array is probably more overhead, and much
more complicated code to maintain, with no significant benefit.




Re: [PHP] Parsing a large file

2006-01-13 Thread Jay Paulson
> On Fri, January 13, 2006 3:33 pm, Jay Paulson wrote:
>> $buf = "";
>
> Probably better to initialize it to an empty array();...

Yep right.

>> while (!feof($fhandle)) {
>>     $buf[] = fgets($fhandle);
>
> ... since you are going to initialize it to an array here anyway.
>
>> if ($i++ % 10 == 0) {
>
> Buffering 10 lines of text in PHP is probably not going to make a
> significant difference...

This is true.  It's what I have written so far.  Basically I'm just
trying to make sure I'm not hogging system memory with a huge file,
because there are other apps running at the same time that need system
resources as well.  That's the main reason I'm using a buffer to read
the file in and parse it a little at a time.  By all means test it out
on your hardware and see what that buffer needs to be.
 
> You'll have to test on your hardware to confirm, but between:
>
> 1. Low-level disk IDE buffer
> 2. Operating System disk cache buffers
> 3. C code of PHP source disk cache buffers
>
> your PHP 10-line buffer in an array is probably more overhead, and much
> more complicated code to maintain, with no significant benefit.




Re: [PHP] Parsing a large file

2006-01-13 Thread Richard Lynch
On Fri, January 13, 2006 4:47 pm, Jay Paulson wrote:
>> Buffering 10 lines of text in PHP is probably not going to make a
>> significant difference...
>
> This is true.  It's what I have written so far.  Basically I'm just
> trying to make sure I'm not hogging system memory with a huge file,
> because there are other apps running at the same time that need system
> resources as well.  That's the main reason I'm using a buffer to read
> the file in and parse it a little at a time.  By all means test it out
> on your hardware and see what that buffer needs to be.

I'm not saying not to read it a little at a time.

I'm saying 1 line at a time, using the most natural code, is PROBABLY
at least as fast as, if not faster than, the 10-line buffer version
posted.

And the 1-line buffer of PHP fgets() is far easier to maintain.

So unless you've got test data to prove the 10-line buffer helps,
throw it out, and just use fgets() 1-line buffer.

:-)




Re: [PHP] Parsing a large file

2006-01-13 Thread Curt Zirzow
On Fri, Jan 13, 2006 at 04:21:10PM -0600, Richard Lynch wrote:
 
> If you DO use grep, don't cat the whole file out to grep it...
>
> grep __filename__ > __newfile__

oops, forgot the expression :)

  grep findthis __filename__ > __newfile__



-- 
cat .signature: No such file or directory




Re: [PHP] Parsing a large file

2006-01-13 Thread Curt Zirzow
On Fri, Jan 13, 2006 at 04:47:11PM -0600, Jay Paulson wrote:
>> On Fri, January 13, 2006 3:33 pm, Jay Paulson wrote:
>>> $buf = "";
>>
>> Probably better to initialize it to an empty array();...
>
> Yep right.
>
>>> while (!feof($fhandle)) {
>>>     $buf[] = fgets($fhandle);
>>
>> ... since you are going to initialize it to an array here anyway.
>>
>>> if ($i++ % 10 == 0) {
>>
>> Buffering 10 lines of text in PHP is probably not going to make a
>> significant difference...
>
> This is true.  It's what I have written so far.  Basically I'm just
> trying to make sure I'm not hogging system memory with a huge file,
> because there are other apps running at the same time that need system
> resources as well.  That's the main reason I'm using a buffer to read
> the file in and parse it a little at a time.  By all means test it out
> on your hardware and see what that buffer needs to be.

I'd tend to go with Richard's suggestion. You say you are worried
about resources and memory? Well, when you buffer those 10 lines,
where do they go? Memory.

If resources and memory are an issue, there are a couple of options I
would suggest, given that the bottleneck is really disk I/O and CPU
usage:

  1) Inside the loop (while reading one line at a time), do a
     usleep(); this will prevent heavy disk access and let the CPU
     catch up with processing.

  2) 'nice' the application: run php under nice to give it a lower
     priority for CPU processing time.
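Option 2 also combines with the grep approach from earlier in the thread; a minimal sketch, where "big.log" and "matches.log" are placeholder names:

```shell
# Run the long scan in the background at the lowest CPU priority so
# other apps stay responsive; nohup keeps it alive after logout.
nohup nice -n 19 grep -i string big.log > matches.log 2>/dev/null &
```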

If you want to test how usleep and the 'nice' thing works here are
some sample scripts to benchmark with:

// CPU usage: try with and without nice
  while (1) {}
vs.
  while (1) { usleep(500); }

// disk I/O: try with and without nice
  $fp = fopen('/var/log/messages', 'r') or die('boo');
  while (1) {
    $line = fgets($fp);
    fseek($fp, 0, SEEK_SET);
  }
vs.
  $fp = fopen('/var/log/messages', 'r') or die('boo');
  while (1) {
    $line = fgets($fp);
    fseek($fp, 0, SEEK_SET);
    usleep(500);
  }

Like Richard said, there are much easier ways to make the app less
resource-intensive than trying to battle I/O between memory and CPU
within PHP.

Curt.
-- 
cat .signature: No such file or directory
