[PHP] Parsing a large file
I have large log files from a web server (about a gig in size) and need to parse each line looking for a string, and when encountered push that line to a new file. I was thinking I could have PHP read in the whole file, but that could be a major pain since I have about 20 log files to read through. Anyone have some suggestions?

Thanks,
Robert

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
RE: [PHP] Parsing a large file
Wolf wrote:
> I have large log files from a web server (about a gig in size) and need to
> parse each line looking for a string, and when encountered push that line
> to a new file. I was thinking I could have PHP read in the whole file, but
> that could be a major pain since I have about 20 log files to read through.
> Anyone have some suggestions?

Is this on a Linux server? Why don't you use grep?

cat filename | grep string > newfile

See man grep for details on grep. (It uses regular expressions.)

Albert
Re: [PHP] Parsing a large file
Windows server, though I may dump it to Linux to get my smaller file; however, I'm not sure my admin would like that. :)

Albert wrote:
> Wolf wrote:
>> I have large log files from a web server (about a gig in size) and need
>> to parse each line looking for a string, and when encountered push that
>> line to a new file. I was thinking I could have PHP read in the whole
>> file, but that could be a major pain since I have about 20 log files to
>> read through. Anyone have some suggestions?
>
> Is this on a Linux server? Why don't you use grep?
>
> cat filename | grep string > newfile
>
> See man grep for details on grep. (It uses regular expressions.)
Re: [PHP] Parsing a large file
Wolf wrote:
> Windows server, though I may dump it to Linux to get my smaller file;
> however, I'm not sure my admin would like that. :)

Get a Windows build of grep (and other useful stuff) here:
http://unxutils.sourceforge.net/

Albert wrote:
> cat filename | grep string > newfile

Why cat? Sorry, but this is one of my pet hates! The following does the same but in one process instead of two:

grep [string] [filename] > [newfile]

-Stut
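For readers who want to try the single-process form, here is a minimal sketch; the file names `access.log` and `matches.log` and the search string `php` are invented for illustration, not taken from the thread.

```shell
# Build a tiny sample log (hypothetical contents) to filter.
printf 'GET /index.php 200\nGET /favicon.ico 404\nPOST /login.php 200\n' > access.log

# One process instead of two: grep reads the file itself,
# and the shell redirects matching lines into the new file.
grep 'php' access.log > matches.log

cat matches.log
```

One detail worth knowing in scripts: grep exits nonzero when no line matches, so a `set -e` script will stop on an empty result.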
Re: [PHP] Parsing a large file
The best way, I think, is:

nohup grep -i string log1 log2 log3 ... logx > newfile &

This will run the command in the background, and you can work on other things meanwhile.

Thanks,
Richard

On 1/13/06, Wolf [EMAIL PROTECTED] wrote:
> I have large log files from a web server (about a gig in size) and need to
> parse each line looking for a string, and when encountered push that line
> to a new file. I was thinking I could have PHP read in the whole file, but
> that could be a major pain since I have about 20 log files to read through.
> Anyone have some suggestions?
>
> Thanks,
> Robert
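A runnable sketch of the multi-file variant, with invented file names and contents; note that when grep is given more than one input file, it prefixes each match with the file it came from.

```shell
# Two tiny stand-in log files (hypothetical contents).
printf 'error: disk full\nstatus ok\n' > log1
printf 'status ok\nError: net down\n' > log2

# -i matches case-insensitively; nohup plus & lets the search keep
# running in the background even if the terminal session ends.
nohup grep -i 'error' log1 log2 > newfile 2>/dev/null &

wait            # block until the background grep finishes
cat newfile     # each match is prefixed with its source file name
```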
Re: [PHP] Parsing a large file
> I have large log files from a web server (about a gig in size) and need to
> parse each line looking for a string, and when encountered push that line
> to a new file. I was thinking I could have PHP read in the whole file, but
> that could be a major pain since I have about 20 log files to read through.
> Anyone have some suggestions?
>
> Thanks,
> Robert

I'm actually in the process of doing the exact same thing! If you search the list you'll see some of my emails. But to help you out, here's what I've got so far. :)

Since you are dealing with such huge files, you'll want to read them a little at a time so as not to use too much system memory all at once. fgets() reads a file line by line. So you read a few lines, process those lines, and then move on. :) Hope this helps get you started!

// open log file for reading
if (!$fhandle = fopen($path.$log_file_name, "r")) {
    echo "couldn't open $log_file_name for reading!";
    die;
}

$i = 0;
$buf = "";
while (!feof($fhandle)) {
    $buf[] = fgets($fhandle);
    if ($i++ % 10 == 0) {
        // process $buf here: do all the regex and what not, and get the
        // lines for the new text file (haven't written this yet);
        // write to a file in the directory this runs in; this file will
        // be used to load data into a MySQL database to run queries on

        // empty $buf out to remove it from system memory
        unset($buf);
    }
}
fclose($fhandle);
RE: [PHP] Parsing a large file
On Fri, January 13, 2006 8:37 am, Albert wrote:
> Wolf wrote:
>> I have large log files from a web server (about a gig in size) and need
>> to parse each line looking for a string, and when encountered push that
>> line to a new file. I was thinking I could have PHP read in the whole
>> file, but that could be a major pain since I have about 20 log files to
>> read through. Anyone have some suggestions?
>
> Is this on a Linux server? Why don't you use grep?
>
> cat filename | grep string > newfile

If you DO use grep, don't cat the whole file out to grep it...

grep __filename__ > __newfile__

cat on a 1 GIG file is probably a bit wasteful, I suspect...

--
Like Music? http://l-i-e.com/artists.htm
Re: [PHP] Parsing a large file
On Fri, January 13, 2006 3:33 pm, Jay Paulson wrote:
> $buf = "";

Probably better to initialize it to an empty array()...

> while (!feof($fhandle)) {
>     $buf[] = fgets($fhandle);

... since you are going to turn it into an array here anyway.

> if ($i++ % 10 == 0) {

Buffering 10 lines of text in PHP is probably not going to make a significant difference... You'll have to test on your hardware to confirm, but between:

1. Low-level disk IDE buffer
2. Operating System disk cache buffers
3. C code of PHP source disk cache buffers

your PHP 10-line buffer in an array is probably more overhead, and much more complicated code to maintain, with no significant benefit.

--
Like Music? http://l-i-e.com/artists.htm
Re: [PHP] Parsing a large file
> On Fri, January 13, 2006 3:33 pm, Jay Paulson wrote:
>> $buf = "";
>
> Probably better to initialize it to an empty array()...

Yep, right.

>> while (!feof($fhandle)) {
>>     $buf[] = fgets($fhandle);
>
> ... since you are going to turn it into an array here anyway.
>
>> if ($i++ % 10 == 0) {
>
> Buffering 10 lines of text in PHP is probably not going to make a
> significant difference...

This is true. It's what I have written to start with. Basically I'm just trying to make sure that I'm not hogging system memory with a huge file, b/c there are other apps running at the same time that need system resources as well. That's the main reason why I'm using a buffer to read the file in and parse it a little at a time. By all means test it out on your hardware and see what that buffer needs to be.

> You'll have to test on your hardware to confirm, but between:
> 1. Low-level disk IDE buffer
> 2. Operating System disk cache buffers
> 3. C code of PHP source disk cache buffers
> your PHP 10-line buffer in an array is probably more overhead, and much
> more complicated code to maintain, with no significant benefit.
Re: [PHP] Parsing a large file
On Fri, January 13, 2006 4:47 pm, Jay Paulson wrote:
>> Buffering 10 lines of text in PHP is probably not going to make a
>> significant difference...
>
> This is true. It's what I have written to start with. Basically I'm just
> trying to make sure that I'm not hogging system memory with a huge file,
> b/c there are other apps running at the same time that need system
> resources as well. That's the main reason why I'm using a buffer to read
> the file in and parse it a little at a time. By all means test it out on
> your hardware and see what that buffer needs to be.

I'm not saying not to read it a little at a time. I'm saying that 1 line at a time, using the most natural code, is PROBABLY at least as fast as, if not faster than, the 10-line buffer version posted. And the 1-line buffer of PHP's fgets() is far easier to maintain. So unless you've got test data to prove the 10-line buffer helps, throw it out and just use the fgets() 1-line buffer. :-)

--
Like Music? http://l-i-e.com/artists.htm
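The one-line-at-a-time idea isn't PHP-specific; as a shell sketch (file names and contents invented for illustration), the loop below holds exactly one line in memory no matter how large the input grows, mirroring the fgets() approach:

```shell
# Stand-in for a gigabyte-sized log (hypothetical contents).
printf 'keep: first\ndrop: noise\nkeep: second\n' > big.log

: > filtered.log    # start with an empty output file

# Read one line per iteration; memory use stays constant because
# only the current line is ever held.
while IFS= read -r line; do
    case $line in
        keep:*) printf '%s\n' "$line" >> filtered.log ;;
    esac
done < big.log

cat filtered.log
```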
Re: [PHP] Parsing a large file
On Fri, Jan 13, 2006 at 04:21:10PM -0600, Richard Lynch wrote:
> If you DO use grep, don't cat the whole file out to grep it...
>
> grep __filename__ > __newfile__

Oops, forgot the expression :)

grep findthis __filename__ > __newfile__

--
cat .signature: No such file or directory
Re: [PHP] Parsing a large file
On Fri, Jan 13, 2006 at 04:47:11PM -0600, Jay Paulson wrote:
> [snip discussion of initializing $buf and 10-line buffering]
>
> This is true. It's what I have written to start with. Basically I'm just
> trying to make sure that I'm not hogging system memory with a huge file,
> b/c there are other apps running at the same time that need system
> resources as well. That's the main reason why I'm using a buffer to read
> the file in and parse it a little at a time. By all means test it out on
> your hardware and see what that buffer needs to be.

I'd tend to go with Richard's suggestion. You say you are worried about resources and memory? Well, when you load those 10 lines, where do they go? Memory. If resources and memory are an issue, there are a couple of options I would suggest, given that the bottleneck is really disk I/O and CPU usage:

1) Inside the loop (while reading one line at a time), do a usleep(); this will prevent heavy disk access and let the CPU catch up with processing.

2) 'nice' the application: run php under nice and give it a lower priority for CPU processing time.

If you want to test how usleep() and 'nice' work, here are some sample scripts to benchmark with:

// cpu usage: try with and without nice
while (1) {}

vs.

while (1) { usleep(500); }

// disk io: try with and without nice
$fp = fopen('/var/log/messages', 'r') or die('boo');
while (1) {
    $line = fgets($fp);
    fseek($fp, 0, SEEK_SET);
}

vs.

$fp = fopen('/var/log/messages', 'r') or die('boo');
while (1) {
    $line = fgets($fp);
    fseek($fp, 0, SEEK_SET);
    usleep(500);
}

Like Richard said, there are much easier ways to make the app less resource intensive than trying to battle I/O between memory and CPU within php.

Curt.

--
cat .signature: No such file or directory
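The 'nice' suggestion can also be tried directly from the shell; the sample file name, its contents, and the priority value 10 below are illustrative, not from the thread.

```shell
# Small stand-in for /var/log/messages (hypothetical contents).
printf 'kernel: boot\nsshd: login match\ncron: job ran\n' > messages.sample

# nice -n 10 lowers the CPU priority of the grep process, so other
# applications on the box get scheduled ahead of it under load.
nice -n 10 grep 'match' messages.sample > out.txt

cat out.txt
```

The output is identical to an un-niced run; only the scheduling priority changes, which is exactly why it is a cheap way to be a good neighbor on a shared server.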