Antonio de la Fuente wrote:
* bob gailer <bgai...@gmail.com> [2009-11-17 15:26:20 -0500]:

Date: Tue, 17 Nov 2009 15:26:20 -0500
From: bob gailer <bgai...@gmail.com>
To: Antonio de la Fuente <t...@muybien.org>
CC: Python Tutor mailing list <tutor@python.org>
Subject: Re: [Tutor] Introduction - log exercise
User-Agent: Thunderbird 2.0.0.23 (Windows/20090812)
Message-ID: <4b0306ec.8000...@gmail.com>

Antonio de la Fuente wrote:
Hi everybody,

This is my first post here. I have started learning Python and I am new to
programming, with just a little bash scripting behind me. Thank you for the
kind support and help that you provide on this list.

This is my problem: I've got a log file that is filling up very quickly. The
log file is made of blocks separated by a blank line, and some of these blocks
contain a line "foo". I want to discard the blocks with that line inside them
and create a new log file without those blocks, which will drastically reduce
the size of the log file.

The log file is gzipped, so I am going to use the gzip module, and I am going to
pass the log file as an argument, so the sys module is required as well.

I will read lines from the file with a 'for loop', and then I will check them
for 'foo' matches with a 'while loop'. If a line matches, I (somehow)
re-initialise the list, and if there is no match for foo, I append the line to
the list. When I get to a blank line (end of block), I write myList to an
external file and start with another line.

I am stuck on defining 'blank line', and I can't manage to get through the while
loop; I would really appreciate any hint here.
I don't expect the solution, as I think this is a great exercise to get my feet
wet with Python, but if anyone thinks that this is the wrong way of solving the
problem, please let me know.


#!/usr/bin/python

import sys
import gzip

myList = []

# For now I am not bothering with the argument part, as I am testing with a
# test log file
#fileIn = gzip.open(sys.argv[1])

fileIn = gzip.open('big_log_file.gz', 'r')
fileOut = open('outputFile', 'a')

for line in fileIn:
    while line != 'blank_line':
        if line == 'foo':
            # Somehow re-initialise myList
            break
        else:
            myList.append(line)
    fileOut.writelines(myList)
Observations:
0 - The other responses did not understand your desire to drop any
paragraph containing 'foo'.

Yes, paragraph == block, that's it

1 - The while loop will run forever, as it keeps processing the same line.

Because of the tabs in the line with foo?!

No - because within the loop there is nothing reading the next line of the file!

2 - In your sample log file the line with 'foo' starts with a tab.
line == 'foo' will always be false.

So I need first to get rid of those tabs, right? I can do that with
line.strip(), but then I need the same formatting for the fileOut.

3 - Is the first line in the file Tue Nov 17 16:11:47 GMT 2009 or blank?

First line is Tue Nov 17 16:11:47 GMT 2009

4 - Is the last line blank?

last line is blank.
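
Observations 1 and 2 can be illustrated with a small sketch. The sample lines
below are made up for illustration (only the timestamp line comes from this
thread), and is_foo is a hypothetical helper name. It is meant to show that the
for loop advances to the next line by itself, that isspace() detects a blank
line, and that comparing a stripped copy leaves the original line untouched, so
its formatting is still there when it is written out.

```python
# Hypothetical sample data; the real log format is only partly known.
sample = [
    "Tue Nov 17 16:11:47 GMT 2009\n",
    "\tfoo\n",
    "\n",
]


def is_foo(line):
    """Return True if the line is 'foo' once surrounding whitespace is ignored."""
    return line.strip() == "foo"


for line in sample:          # the for loop moves to the next line on each pass
    if line.isspace():       # a line holding only whitespace ends a block
        kind = "blank"
    elif is_foo(line):       # compares a stripped copy; 'line' keeps its tab
        kind = "foo"
    else:
        kind = "other"
    print(repr(line), kind)
```

Because strip() returns a new string, appending line to a list afterwards still
stores the tab and the newline.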

Better logic:

I would have never thought this way of solving the problem. Interesting.

# open files
paragraph = []
keep = True
for line in fileIn:
  if line.isspace(): # end of paragraph

Aha! finding the blank line

    if keep:
      outFile.writelines(paragraph)
    paragraph = []

This is what I called re-initialising the list.

    keep = True
  else:
    if keep:
      if line.strip() == 'foo': # line still carries its tab and newline
        keep = False
      else:
        paragraph.append(line)
# anticipating last line not blank, write last paragraph
if keep:
  outFile.writelines(paragraph)

# use shutil to rename
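
Putting the pieces together, here is one possible complete version of the logic
above, offered as a sketch rather than the definitive solution. The names
keep_blocks and filter_log, and the idea of writing a gzipped temporary file,
are assumptions made for this example. Unlike the outline above, it also writes
the blank separator line after each kept block so that blocks stay separated in
the output, and it assumes Python 3 (text-mode gzip.open and yield from).

```python
import gzip
import shutil


def keep_blocks(lines, marker="foo"):
    """Yield the lines of each blank-line-separated block with no marker line."""
    block = []
    keep = True
    for line in lines:
        if line.isspace():              # blank line: end of block
            if keep:
                yield from block
                yield line              # keep the separator between blocks
            block = []
            keep = True
        elif keep:
            if line.strip() == marker:  # ignore tabs and the trailing newline
                keep = False
                block = []              # drop what was collected so far
            else:
                block.append(line)
    if keep:                            # last block may lack a trailing blank line
        yield from block


def filter_log(src, tmp):
    """Write a filtered copy of the gzipped log src to tmp, then replace src."""
    with gzip.open(src, "rt") as file_in, gzip.open(tmp, "wt") as file_out:
        file_out.writelines(keep_blocks(file_in))
    shutil.move(tmp, src)               # rename the filtered copy over the original
```

It could be called as filter_log('big_log_file.gz', 'big_log_file.tmp.gz'),
where the second name is just a placeholder for a scratch file. Keeping the
block logic in a generator means it can be tried on a plain list of strings
before touching the real log.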

Thank you.

--
Bob Gailer
Chapel Hill NC
919-636-4239


