On Aug 20, 6:57 am, m_ahlenius <ahleni...@gmail.com> wrote:
> On Aug 20, 5:34 am, Dave Angel <da...@ieee.org> wrote:
>
> > m_ahlenius wrote:
> > > Hi,
> > >
> > > I am relatively new to doing serious work in python. I am using it to
> > > access a large number of log files. Some of the logs get corrupted
> > > and I need to detect that when processing them. This code seems to
> > > work for quite a few of the logs (all same structure) It also
> > > correctly identifies some corrupt logs but then it identifies others
> > > as being corrupt when they are not.
> > >
> > > example error msg from below code:
> > >
> > > Could not open the log file: '/disk/7-29-04-02-01.console.log.tar.gz'
> > > Exception: CRC check\
> > > failed 0x8967e931 != 0x4e5f1036L
> > >
> > > When I manually examine the supposed corrupt log file and use
> > > "tar -xzvof /disk/7-29-04-02-01.console.log.tar.gz " on it, it opens
> > > just fine.
> > >
> > > Is there anything wrong with how I am using this module? (extra code
> > > removed for clarity)
> > >
> > > if tarfile.is_tarfile( file ):
> > >     try:
> > >         xf = tarfile.open( file, "r:gz" )
> > >         for locFile in xf:
> > >             logfile = xf.extractfile( locFile )
> > >             validFileFlag = True
> > >             # iterate through each log file, grab the first and the last lines
> > >             lines = iter( logfile )
> > >             firstLine = lines.next()
> > >             for nextLine in lines:
> > >                 ....
> > >                 continue
> > >
> > >             logfile.close()
> > >             ...
> > >         xf.close()
> > >     except Exception, e:
> > >         validFileFlag = False
> > >         msg = "\nCould not open the log file: " + repr(file) + " Exception: " + str(e) + "\n"
> > > else:
> > >     validFileFlag = False
> > >     lTime = extractFileNameTime( file )
> > >     msg = ">>>>>>> Warning " + file + " is NOT a valid tar archive \n"
> > >     print msg
> >
> > I haven't used tarfile, but this feels like a problem with the Win/Unix
> > line endings. I'm going to assume you're running on Windows, which
> > could trigger the problem I'm going to describe.
> >
> > You use 'file' to hold something, but don't show us what. In fact, it's
> > a lousy name, since it's already a Python builtin. But if it's holding
> > fileobj, that you've separately opened, then you need to change that
> > open to use mode 'rb'
> >
> > The problem, if I've guessed right, is that occasionally you'll
> > accidentally encounter a 0d0a sequence in the middle of the (binary)
> > compressed data. If you're on Windows, and use the default 'r' mode,
> > it'll be changed into a 0a byte. Thus corrupting the checksum, and
> > eventually the contents.
> >
> > DaveA
>
> Hi,
>
> thanks for the comments - I'll change the variable name.
>
> I am running this on linux so don't think its a Windows issue. So if
> that's the case is the 0d0a still an issue?
>
> 'mark
Oh, and what's currently stored in the file variable is just the unopened pathname of the target file I want to open.
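
To make the pathname case concrete, here is a minimal sketch of checking one archive by path. It assumes Python 3 syntax rather than the Python 2 used in the quoted code, and the names check_tar_gz and make_sample_archive are hypothetical helpers invented for illustration. When tarfile.open() is given a pathname and mode "r:gz", it opens the file itself in binary mode, so the 'r' versus 'rb' question only comes up if a separately opened file object is passed in. The tuple of exceptions caught is my guess at what a damaged archive can raise, not an exhaustive list.

```python
import io
import os
import tarfile
import tempfile
import zlib


def make_sample_archive(path, payload):
    """Hypothetical helper: write a .tar.gz holding one member, console.log."""
    info = tarfile.TarInfo(name="console.log")
    info.size = len(payload)
    with tarfile.open(path, "w:gz") as archive:
        archive.addfile(info, io.BytesIO(payload))


def check_tar_gz(path):
    """Return (ok, detail) after reading every regular member to its end."""
    try:
        # A pathname plus "r:gz" lets tarfile open the file in binary mode itself.
        with tarfile.open(path, "r:gz") as archive:
            for member in archive:
                if not member.isfile():
                    continue
                stream = archive.extractfile(member)
                # Read in chunks so damage anywhere in the member is reached.
                while stream.read(64 * 1024):
                    pass
    except (tarfile.TarError, zlib.error, EOFError, OSError) as exc:
        # Assumption: these cover the errors a damaged .tar.gz tends to raise.
        return False, "%s: %s" % (type(exc).__name__, exc)
    return True, "ok"


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    sample = os.path.join(workdir, "sample.tar.gz")  # hypothetical file name
    make_sample_archive(sample, os.urandom(200000))
    print(check_tar_gz(sample))
```

Catching specific exception types instead of a bare Exception keeps the real failure (CRC mismatch, truncation, bad header) visible in the detail string, which may help in telling genuinely corrupt logs from false alarms.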