On Aug 20, 5:34 am, Dave Angel <da...@ieee.org> wrote:
> m_ahlenius wrote:
> > Hi,
> >
> > I am relatively new to doing serious work in Python. I am using it to
> > access a large number of log files. Some of the logs get corrupted
> > and I need to detect that when processing them. This code seems to
> > work for quite a few of the logs (all same structure). It also
> > correctly identifies some corrupt logs, but then it identifies others
> > as being corrupt when they are not.
> >
> > Example error msg from the code below:
> >
> > Could not open the log file: '/disk/7-29-04-02-01.console.log.tar.gz'
> > Exception: CRC check failed 0x8967e931 != 0x4e5f1036L
> >
> > When I manually examine the supposedly corrupt log file and use
> > "tar -xzvof /disk/7-29-04-02-01.console.log.tar.gz" on it, it opens
> > just fine.
> >
> > Is there anything wrong with how I am using this module? (Extra code
> > removed for clarity.)
> >
> > if tarfile.is_tarfile( file ):
> >     try:
> >         xf = tarfile.open( file, "r:gz" )
> >         for locFile in xf:
> >             logfile = xf.extractfile( locFile )
> >             validFileFlag = True
> >             # iterate through each log file, grab the first and the last lines
> >             lines = iter( logfile )
> >             firstLine = lines.next()
> >             for nextLine in lines:
> >                 ....
> >                 continue
> >
> >             logfile.close()
> >             ...
> >         xf.close()
> >     except Exception, e:
> >         validFileFlag = False
> >         msg = "\nCould not open the log file: " + repr(file) + " Exception: " + str(e) + "\n"
> > else:
> >     validFileFlag = False
> >     lTime = extractFileNameTime( file )
> >     msg = ">>>>>>> Warning " + file + " is NOT a valid tar archive \n"
> > print msg
>
> I haven't used tarfile, but this feels like a problem with the Win/Unix
> line endings. I'm going to assume you're running on Windows, which
> could trigger the problem I'm going to describe.
>
> You use 'file' to hold something, but don't show us what. In fact, it's
> a lousy name, since it's already a Python builtin. But if it's holding
> a file object that you've opened separately, then you need to change
> that open() call to use mode 'rb'.
>
> The problem, if I've guessed right, is that occasionally you'll
> encounter a 0d0a (CR LF) sequence in the middle of the (binary)
> compressed data. If you're on Windows and use the default 'r' mode,
> it'll be changed into a single 0a byte. That corrupts the checksum,
> and eventually the contents.
>
> DaveA
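Dave's advice (open the archive in binary mode, 'rb', and hand that file object to tarfile) might look like this in modern Python 3. This is only a minimal sketch, not code from the thread: the helper name archive_is_readable and the two-pass structure are assumptions. The first pass drains the whole gzip stream, because the gzip module compares the stored CRC-32 only when it reaches the end of the compressed stream, and walking the tar members does not necessarily read that far. The second pass parses the tar headers and reads each regular member.

```python
import gzip
import tarfile
import zlib

CHUNK = 64 * 1024  # read size used when draining streams


def archive_is_readable(path):
    """Return True if the .tar.gz at `path` decompresses and parses cleanly.

    Hypothetical helper (not from the thread). The file is always opened in
    binary mode ('rb'), so no newline translation can alter the compressed bytes.
    """
    try:
        # Pass 1: read the gzip stream to its end; at that point the gzip
        # module checks the stored CRC-32 and length against computed values.
        with open(path, "rb") as raw, gzip.GzipFile(fileobj=raw, mode="rb") as gz:
            while gz.read(CHUNK):
                pass

        # Pass 2: walk the tar headers and read every regular member's data.
        with open(path, "rb") as raw, tarfile.open(fileobj=raw, mode="r:gz") as tf:
            for member in tf:
                data = tf.extractfile(member)
                if data is None:  # e.g. directories: extractfile() returns None
                    continue
                with data:
                    while data.read(CHUNK):
                        pass
        return True
    except (OSError, EOFError, zlib.error, tarfile.TarError) as exc:
        # OSError covers gzip's "CRC check failed" / "Not a gzipped file" errors.
        print("Could not read %r: %s" % (path, exc))
        return False
```

Called on a file such as '/disk/7-29-04-02-01.console.log.tar.gz', it prints the exception text and returns False if either pass fails. Splitting the check into two passes costs a second read of the file, but it keeps the gzip integrity check separate from the tar structure check.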
Hi, thanks for the comments. I'll change the variable name. I'm running this on Linux, so I don't think it's a Windows issue. In that case, is the 0d0a sequence still an issue?

Mark

--
http://mail.python.org/mailman/listinfo/python-list