<[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> I want to scan a file byte for byte for occurrences of the four byte
> pattern 0x00000100. I've tried with this:
>
> # start
> import sys
>
> numChars = 0
> startCode = 0
> count = 0
>
> inputFile = sys.stdin
>
> while True:
>     ch = inputFile.read(1)
>     numChars += 1
>
>     if len(ch) < 1: break
>
>     startCode = ((startCode << 8) & 0xffffffff) | (ord(ch))
>     if numChars < 4: continue
>
>     if startCode == 0x00000100:
>         count = count + 1
>
> print count
> # end
>
> But it is very slow. What is the fastest way to do this? Using some
> native call? Using a buffer? Using whatever?
>
> /David
Here is an attempt at counting using the mmap facility. There appear to be
some serious backward compatibility issues with mmap: I tried Python 2.1 on
Windows and AIX and had some odd results. If you are on 2.4.1 or higher,
that should not be a problem.

#!/usr/bin/env python
import sys
import os
import mmap

fn = 't.dat'
ss = '\x00\x00\x01\x00'    # 0x00000100 as four bytes, most significant first

fp = open(fn, 'rb')
# Map the whole file read-only so find() can search it without read() calls.
b = mmap.mmap(fp.fileno(), os.stat(fp.name).st_size, access=mmap.ACCESS_READ)

count = 0
foundpoint = b.find(ss, 0)
while foundpoint != -1 and (foundpoint + 1) < b.size():
    count = count + 1
    # Resume one byte past the last hit so overlapping matches are counted.
    foundpoint = b.find(ss, foundpoint + 1)
b.close()

print count

fp.close()
sys.exit(0)
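
One caveat with mmap: it needs a real file, so it will not work when the data arrives on a pipe through sys.stdin, as in the original script. For that case, reading in large chunks and searching each chunk with find() is a reasonable alternative. What follows is only a rough sketch, written for Python 3, and not something from the thread. The names count_overlapping and count_in_stream and the 1 MB chunk size are my own assumptions. It keeps the last three bytes of each buffer so that a pattern straddling two chunks is still found, and it counts overlapping matches just like the byte-by-byte loop does.

```python
import io

PATTERN = b"\x00\x00\x01\x00"  # 0x00000100, most significant byte first


def count_overlapping(data, pattern):
    """Count occurrences of pattern in data, including overlapping ones."""
    count = 0
    pos = data.find(pattern)
    while pos != -1:
        count += 1
        # Restart one byte after the hit so overlapping matches are seen.
        pos = data.find(pattern, pos + 1)
    return count


def count_in_stream(stream, pattern=PATTERN, chunk_size=1 << 20):
    """Count pattern in a binary stream, reading chunk_size bytes at a time.

    chunk_size is an arbitrary choice (1 MB here), not a tuned value.
    """
    keep = len(pattern) - 1  # bytes to carry over between chunks
    count = 0
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = tail + chunk
        count += count_overlapping(buf, pattern)
        # The carried tail is shorter than the pattern, so no match can lie
        # entirely inside it and nothing is counted twice.
        tail = buf[-keep:] if keep else b""
    return count
```

To use it on standard input in Python 3, pass the binary stream: count_in_stream(sys.stdin.buffer). The same function works on a file opened with mode 'rb'.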