On 11/27/2014 05:55 PM, Dave Angel wrote:
On 11/27/2014 04:01 PM, Albert-Jan Roskam wrote:


         for line in self.data:
             if not line:
                 break
             local_lookup.append(record_start)
             if len(local_lookup) > 100:
                 self.lookup.extend(local_lookup)
                 local_lookup = []
             record_start += len(line)
         print(len(local_lookup))

I still have to emphasize that record_start is just wrong. You must use
tell() if you're planning to use seek() on a text file.
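Here is a minimal sketch of the tell()-based version, assuming a UTF-8 text file. The names build_line_index, read_line and write_sample are hypothetical, chosen for illustration. The loop uses readline() instead of "for line in f", because in Python 3 calling tell() on a text file while iterating it with a for loop raises OSError. Appending each offset directly to the result list also removes the batching, whose last partial batch (the entries still in local_lookup after the loop) was never added to self.lookup in the quoted code.

```python
import os
import tempfile


def build_line_index(path, encoding="utf-8"):
    """Return one tell() cookie per line of a text file."""
    offsets = []
    with open(path, encoding=encoding) as f:
        # A readline() loop instead of "for line in f": in Python 3,
        # calling tell() while iterating a text file with next() raises OSError.
        while True:
            pos = f.tell()        # opaque cookie, only meaningful to seek()
            line = f.readline()
            if not line:          # "" means end of file
                break
            offsets.append(pos)
    return offsets


def read_line(path, cookie, encoding="utf-8"):
    """Jump straight to a line using a cookie from build_line_index()."""
    with open(path, encoding=encoding) as f:
        f.seek(cookie)
        return f.readline()


def write_sample(text, encoding="utf-8"):
    """Hypothetical demo helper: write text to a temp file, return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding=encoding) as f:
        f.write(text)
    return path
```

The cookies are treated as opaque values: they are stored and handed back to seek() unchanged, never added to or computed with.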

You can probably also speed the process up a good deal by passing the
filename to the other process, rather than opening the file in the
original process. That eliminates sharing self.data across the
process boundary.
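A rough sketch of that idea using multiprocessing.Pool, where only file names are sent to the workers and each worker opens its own file. The function names (index_file, index_many, make_sample_file) are hypothetical, and the UTF-8 encoding is an assumption. Note that the returned offset lists are still pickled back to the parent.

```python
import multiprocessing
import os
import tempfile


def index_file(path):
    """Worker: open the file by name and return its line offsets.

    Only the path string is sent to the worker; it opens the file
    itself, so no file contents are pickled across the boundary.
    (The returned list of offsets is still pickled back.)
    """
    offsets = []
    with open(path, encoding="utf-8") as f:
        while True:
            pos = f.tell()
            if not f.readline():
                break
            offsets.append(pos)
    return offsets


def index_many(paths, processes=None):
    """Index several files in parallel, one file name per task."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(index_file, paths)


def make_sample_file(text):
    """Hypothetical demo helper: write text to a temp file, return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    return path
```

On platforms that start workers with the spawn method (Windows, and macOS by default), index_many() should be called from under an `if __name__ == "__main__":` guard.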


To emphasize again, from the Python 3 tutorial:

https://docs.python.org/3.4/tutorial/inputoutput.html#methods-of-file-objects

"""In text files (those opened without a b in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with seek(0, 2)) and the only valid offset values are those returned from the f.tell(), or zero. Any other offset value produces undefined behaviour."""
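A small sketch exercising those rules on a temporary text file (seek_rules_demo is a hypothetical name). Seeking to zero, to a tell() cookie, and to the end with seek(0, 2) all work, while nonzero offsets relative to the current position or the end raise io.UnsupportedOperation. Absolute offsets that did not come from tell() are not rejected; they are simply unreliable, which is the "undefined behaviour" the tutorial warns about.

```python
import io
import tempfile


def seek_rules_demo(text="one\ntwo\nthree\n"):
    """Exercise the text-mode seek() rules quoted from the tutorial."""
    results = {}
    with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as f:
        f.write(text)
        f.seek(0)                     # allowed: zero
        f.readline()
        cookie = f.tell()             # allowed: a value returned by tell()
        results["line_2"] = f.readline()
        f.seek(cookie)                # go back to the start of line 2
        results["line_2_again"] = f.readline()
        f.seek(0, 2)                  # allowed: jump to the very end
        results["at_end"] = f.read()
        # Nonzero offsets relative to the current position (whence=1)
        # or the end (whence=2) are refused in Python 3 text mode.
        for offset, whence in ((5, 1), (-1, 2)):
            try:
                f.seek(offset, whence)
                results[(offset, whence)] = "accepted"
            except io.UnsupportedOperation:
                results[(offset, whence)] = "rejected"
    return results
```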

All the discussion about byte-compatible, ASCII-equivalent, etc. is beside the point. (Although I'm surprised nobody has pointed out that on Windows a newline is two bytes long, "\r\n", even if the file is entirely ASCII; text mode translates it to a single "\n", so len(line) comes up one short for every line.) If you want to seek() later, then use tell() now. With a binary open there may be other ways, but in a text file, tell() is the only safe source of offsets.
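A quick demonstration of the newline point, under the assumption that the file contains "\r\n" line endings (written explicitly here, so it behaves the same on any OS because text mode's universal-newline reading translates them everywhere). Summing len() of text-mode lines undercounts, while summing len() of lines read in binary mode gives exact byte offsets. This binary variant is one of the "other ways" a binary open allows. The helper and function names are hypothetical.

```python
import os
import tempfile


def write_bytes_sample(data):
    """Hypothetical demo helper: write raw bytes to a temp file."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


def summed_text_offsets(path):
    """The quoted approach: add up len() of each text-mode line."""
    offsets, pos = [], 0
    with open(path, encoding="ascii") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)   # "\r\n" was translated to "\n", so this counts 1
    return offsets


def summed_binary_offsets(path):
    """Binary mode: no newline translation, so len() is the byte count."""
    offsets, pos = [], 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)
    return offsets
```

For b"ab\r\ncd\r\nef\r\n" the text-mode sums give [0, 3, 6], while the real byte offsets are [0, 4, 8].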

Perhaps the reason you're resisting this is that you assume tell() is slow. It's not; it's probably faster than trying to sum the line lengths the way you're doing.

--
DaveA
_______________________________________________
Tutor maillist  -  Tutor@python.org