On 11/27/2014 05:55 PM, Dave Angel wrote:
On 11/27/2014 04:01 PM, Albert-Jan Roskam wrote:
    for line in self.data:
        if not line:
            break
        local_lookup.append(record_start)
        if len(local_lookup) > 100:
            self.lookup.extend(local_lookup)
            local_lookup = []
        record_start += len(line)
    print(len(local_lookup))
I still have to emphasize that record_start is just wrong. You must use
tell() if you're planning to use seek() on a text file.
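
Here is a minimal sketch of what I mean, not a drop-in replacement for your class. The function names build_offsets and read_record and the utf-8 default are my own assumptions for illustration. Note that in Python 3 you cannot call tell() inside a "for line in f" loop (it raises OSError), so the sketch uses readline() instead.

```python
def build_offsets(path, encoding="utf-8"):
    """Return a list with the tell() position of the start of each line.

    Hypothetical helper: the name and the encoding default are assumptions.
    """
    offsets = []
    with open(path, "r", encoding=encoding) as f:
        while True:
            pos = f.tell()          # position *before* reading the line
            line = f.readline()     # readline() keeps tell() usable
            if not line:            # empty string means end of file
                break
            offsets.append(pos)
    return offsets


def read_record(path, offset, encoding="utf-8"):
    """Seek to an offset previously returned by tell() and read one line."""
    with open(path, "r", encoding=encoding) as f:
        f.seek(offset)
        return f.readline()
```

Every value stored in the list came from tell(), so passing it to seek() later stays within what the documentation allows for text files.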
You can also probably speed the process up a good deal by passing the
filename to the other process, rather than opening the file in the
original process. That will eliminate sharing the self.data across the
process boundary.
To emphasize again, in version 3:
https://docs.python.org/3.4/tutorial/inputoutput.html#methods-of-file-objects
"""In text files (those opened without a b in the mode string), only
seeks relative to the beginning of the file are allowed (the exception
being seeking to the very file end with seek(0, 2)) and the only valid
offset values are those returned from the f.tell(), or zero. Any other
offset value produces undefined behaviour."""
All the discussion about byte-compatible, ASCII equivalent, etc. is
beside the point. (Although I'm surprised nobody has pointed out that
in Windows, a newline is two bytes long even if the file is entirely
ASCII.) If you want to seek() later, then use tell() now. In a binary
open, there may be other ways, but in a text file...
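
To make the newline point concrete, here is a small sketch comparing the two ways of counting. The function names are made up for the example. The first mimics the record_start approach on a text-mode file, the second does the same sum on a binary open, where len(line) really is a byte count.

```python
def char_count_offsets(path, encoding="utf-8"):
    """Sum len(line) in text mode, as the record_start code does."""
    offsets, pos = [], 0
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)    # counts characters after newline translation
    return offsets


def byte_offsets(path):
    """Sum len(line) in binary mode, where each line is a bytes object."""
    offsets, pos = [], 0
    with open(path, "rb") as f:
        for line in f:
            offsets.append(pos)
            pos += len(line)    # counts bytes as stored on disk
    return offsets
```

For a file whose bytes are b"ab\r\ncd\r\nef\r\n", the text-mode sum gives [0, 3, 6] because each "\r\n" is read back as a single "\n", while the binary sum gives [0, 4, 8], which matches the real positions in the file.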
Perhaps the reason you're resisting it is you're assuming that tell() is
slow. It's not. It's probably faster than trying to sum the bytes the
way you're doing.
--
DaveA