"Anders J. Munch" <[EMAIL PROTECTED]> wrote: > Josiah Carlson wrote: > > "Anders J. Munch" <[EMAIL PROTECTED]> wrote: > > > I don't expect file methods and systems calls to map one to one, but > > > you're right, the first time the length is needed, that's an extra > > > system call. > > > > Every time the length is needed, a system call is required > > (you can have > > multiple writers of the same file)... > > Point taken. It's very rarely a good idea to do so, but the > possibility of multiple writers shouldn't be ignored. Still there is > no real performance issue. If anything, replacing > f.seek(0,2);f.tell() with f.length in various places might save a few > system calls.

Any sane person uses os.stat(f.name) or os.fstat(f.fileno()), unless they want to seek to the end of the file for later writing or expected reading of data yet-to-be-written. Interesting that both of these cases basically read and write to the same file at the same time (perhaps even in the same process), something you yourself said, "In all my programming days I don't believe I written to and read from the same file handle even once. Use cases exist, like if you're implementing a DBMS..."

> > Flushing during seek is important. By not flushing during seek in
> > your FileBytes object, you are unnecessarily delaying writes, which
> > could cause file corruption.
>
> That's what the flush method is for. The real reason seek implies
> flush is to save the library author the bother of getting the
> interactions between input and output buffering right.
> Anyway, FileBytes has no seek and no concept of current file position,
> so I really don't know what you're talking about :)

I was talking about your earlier statement, which I quoted in my earlier reply to you:

> My micro-optimisation circuitry blew a fuse when I discovered that
> seek always implies flush. You won't get good performance out of code
> that does a lot of seeks, whatever you do. Use my upcoming FileBytes
> class :)

And with the context of a previous message from you:

> FileBytes would support the sequence protocol, mimicking bytes objects.
> It would support random-access read and write using __getitem__ and
> __setitem__, allowing slice assignment for slices of equal size. And
> there would be append() to extend the file, and partial __delitem__
> support for truncating.

While it doesn't have the methods seek or tell, the underlying implementation needs to use seek and tell (or a memory-mapped file, mmap). You were also talking about buffering writes to reduce the overhead of the underlying seeks and tells because of apparent "optimizations" you wanted to make.
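
To make the point about the underlying seeks concrete, here is a minimal sketch of a sequence-style wrapper over an ordinary file. It is an illustration only: the class name SeekBackedBytes and its methods are hypothetical and are not the FileBytes class under discussion. It assumes an existing file, supports only contiguous slices, and flushes after every write so that other readers of the file see the data immediately.

```python
import os


class SeekBackedBytes:
    """Hypothetical sequence-style view of a file (illustration only)."""

    def __init__(self, path):
        # "r+b" opens an existing file for both reading and writing.
        self._f = open(path, "r+b")

    def __len__(self):
        # Flush our own buffered writes, then ask the OS for the size.
        # This costs a system call on every query, because another
        # writer may have changed the file since the last time we asked.
        self._f.flush()
        return os.fstat(self._f.fileno()).st_size

    def _bounds(self, index):
        # Translate a slice into (start, stop) byte offsets.
        if not isinstance(index, slice):
            raise TypeError("this sketch only supports slices")
        start, stop, step = index.indices(len(self))
        if step != 1:
            raise ValueError("only contiguous slices are supported")
        return start, max(start, stop)

    def __getitem__(self, index):
        start, stop = self._bounds(index)
        # There is no public seek(), but one is needed underneath.
        self._f.seek(start)
        return self._f.read(stop - start)

    def __setitem__(self, index, data):
        start, stop = self._bounds(index)
        if len(data) != stop - start:
            raise ValueError("slice assignment must not change the size")
        self._f.seek(start)
        self._f.write(data)
        # Non-sequential write: flush now rather than delaying it.
        self._f.flush()

    def close(self):
        self._f.close()
```

With this, fb[0:5] reads five bytes and fb[6:11] = b"WORLD" overwrites five bytes in place, and each access performs a seek on the underlying file object even though the wrapper exposes no seek of its own.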

Here is a data integrity optimization you can make for me: flush when accessing the file non-sequentially. Any other behavior could corrupt the data of users who have been relying on "seek implies flush".

I would also mention that your FileBytes class is essentially a fake memory-mapped file. I have implemented an equivalent class myself (for low-memory testing purposes in a DBMS-like situation), but I find using an mmap to be far faster and generally more reliable (and usable with buffer()) than my FileBytes equivalent. Never mind that the vast majority of users don't want a sequence interface to a file; they want a stream interface. That is why you don't see many FileBytes-like objects out in the wild, or really anyone suggesting that such a wrapper object be in the standard library.

With that said, I'm not sure your FileBytes object is really necessary or desired for the future io library. If people want that kind of an interface, they can use mmap (and push for the various mmap bugs/feature requests to be fixed); otherwise they should be using readable / writable / both streams, something that Tomer has been working towards.

 - Josiah

_______________________________________________
Python-3000 mailing list
http://mail.python.org/mailman/listinfo/python-3000
