I just uploaded Patch 1671314 to SourceForge with a C implementation of a Raw File I/O type, along with unit tests. It still needs work (especially for supporting very large files and non-unixy systems), but should serve as a very good starting point.
On 2/26/07, Mike Verdone <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Daniel Stutzbach and I have prepared a draft PEP for the new IO system
> for Python 3000. This document is, hopefully, true to the info that
> Guido wrote on the whiteboards here at PyCon. This is still a draft
> and there are quite a few decisions that need to be made. Feedback is
> welcomed.
>
> We've published it on Google Docs here:
> http://docs.google.com/Doc?id=dfksfvqd_1cn5g5m
>
> What follows is a plaintext version.
>
> Thanks,
>
> Mike.
>
>
> PEP: XXX
> Title: New IO
> Version:
> Last-Modified:
> Authors: Daniel Stutzbach, Mike Verdone
> Status: Draft
> Type:
> Created: 26-Feb-2007
>
> Rationale and Goals
>
> Python allows for a variety of file-like objects that can be worked
> with via bare read() and write() calls using duck typing. Anything
> that provides read() and write() is stream-like. However, more exotic
> and extremely useful functions like readline() or seek() may or may
> not be available on a file-like object. Python needs a specification
> for basic byte-based IO streams to which we can add buffering and
> text-handling features.
>
> Once we have a defined raw byte-based IO interface, we can add
> buffering and text-handling layers on top of any byte-based IO class.
> The same buffering and text handling logic can be used for files,
> sockets, byte arrays, or custom IO classes developed by Python
> programmers. Developing a standard definition of a stream lets us
> separate stream-based operations like read() and write() from
> implementation-specific operations like fileno() and isatty(). It
> encourages programmers to write code that uses streams as streams and
> does not require that all streams support file-specific or
> socket-specific operations.
>
> The new IO spec is intended to be similar to the Java IO libraries,
> but generally less confusing.
> Programmers who don't want to muck about
> in the new IO world can expect that the open() factory method will
> produce an object backwards-compatible with old-style file objects.
>
> Specification
>
> The Python I/O Library will consist of three layers: a raw I/O layer,
> a buffered I/O layer, and a text I/O layer. Each layer is defined by an
> abstract base class, which may have multiple implementations. The raw
> I/O and buffered I/O layers deal with units of bytes, while the text I/O
> layer deals with units of characters.
>
> Raw I/O
>
> The abstract base class for raw I/O is RawIOBase. It has several
> methods, each a wrapper around the appropriate operating system
> call. If one of these functions would not make sense on the object,
> the implementation must raise an IOError exception. For example, if a
> file is opened read-only, the .write() method will raise an IOError.
> As another example, if the object represents a socket, then .seek(),
> .tell(), and .truncate() will raise an IOError.
>
>     .read()
>     .write()
>     .seek()
>     .tell()
>     .truncate()
>     .close()
>
> Additionally, it defines a few other methods:
>
> (should these "is_" functions be attributes instead?
> "file.readable == True")
>
>     .is_readable()
>
>         Returns True if the object was opened for reading, False
>         otherwise. If False, .read() will raise an IOError if called.
>
>     .is_writable()
>
>         Returns True if the object was opened for writing, False
>         otherwise. If False, .write() and .truncate() will raise an IOError
>         if called.
>
>     .is_seekable() (Should this be called .is_random()? or
>     .is_sequential() with opposite return values?)
>
>         Returns True if the object supports random access (such as disk
>         files), or False if the object only supports sequential access (such
>         as sockets, pipes, and ttys). If False, .seek(), .tell(), and
>         .truncate() will raise an IOError if called.
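
To make the raw I/O rules above concrete, here is a minimal sketch of an in-memory object that follows the draft's method names. It is not the C implementation from the patch; the class name SketchRawIO, the constructor flags, and the argument lists of read(), write(), and seek() are all assumptions made for illustration, since the draft does not fix any signatures yet.

```python
class SketchRawIO:
    """Hypothetical in-memory object using the draft's raw I/O method names."""

    def __init__(self, data=b"", readable=True, writable=False, seekable=True):
        # The three flags are illustrative; a real FileIO would derive them
        # from the open mode and from probing the underlying descriptor.
        self._data = bytearray(data)
        self._pos = 0
        self._readable = readable
        self._writable = writable
        self._seekable = seekable

    def is_readable(self):
        return self._readable

    def is_writable(self):
        return self._writable

    def is_seekable(self):
        return self._seekable

    def read(self, n):
        if not self._readable:
            raise IOError("object was not opened for reading")
        chunk = bytes(self._data[self._pos:self._pos + n])
        self._pos += len(chunk)
        return chunk

    def write(self, data):
        if not self._writable:
            raise IOError("object was not opened for writing")
        end = self._pos + len(data)
        # Overwrite in place, extending the store if we run past the end.
        self._data[self._pos:end] = data
        self._pos = end
        return len(data)

    def seek(self, pos):
        if not self._seekable:
            raise IOError("object does not support random access")
        # Clamp to the valid range to keep the sketch simple.
        self._pos = max(0, min(pos, len(self._data)))
        return self._pos

    def tell(self):
        if not self._seekable:
            raise IOError("object does not support random access")
        return self._pos

    def truncate(self):
        # Per the draft, truncate() fails on read-only or sequential objects.
        if not (self._writable and self._seekable):
            raise IOError("truncate() needs a writable, seekable object")
        del self._data[self._pos:]
        return self._pos
```

Calling code can either check is_writable() or is_seekable() up front or simply catch IOError, which is the behaviour the draft mandates for unsupported operations.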
>
> Iff a RawIOBase implementation operates on an underlying file
> descriptor, it must additionally provide a .fileno() member function.
> This could be defined specifically by the implementation, or a mix-in
> class could be used (Need to decide about this).
>
>     .fileno()
>
>         Returns the underlying file descriptor (an integer)
>
> Initially, three implementations will be provided that implement the
> RawIOBase interface: FileIO, SocketIO, and BytesIO (also MMapIO?).
> Each implementation must determine whether the object supports random
> access as the information provided by the user may not be sufficient
> (consider open("/dev/tty", "rw") or open("/tmp/named-pipe", "rw")). As
> an example, FileIO can determine this by calling the seek() system
> call; if it returns an error, the object does not support random
> access. Each implementation may provide additional methods
> appropriate to its type. The BytesIO object is analogous to Python 2's
> cStringIO library, but operating on the new bytes type instead of
> strings.
>
> Buffered I/O
>
> The next layer is the Buffered I/O layer, which provides more efficient
> access to file-like objects. The abstract base class for all Buffered
> I/O implementations is BufferedIOBase, which provides similar methods
> to RawIOBase:
>
>     .read()
>     .write()
>     .seek()
>     .tell()
>     .truncate()
>     .close()
>     .is_readable()
>     .is_writable()
>     .is_seekable()
>
> Additionally, the abstract base class provides one member variable:
>
>     .raw
>
>         Provides a reference to the underlying RawIOBase object.
>
> The BufferedIOBase methods' syntax is identical to that of RawIOBase,
> but may have different semantics. In particular, BufferedIOBase
> implementations may read more data than requested or delay writing
> data using buffers. For the most part, this will be transparent to
> the user (unless, for example, they open the same file through a
> different descriptor).
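
As a rough illustration of "may read more data than requested", the sketch below wraps a raw object in a read-ahead buffer. Both class names, the buffer_size parameter, and the read_calls counter are hypothetical and exist only to make the effect visible; the draft itself does not specify a buffering strategy.

```python
class CountingRaw:
    """Hypothetical stand-in for a raw object; counts calls to read()."""

    def __init__(self, data):
        self._data = data
        self._pos = 0
        self.read_calls = 0

    def read(self, n):
        self.read_calls += 1
        chunk = self._data[self._pos:self._pos + n]
        self._pos += len(chunk)
        return chunk


class SketchBufferedReader:
    """Hypothetical read-ahead buffer layered over a raw object."""

    def __init__(self, raw, buffer_size=8):
        self.raw = raw  # the draft's .raw member variable
        self._buffer_size = buffer_size
        self._buf = b""

    def read(self, n):
        # Refill until the request can be satisfied or the raw object is empty.
        while len(self._buf) < n:
            want = max(self._buffer_size, n - len(self._buf))
            chunk = self.raw.read(want)
            if not chunk:
                break
            self._buf += chunk
        result, self._buf = self._buf[:n], self._buf[n:]
        return result


raw = CountingRaw(b"abcdefghij")
reader = SketchBufferedReader(raw, buffer_size=8)
first = reader.read(2)   # triggers one raw read of 8 bytes
second = reader.read(2)  # served from the buffer, no raw read
```

After the two small reads, raw.read_calls is still 1, which is the efficiency gain the buffered layer is meant to provide. It also shows why a second descriptor on the same file could observe a different position than the caller expects.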
>
> There are four implementations of the BufferedIOBase abstract base
> class, described below.
>
> BufferedReader
>
> The BufferedReader implementation is for sequential-access read-only
> objects. It does not provide a .flush() method, since there is no
> sensible circumstance where the user would want to discard the read
> buffer.
>
> BufferedWriter
>
> The BufferedWriter implementation is for sequential-access write-only
> objects. It provides a .flush() method, which forces all cached data
> to be written to the underlying RawIOBase object.
>
> BufferedRWPair
>
> The BufferedRWPair implementation is for sequential-access read-write
> objects such as sockets and ttys. As the read and write streams of
> these objects are completely independent, it could be implemented by
> simply incorporating a BufferedReader and BufferedWriter instance. It
> provides a .flush() method that has the same semantics as a
> BufferedWriter's .flush() method.
>
> BufferedRandom
>
> The BufferedRandom implementation is for all random-access objects,
> whether they are read-only, write-only, or read-write. Compared to
> the previous classes that operate on sequential-access objects, the
> BufferedRandom class must contend with the user calling .seek() to
> reposition the stream. Therefore, an instance of BufferedRandom must
> keep track of both the logical and true position within the object.
> It provides a .flush() method that forces all cached write data to be
> written to the underlying RawIOBase object and all cached read data to
> be forgotten (so that future reads are forced to go back to the disk).
>
> Q: Do we want to mandate in the specification that switching between
> reading and writing on a read-write object implies a .flush()? Or is
> that an implementation convenience that users should not rely on?
>
> For a read-only BufferedRandom object, .is_writable() returns False and
> the .write() and .truncate() methods raise IOError.
>
> For a write-only BufferedRandom object, .is_readable() returns False and
> the .read() method raises IOError.
>
> Text I/O
>
> The text I/O layer provides functions to read and write strings from
> streams. Some new features include universal newlines and character
> set encoding and decoding. The Text I/O layer is defined by a
> TextIOBase abstract base class. It provides several methods that are
> similar to the BufferedIOBase methods, but operate on a per-character
> basis instead of a per-byte basis. These methods are:
>
>     .read()
>     .write()
>     .seek()
>     .tell()
>     .truncate()
>
> TextIOBase implementations also provide several methods that are
> pass-throughs to the underlying BufferedIOBase objects:
>
>     .close()
>     .is_readable()
>     .is_writable()
>     .is_seekable()
>
> TextIOBase class implementations additionally provide the following methods:
>
>     .readline(self)
>
>         Read until newline or EOF and return the line.
>
>     .readlinesiter()
>
>         Returns an iterator that returns lines from the file (which
>         happens to be 'self').
>
>     .next()
>
>         Same as readline()
>
>     .__iter__()
>
>         Same as readlinesiter()
>
>     .__enter__()
>
>         Context management protocol. Returns self.
>
>     .__exit__()
>
>         Context management protocol. No-op.
>
> Two implementations will be provided by the Python library. The
> primary implementation, TextIOWrapper, wraps a Buffered I/O object.
> Each TextIOWrapper object has a property named ".buffer" that provides
> a reference to the underlying BufferedIOBase object. Its initializer
> has the following signature:
>
>     .__init__(self, buffer, encoding=None, universal_newlines=True, crlf=None)
>
>         Buffer is a reference to the BufferedIOBase object to be wrapped
>         with the TextIOWrapper. "Encoding" refers to an encoding to be used
>         for translating between the byte-representation and
>         character-representation. If "None", then the system's locale setting
>         will be used as the default.
>         If "universal_newlines" is true, then
>         the TextIOWrapper will automatically translate the bytes "\r\n" into a
>         single newline character during reads. If "crlf" is True, then a
>         newline will be written as "\r\n". If "crlf" is False, then a newline
>         will be written as "\n". If "crlf" is None, then a system-specific
>         default will be used.
>
> Another way to do it is as follows (we should pick one or the other):
>
>     .__init__(self, buffer, encoding=None, newline=None)
>
>         Same as above, but if newline is not None, use that as the
>         newline pattern (for reading and writing). If newline is not set,
>         attempt to find the newline pattern from the file, and if we can't
>         for some reason, use the system default newline pattern.
>
> Another implementation, StringIO, creates a file-like TextIO
> implementation without an underlying Buffered I/O object. While similar
> functionality could be provided by wrapping a BytesIO object in a
> Buffered I/O object in a TextIOWrapper, the String I/O object allows
> for much greater efficiency as it does not need to actually perform
> encoding and decoding. A String I/O object can just store the encoded
> string as-is. The String I/O object's __init__ signature is similar
> to the TextIOWrapper, but without the "buffer" parameter.
>
> END OF PEP
> _______________________________________________
> Python-3000 mailing list
> Python-3000@python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/daniel%40stutzbachenterprises.com
>

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises LLC