[reposting since the first time it didn't get through...]
On 26/02/2007 22.35, Mike Verdone wrote:

> Daniel Stutzbach and I have prepared a draft PEP for the new IO system
> for Python 3000. This document is, hopefully, true to the info that
> Guido wrote on the whiteboards here at PyCon. This is still a draft
> and there's quite a few decisions that need to be made. Feedback is
> welcomed.

Thanks for this!

> Raw I/O
> The abstract base class for raw I/O is RawIOBase. It has several
> methods which are wrappers around the appropriate operating system
> call. If one of these functions would not make sense on the object,
> the implementation must raise an IOError exception. For example, if a
> file is opened read-only, the .write() method will raise an IOError.
> As another example, if the object represents a socket, then .seek(),
> .tell(), and .truncate() will raise an IOError.
>
> .read(n: int) -> bytes
> .readinto(b: bytes) -> int
> .write(b: bytes) -> int

What are the requirements here?

- Can read()/readinto() return *fewer* bytes than specified?
- Can read() return a 0-sized byte object (= no data available)?
- Can read() return *more* bytes than specified (think of a datagram
  socket or a decompressing stream)?
- Can readinto() read *fewer* bytes than specified?
- Can readinto() read zero bytes?
- Should read()/readinto() raise EOFError?
- Can write() write fewer bytes than specified?
- Can write() write zero bytes?

Please see also the examples at the end of this mail before answering :)

> .seek(pos: int, whence: int = 0) -> None
> .tell() -> int
> .truncate(n: int = None) -> None
> .close() -> None

Why should this very low-level basic type define *two* read methods?
Assuming that readinto() is the most primitive, can we have the ABC
RawIOBase provide a default read() method that calls readinto()?

Consider providing more ABCs/mixins to help implementers.
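For comparison, here is a minimal sketch against the io module that eventually shipped with Python 3, which settled several of the questions above: io.RawIOBase.read() may return fewer bytes than requested, signals end of file with an empty bytes object rather than EOFError, and is supplied by the base class on top of a subclass's readinto(). ChunkyReader and read_exact are hypothetical names used only for illustration, and the 3-byte cap per call merely simulates the short reads a socket or pipe can produce.

```python
import io


class ChunkyReader(io.RawIOBase):
    """Hypothetical raw reader over an in-memory bytes object.

    Only readable() and readinto() are implemented; io.RawIOBase
    supplies read() and readall() on top of readinto(). Each call
    copies at most `chunk` bytes, to mimic short reads.
    """

    def __init__(self, data, chunk=3):
        super().__init__()
        self._data = data
        self._pos = 0
        self._chunk = chunk

    def readable(self):
        return True

    def readinto(self, b):
        # Copy at most len(b) bytes, capped at self._chunk per call.
        n = min(len(b), self._chunk, len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n  # 0 means end of file


def read_exact(raw, n):
    """Loop over short reads until exactly n bytes have arrived.

    Raising EOFError on a premature end is a choice of this sketch,
    not something the io module mandates.
    """
    parts = []
    remaining = n
    while remaining:
        chunk = raw.read(remaining)
        if not chunk:
            raise EOFError("got %d of %d bytes" % (n - remaining, n))
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)
```

Wrapping such a reader in io.BufferedReader turns the short reads back into exact-size reads, which is the job the draft PEP gives to the buffered layer; callers of the raw layer itself have to loop, as read_exact does.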
ReadIOBase/WriteIOBase are pretty obvious:

    class RawIOBase:
        def readable(self): return False
        def writeable(self): return False
        def seekable(self): return False
        def read(self, n): raise IOError
        def readinto(self, b): raise IOError
        def write(self, b): raise IOError
        def seek(self, pos, whence=0): raise IOError
        def tell(self): raise IOError
        def truncate(self, n=None): raise IOError

    class ReadIOBase(RawIOBase):
        def readable(self): return True
        def read(self, n):
            b = bytearray(n)        # mutable buffer for readinto()
            got = self.readinto(b)  # may be fewer than n bytes
            return bytes(b[:got])

    class WriteIOBase(RawIOBase):
        def writeable(self): return True

    class MySpecialReader(ReadIOBase):
        def readinto(self, b):
            ...  # must implement only this and nothing else

    class MySpecialReaderWriter(ReadIOBase, WriteIOBase):
        def readinto(self, b):
            ...
        def write(self, b):
            ...

> (should these "is_" functions be attributes instead?
> "file.readable == True")

Yes, I think readable/writeable/seekable/fileno *perfectly* match the
good usage of attributes/properties. They all provide a value without
side effects, and that value can be computed without any O(n)-style work.

> Buffered I/O
> The next layer is the Buffer I/O layer which provides more efficient
> access to file-like objects. The abstract base class for all Buffered

I think you probably want the buffer size to be optionally specified by
the user for the four standard implementations.

> Q: Do we want to mandate in the specification that switching between
> reading to writing on a read-write object implies a .flush()? Or is
> that an implementation convenience that users should not rely on?

I'd be glad if calling flush() weren't a requirement for users of the
class. It always strikes me as an abstraction leak.

> TextIOBase class implementations additionally provide the following methods:
>
> .readline(self)
>
> Read until newline or EOF and return the line.
>
> .readlinesiter()
>
> Returns an iterator that returns lines from the file (which
> happens to be 'self').
>
> .next()
>
> Same as readline()
>
> .__iter__()
>
> Same as readlinesiter()

Not sure why you need "readlinesiter()" at all. I thought Py3k was
getting rid of most of the "fooiter()" functions (thinking of dicts...).

> Another way to do it is as follows (we should pick one or the other):
>
> .__init__(self, buffer, encoding=None, newline=None)

I think this is clearer. I can't find a good real-world use case for
requiring the two-parameter version.

==========================================================================

Now for a real example. Let's say I'm given a readable RawIOBase object.
I'm told that it's a foobar-compressed UTF-8 text file. I have this API
available:

    class Foobar:
        # initialize decompressor
        def __init__(self): ...

        # feed compressed bytes and get uncompressed bytes.
        # The uncompressed data can be smaller, equal or larger
        # than the compressed data
        def decompress(self, data: bytes) -> bytes: ...

        # finish decompression and get tail
        def flush(self) -> bytes: ...

This is basically similar to the way zlib's decompression objects
(decompress()/flush()) work.

I would like to wrap the readable RawIOBase object so that I get a text
file-like object with readline() etc. This is pretty hard to do with the
current I/O library (you need to write a lot of code). It'd be good if
the new I/O library made it easier to achieve.

Let's see. I start with a raw I/O reader:

    class FoobarRaw(RawIOBase):
        def __init__(self, raw):
            self.raw = raw
            self._d = Foobar()
            self._buf = bytes()

        def readable(self):
            return True

        # I assume RawIOBase.read() must return the
        # exact number of bytes (unless at the end).
        # I assume RawIOBase.read() raises EOFError when done.
        # I assume readinto() does not exist...
        def read(self, n):
            try:
                while len(self._buf) < n:
                    b = self.raw.read(n)
                    self._buf += self._d.decompress(b)
            except EOFError:
                self._buf += self._d.flush()
            d = self._buf[:n]
            self._buf = self._buf[n:]  # drop the bytes being returned
            if not d:
                raise EOFError
            return d

and complete the job:

    def foobar_open(raw):
        return TextIOWrapper(BufferedReader(FoobarRaw(raw)),
                             encoding="utf-8")

    for L in foobar_open(sock):
        print(L)

Uhm, looks great!

==========================================================================

Now, it might be interesting to play with the different semantics of
RawIOBase.read() that I asked about above, and see how the implementation
of FoobarRaw.read() changes. For instance (now being radical): why don't
we drop the "n" argument altogether? We could just define it like this:

    # Returns a block of data, whose size is implementation-defined
    # and may vary between calls. It never returns a zero-sized block.
    # Raises EOFError when done.
    read() -> bytes

After all, there's a BufferedIO layer to handle buffering and exact-size
reads/writes. If we go this way, the above example is even easier:

    def read(self):
        try:
            while True:
                b = self.raw.read()  # any size!
                d = self._d.decompress(b)
                if d:  # keep the "never zero-sized" promise
                    return d
        except EOFError:
            b = self._d.flush()
            if not b:
                raise EOFError
            return b

It would also work well for sockets, since they would return exactly the
data that has arrived from the network, and simply block if no data is
available yet.

-- 
Giovanni Bajo

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000