New submission from Nathaniel Smith <[email protected]>:
Background: Doing I/O to files on disk has a hugely bimodal latency. If the I/O
happens to be in or going to cache (either user-space cache, like in
io.BufferedIOBase, or the OS's page cache), then the operation returns
instantly (~1 µs) without blocking. OTOH if the I/O isn't cached (for reads) or
cacheable (for writes), then the operation may block for 10 ms or more.
This creates a problem for async programs that want to do disk I/O. You have to
use a thread pool for reads/writes, because sometimes they block for a long
time, and you want to let your event loop keep doing other useful work while
it's waiting. But dispatching to a thread pool adds a lot of overhead (~100
µs), so you'd really rather not do it for operations that can be serviced
directly through cache. For uncached operations, a thread cuts the time the
event loop is stalled by ~100x (10 ms vs. ~100 µs), but for cached operations
it's a ~100x slowdown (~100 µs vs. ~1 µs), and -- this is the kicker --
there's no way to predict which ahead of time.
But io.BufferedIOBase at least knows when it can satisfy a request directly
from its buffer without issuing any syscalls. And since Linux 4.14, it's even
possible to issue a non-blocking read to the kernel (preadv2() with RWF_NOWAIT)
that will only succeed if the data is immediately available in page cache
(bpo-31368).
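As a rough illustration of the kernel-level half of this, here is a minimal sketch using os.preadv() with os.RWF_NOWAIT (Python 3.7+, Linux 4.14+). The helper names try_cached_pread and pread_prefer_cache are made up for this example. It bypasses the io stack entirely, so it says nothing about io.BufferedIOBase's own buffer, and it assumes that "not supported here" should be treated the same as "would block".

```python
import errno
import os


def try_cached_pread(fd, size, offset):
    """Hypothetical helper: return bytes if the kernel can serve the read
    from page cache right now, or None if it would block or if
    RWF_NOWAIT is not usable on this platform/filesystem."""
    nowait = getattr(os, "RWF_NOWAIT", None)
    if nowait is None or not hasattr(os, "preadv"):
        return None  # no RWF_NOWAIT support in this Python/OS
    buf = bytearray(size)
    try:
        # May return fewer bytes than requested if only part is cached.
        n = os.preadv(fd, [buf], offset, nowait)
    except BlockingIOError:
        return None  # EAGAIN: data is not in page cache
    except OSError as exc:
        # Assumption: treat "flag unsupported" as "cannot take fast path".
        if exc.errno in (errno.EOPNOTSUPP, errno.EINVAL, errno.ENOSYS):
            return None
        raise
    return bytes(buf[:n])


def pread_prefer_cache(fd, size, offset):
    """Try the non-blocking path first, then fall back to a blocking
    pread. An async caller would run the fallback in a worker thread."""
    data = try_cached_pread(fd, size, offset)
    if data is None:
        data = os.pread(fd, size, offset)
    return data
```

Note that this only covers reads on a raw file descriptor; the caching information held by a buffered Python file object is still not reachable this way.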
So, it would be very nice if there were some way to ask a Python file object to
do a "nonblocking read/write", which either succeeds immediately or else raises
an error. The intended usage pattern would be:
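
    async def read(self, *args):
        try:
            # Fast path: succeeds only if no blocking I/O is needed.
            return self._fileobj.read(*args, nonblock=True)
        except BlockingIOError:  # maybe?
            # Slow path: do the blocking read in a worker thread.
            return await run_in_worker_thread(self._fileobj.read, *args)
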
It would *really* help for this to be in the Python core, because right now the
convenient way to do non-blocking disk I/O is to re-use the existing Python I/O
stack, with worker threads. (This is how both aiofiles and trio's async file
support work. I think maybe curio's too.) But to implement this feature
ourselves, we'd have to first reimplement the whole I/O stack, because the
important caching information, and choice of what syscall to use, are hidden
inside.
----------
components: IO
messages: 310032
nosy: benjamin.peterson, njs, stutzbach
priority: normal
severity: normal
status: open
title: Add API to io objects for non-blocking reads/writes
versions: Python 3.8
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue32561>
_______________________________________