[issue32561] Add API to io objects for non-blocking reads/writes

2019-10-10 Thread Nathaniel Smith

Nathaniel Smith  added the comment:

The proposal is to be able to run io module operations in two modes: the 
regular one, and one where performing actual I/O is forbidden – so if they go 
down the stack and can fulfill the operation from some in-memory buffer, great, 
they do that, and if not, they raise an error.

It turns out that this is actually the right primitive to enable async disk 
access. That's the only target use case, and there's no IO loop involved. If 
you wanted to keep async disk access separate from the io module, then what 
we'd have to do is create a fork of all the code in the io module and add this 
feature to it, which doesn't seem like a good design :-).

--

[issue32561] Add API to io objects for non-blocking reads/writes

2019-10-10 Thread STINNER Victor


STINNER Victor  added the comment:

I suggest closing the issue and moving the discussion to a venue for discussing 
asynchronous ideas.


I'm sorry, but I don't understand what is being proposed here. My understanding 
is that Nathaniel wants to add something like a new "asynchronous" mode to the 
io module which would make FileIO, BufferedReader and TextIOWrapper behave 
differently.

IMHO it's a bad idea. The io module is designed for blocking I/O syscalls: not 
only the implementation, but also the API.

Non-blocking I/O requires a platform-specific implementation for best 
performance, but that in turn requires something like an event loop, and thus 
an unusual programming style like asyncio's "await ...".

I dislike the idea of having a single module for synchronous (blocking) and 
asynchronous (non-blocking) operations. IMHO asynchronous programming is 
complex enough that it requires developing a whole new module.

Maybe a new module could reuse io code: for example, implement an asynchronous 
layer using io.TextIOWrapper, but with its underlying buffer fed and controlled 
by asynchronous code.

The Python bug tracker is usually used for bug reports or to implement a 
concrete proposal. My understanding is that this is more of an idea at the 
design stage, and I don't think this is the best place to discuss it. I suggest 
opening a discussion on the python-ideas list or on a list about asynchronous 
programming (I looked for "async-sig", but it seems like that list is gone?).

--

[issue32561] Add API to io objects for non-blocking reads/writes

2019-10-10 Thread STINNER Victor


STINNER Victor  added the comment:

Linux kernel 5.1 also gained a new "io_uring" interface for asynchronous I/O:

"Ringing in a new asynchronous I/O API"
https://lwn.net/Articles/776703/

Linux 5.2: "The io_uring mechanism has a new operation, 
IORING_OP_SYNC_FILE_RANGE, which performs the equivalent of a sync_file_range() 
system call. It is also now possible to register an eventfd with an io_uring 
and get notifications when operations complete."

Linux 5.3: "The io_uring mechanism has gained support for asynchronous 
sendmsg() and recvmsg() operations."

--

[issue32561] Add API to io objects for non-blocking reads/writes

2019-10-10 Thread STINNER Victor

STINNER Victor  added the comment:

> Background: Doing I/O to files on disk has a hugely bimodal latency. If the 
> I/O happens to be in or going to cache (either user-space cache, like in 
> io.BufferedIOBase, or the OS's page cache), then the operation returns 
> instantly (~1 µs) without blocking. OTOH if the I/O isn't cached (for reads) 
> or cacheable (for writes), then the operation may block for 10 ms or more.

On Linux 4.14 and newer, Python 3.8 now provides os.preadv() with the os.RWF_NOWAIT flag:

"Do not wait for data which is not immediately available. If this flag is 
specified, the system call will return instantly if it would have to read data 
from the backing storage or wait for a lock. If some data was successfully 
read, it will return the number of bytes read. If no bytes were read, it will 
return -1 and set errno to errno.EAGAIN."

At least on recent Linux, it is now possible to write a different code path for 
uncached data.
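
For illustration, a minimal sketch (assuming Linux 4.14+ and a Python with 
os.RWF_NOWAIT; the helper name is made up) of attempting a cache-only read and 
signalling the caller to fall back to a worker thread otherwise:

import os

def try_read_nowait(fd, length, offset):
    """Return up to `length` bytes if they can be served from the page cache,
    or None if the read would have to block on storage."""
    buf = bytearray(length)
    try:
        # RWF_NOWAIT: fail with EAGAIN instead of waiting on the disk
        n = os.preadv(fd, [buf], offset, os.RWF_NOWAIT)
    except BlockingIOError:
        return None  # not cached; caller should retry in a worker thread
    return bytes(buf[:n])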

--
nosy: +vstinner

[issue32561] Add API to io objects for non-blocking reads/writes

2018-06-10 Thread Giampaolo Rodola'


Giampaolo Rodola'  added the comment:

Gotcha. Thanks for clarifying.

--

[issue32561] Add API to io objects for non-blocking reads/writes

2018-06-10 Thread Nathaniel Smith


Nathaniel Smith  added the comment:

The idea here is *not* to avoid using a thread pool in general. When the data 
is on disk, using a thread pool is (a) unavoidable, because of how operating 
system kernels are written, and (b) basically fine anyway, because the overhead 
added by threads is swamped by the cost of disk access. So for the foreseeable 
future, we're always going to be using a thread pool for actual disk access.

But, if the data *is already in memory*, so the read can succeed without 
hitting the disk, then using a thread pool is *not* fine. Fetching data out of 
memory is super super cheap, so if that's all we're doing then using a thread 
pool adds massive overhead, in relative terms. We'd like to skip using the 
thread pool *specifically in this case*.

So the idea would be: first, attempt a "buffer-only" read. If it succeeds, then 
great we're done and it was really cheap. Otherwise, if it fails, then we know 
we're in the data-on-disk case, so we dispatch the operation to the thread pool.
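
As a rough sketch, that pattern could look like this (the buffer_only flag is 
the hypothetical API under discussion here, not something io provides today):

import asyncio
import concurrent.futures

_pool = concurrent.futures.ThreadPoolExecutor()

async def async_read(fileobj, size):
    try:
        # Hypothetical flag: succeed only if no actual disk I/O is needed.
        return fileobj.read(size, buffer_only=True)
    except BlockingIOError:
        # Data is on disk: pay the thread-pool overhead only in this case.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(_pool, fileobj.read, size)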

--

[issue32561] Add API to io objects for non-blocking reads/writes

2018-06-10 Thread Giampaolo Rodola'


Giampaolo Rodola'  added the comment:

os.preadv() and os.pwritev() are great but to my understanding one essential 
piece is still missing in order to effectively do non-blocking file IO and 
avoid using a thread pool: being notified when the file fd is 
readable/writable. select() and epoll() on Linux are not able to do that 
(according to them regular fds are always "ready"). As such one would 
repeatedly get EAGAIN and hog CPU resources. Am I missing something?
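
For example (a quick sketch, POSIX only), select() claims a regular file is 
always ready, whether or not the data is actually cached:

import select
import tempfile

with tempfile.TemporaryFile() as f:
    # Regular files are always reported readable and writable, so readiness
    # notification can't tell us whether a read would block on the disk.
    r, w, _ = select.select([f], [f], [], 0)
    print(bool(r), bool(w))  # True True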

--
nosy: +giampaolo.rodola

[issue32561] Add API to io objects for non-blocking reads/writes

2018-01-16 Thread YoSTEALTH

YoSTEALTH  added the comment:

There will be a lot of confusion using "buffered" & "unbuffered" terminology, 
since Python already has BufferedIOBase (as mentioned by Martin). It would be 
more appropriate to create io.CachedIOBase and add a non-blocking argument to 
open(), e.g. open(blocking=False), to enable this feature.

--
nosy: +YoSTEALTH

[issue32561] Add API to io objects for non-blocking reads/writes

2018-01-16 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> Do you think we can deprecate the existing broken non-blocking mode?

I would suggest asking on python-dev.  I wouldn't mind it, but perhaps there 
are people using it.

--

[issue32561] Add API to io objects for non-blocking reads/writes

2018-01-16 Thread Nathaniel Smith

Nathaniel Smith  added the comment:

That's a reasonable concern. Do you think we can deprecate the existing broken 
non-blocking mode?

--

[issue32561] Add API to io objects for non-blocking reads/writes

2018-01-16 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> And all the async file IO APIs I know [1][2][3] have the public API of "just 
> like regular files, but the blocking methods are async". 99% of the time, 
> that means TextWrapper and BufferedStream. So I just don't see any way to get 
> the advantages of this feature without either (a) adding buffer-only support 
> to those layers, or (b) forking those layers into a 3rd-party library, and 
> then adding buffer-only support.

Yeah... The concrete problem is that there's already a poorly thought-out 
"non-blocking mode" that only partly works, and suddenly the code (which 
includes a lot of delicate, performance-critical C code... to give you an idea, 
even recently a consistency bug in truncate() was discovered) will have to be 
massaged to support another non-blocking mode of operation.

So that's why I'm very cautious about integrating this into BufferedReader and 
friends.

--

[issue32561] Add API to io objects for non-blocking reads/writes

2018-01-16 Thread Nathaniel Smith

Nathaniel Smith  added the comment:

Hmm, why did I use "unbuffered" as my term above? That's a very odd name. It's 
like, exactly the opposite of what we actually want. Clearly I did not think 
this through properly. Please pretend I said "buffer-only" instead, thanks.

> So my opinion here is that only raw IO objects (FileIO) should have this 
> functionality.  People can build their own functionality on top of that (such 
> as Tornado or asyncio do with their streams).

I guess I don't object to such functionality, but it would be useless to me 
personally. FileIO doesn't solve any problems I have with stream processing; 
the reason I care about this is for providing an async file I/O API. And all 
the async file IO APIs I know [1][2][3] have the public API of "just like 
regular files, but the blocking methods are async". 99% of the time, that means 
TextWrapper and BufferedStream. So I just don't see any way to get the 
advantages of this feature without either (a) adding buffer-only support to 
those layers, or (b) forking those layers into a 3rd-party library, and then 
adding buffer-only support.

OTOH, it would be ok if in an initial implementation some methods like 
readline() simply always failed when called in buffer-only mode, since this 
would be a best-effort thing. (This is also different from the non-blocking 
semantics discussion in bpo-13322, which is kind of scary. I don't want to deal 
with partial writes and reads and carrying crucial data in exceptions! I just 
want to know if the operation can trivially be done without blocking, and if 
not then I'll retry it in blocking mode.)

[1] https://github.com/Tinche/aiofiles
[2] 
https://trio.readthedocs.io/en/latest/reference-io.html#asynchronous-filesystem-i-o
[3] https://curio.readthedocs.io/en/latest/reference.html#module-curio.file

--

[issue32561] Add API to io objects for non-blocking reads/writes

2018-01-16 Thread Antoine Pitrou

Antoine Pitrou  added the comment:

> Ideally we would be able to do buffer-only reads through all the of the 
> different read methods (read, readline, readinto, ...),

Hmm... We already have non-blocking support in BufferedReader, except it 
*doesn't work*. The problem is that the semantics mandated by readline() and 
even buffered read() don't work very well with non-blocking IO (see issue13322).
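
To illustrate (a quick sketch using a pipe, since that is the easy way to get a 
non-blocking fd into the io stack):

import io
import os

r, w = os.pipe()
os.set_blocking(r, False)
reader = io.open(r, "rb")  # BufferedReader over a non-blocking FileIO

# With no data pending, current CPython returns None here rather than raising
# BlockingIOError as the docs promise; readline() has no good way to surface a
# partial result either. That mismatch is what issue13322 is about.
print(reader.read(100))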

So my opinion here is that only raw IO objects (FileIO) should have this 
functionality.  People can build their own functionality on top of that (such 
as Tornado or asyncio do with their streams).

--
nosy: +pitrou

[issue32561] Add API to io objects for non-blocking reads/writes

2018-01-15 Thread Xavier G. Domingo

Change by Xavier G. Domingo :


--
nosy: +xgdomingo

[issue32561] Add API to io objects for non-blocking reads/writes

2018-01-15 Thread Nathaniel Smith

Nathaniel Smith  added the comment:

> BufferedIOBase is an abstract class and, despite the name, doesn’t 
> necessitate a buffer or cache

Right, sorry, skimmed too fast.

> In Issue 32475 there is a proposal to add a “getbuffn” method returning the 
> amount of unread pending data in a reader object. Perhaps that would be 
> enough for reading.

Ideally we would be able to do buffer-only reads through all of the different 
read methods (read, readline, readinto, ...), and ideally we would be able to 
do it given objects at different points in the IO stack – so a buffer-only read 
on a TextWrapper wrapped around a BufferedRandom wrapped around a FileIO should 
propagate the buffer-only-ness all the way down the stack. I don't think 
getbuffn is enough to solve that? Or at least I don't see how in any simple way.

Also, the immediate thing that spurred me to file this issue was learning that 
Linux has just added a non-blocking file read syscall. It would be pretty neat 
if we could expose that. If we had a way to propagate this down, then it could 
just be FileIO's implementation of the buffer-only flag.

But yeah, actually doing that is complicated given the need to continue 
supporting existing implementations of these interfaces.

Here's a straw man proposal: add an unbuffered_supported flag to the abstract IO 
interfaces. If missing or false, you can't do unbuffered reads/writes. If 
present and True, then you can pass a new unbuffered=True kw-only argument to 
their read/write calls.

When (for example) TextWrapper.read needs to call its wrapped object's .read, 
it does a check like:

  if unbuffered:
      # This call is unbuffered, so we're only allowed to call
      # unbuffered methods.
      if not getattr(self.wrapped, "unbuffered_supported", False):
          # lower level doesn't support this, can't be done
          raise ...
      else:
          self.wrapped.read(..., unbuffered=True)
  else:
      # We're a regular call, so we can make a regular call
      self.wrapped.read(...)

(I'm intentionally using "unbuffered" here to distinguish from regular POSIX 
"non-blocking", which is an API that's conceptually very similar but totally 
distinct in implementation. Especially since it's also possible to use the io 
stack with sockets/pipes in non-blocking mode.)

--

[issue32561] Add API to io objects for non-blocking reads/writes

2018-01-15 Thread Martin Panter

Change by Martin Panter :


--
dependencies: +Add ability to query number of buffered bytes available on 
buffered I/O

[issue32561] Add API to io objects for non-blocking reads/writes

2018-01-15 Thread Martin Panter

Martin Panter  added the comment:

BufferedIOBase is an abstract class and, despite the name, doesn’t necessitate 
a buffer or cache. Adding methods and properties might break compatibility with 
third-party implementations, or get ugly with optional methods and multiple 
versions of the API. It seems like it would be better to extend the concrete 
APIs: io.BufferedReader, BufferedWriter and/or FileIO.

In Issue 32475 there is a proposal to add a “getbuffn” method returning the 
amount of unread pending data in a reader object. Perhaps that would be enough 
for reading.

I would support a similar API for BufferedWriter etc. Perhaps a property 
called “available_space”. You could check that and decide whether to do a 
direct non-blocking “write”, or launch a blocking “write” in the background.
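
Usage could look roughly like this (a sketch only; "available_space" is just 
the proposed name, it does not exist today):

import asyncio

async def async_write(writer, data):
    # Proposed property: how many bytes still fit in the writer's buffer.
    if getattr(writer, "available_space", 0) >= len(data):
        writer.write(data)  # stays in the user-space buffer, no blocking syscall
    else:
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, writer.write, data)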

--
nosy: +martin.panter

[issue32561] Add API to io objects for non-blocking reads/writes

2018-01-15 Thread Nathaniel Smith

New submission from Nathaniel Smith :

Background: Doing I/O to files on disk has a hugely bimodal latency. If the I/O 
happens to be in or going to cache (either user-space cache, like in 
io.BufferedIOBase, or the OS's page cache), then the operation returns 
instantly (~1 µs) without blocking. OTOH if the I/O isn't cached (for reads) or 
cacheable (for writes), then the operation may block for 10 ms or more.

This creates a problem for async programs that want to do disk I/O. You have to 
use a thread pool for reads/writes, because sometimes they block for a long 
time, and you want to let your event loop keep doing other useful work while 
it's waiting. But dispatching to a thread pool adds a lot of overhead (~100 
µs), so you'd really rather not do it for operations that can be serviced 
directly through cache. For uncached operations a thread gives a 100x speedup, 
but for cached operations it's a 100x slowdown, and -- this is the kicker -- 
there's no way to predict which case you'll hit ahead of time.

But, io.BufferedIOBase at least knows when it can satisfy a request directly 
from its buffer without issuing any syscalls. And in Linux 4.14, it's even 
possible to issue a non-blocking read to the kernel that will only succeed if 
the data is immediately available in page cache (bpo-31368).

So, it would be very nice if there were some way to ask a Python file object to 
do a "nonblocking read/write", which either succeeds immediately or else raises 
an error. The intended usage pattern would be:

async def read(self, *args):
    try:
        return self._fileobj.read(*args, nonblock=True)
    except BlockingIOError:  # maybe?
        return await run_in_worker_thread(self._fileobj.read, *args)

It would *really* help for this to be in the Python core, because right now the 
convenient way to do non-blocking disk I/O is to re-use the existing Python I/O 
stack, with worker threads. (This is how both aiofiles and trio's async file 
support work. I think maybe curio's too.) But to implement this feature 
ourselves, we'd have to first reimplement the whole I/O stack, because the 
important caching information, and choice of what syscall to use, are hidden 
inside.

--
components: IO
messages: 310032
nosy: benjamin.peterson, njs, stutzbach
priority: normal
severity: normal
status: open
title: Add API to io objects for non-blocking reads/writes
versions: Python 3.8
