[issue38591] Deprecate Process Child Watchers

2019-11-15 Thread Kyle Stanley


Kyle Stanley  added the comment:

> I think so. It will take a long time before we remove it, though.

In that case, it could be a long-term deprecation, where we start the 
deprecation process without committing to a definitive removal version. This 
will at least encourage users to move to the other watchers instead of 
FastChildWatcher. I can start working on a PR.
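
To illustrate the shape of such a change, here is a minimal sketch of what the 
warning might look like. The subclass name below is purely hypothetical and for 
illustration only; the actual PR would add the warning inside asyncio's own 
FastChildWatcher.__init__, with the exact wording still to be decided:

import asyncio
import warnings

class DeprecatedFastChildWatcher(asyncio.FastChildWatcher):
    # Hypothetical illustration only (Unix, Python 3.8+): the real change
    # would emit this warning from asyncio.unix_events.FastChildWatcher.
    def __init__(self):
        warnings.warn(
            "FastChildWatcher is deprecated; use ThreadedChildWatcher instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__()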

--




[issue38591] Deprecate Process Child Watchers

2019-11-15 Thread Andrew Svetlov


Andrew Svetlov  added the comment:

> So are we at least in agreement on starting by deprecating 
> FastChildWatcher?

I think so. It will take a long time before we remove it, though.

--




[issue38591] Deprecate Process Child Watchers

2019-11-14 Thread Kyle Stanley


Kyle Stanley  added the comment:

> You also have to account for the thread stack size. I suggest looking at 
> the RSS memory instead.

Ah, good point. I believe get_size() only accounts for the memory usage of the 
thread object, not the physical memory allocated for the thread stack. Thanks 
for the clarification.

> I measured the RSS memory per thread: it's around 13.2 kB/thread. Oh, that's 
> way lower than what I expected.

On Python 3.8 and Linux kernel 5.3.8, I received the following result:

# Starting mem
VmRSS:  8408 kB
# After initializing and starting 1k threads:
VmRSS: 21632 kB

~13,224 kB for 1k threads, which matches the ~13.2 kB/thread estimate.

Also, as a side note, I think you could remove the "for thread in threads: 
thread.started_event.wait()" portion for our purposes. IIUC, waiting on the 
threading.Event objects wouldn't affect memory usage.

> To be clear: I mean that FastChildWatcher is safe only if all of the 
> process's code spawns subprocesses through FastChildWatcher.
> If ProcessPoolExecutor or direct subprocess calls are used, the watcher is 
> unsafe.
> If some C extension spawns new processes on its own (e.g. in a separate 
> thread), the watcher is unsafe.

> I just think that this particular watcher is too dangerous.

So are we at least in agreement on starting by deprecating FastChildWatcher? 
If a server is incredibly tight on memory and can't spare ~13.2 kB/thread, 
SafeChildWatcher would be an alternative to ThreadedChildWatcher.
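
For reference, switching to a different watcher is already a one-line change 
through the public asyncio.set_child_watcher() API. A minimal sketch of opting 
into SafeChildWatcher explicitly (Unix only; this only illustrates the 
selection mechanism, it is not a recommendation):

import asyncio
import sys

async def main():
    # Spawn a trivial child through whichever watcher is installed.
    proc = await asyncio.create_subprocess_exec(sys.executable, "-c", "pass")
    print("child exited with", await proc.wait())

if __name__ == "__main__":
    # Explicitly opt into SafeChildWatcher instead of the default
    # ThreadedChildWatcher; when called from the main thread, asyncio.run()
    # attaches the watcher to the newly created loop.
    asyncio.set_child_watcher(asyncio.SafeChildWatcher())
    asyncio.run(main())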

Personally, I still think this amount is negligible for most production 
servers, and that we can reasonably deprecate SafeChildWatcher as well. But I 
can start with FastChildWatcher.

--




[issue38591] Deprecate Process Child Watchers

2019-11-14 Thread Andrew Svetlov


Andrew Svetlov  added the comment:

To be clear: I mean that FastChildWatcher is safe only if all of the process's 
code spawns subprocesses through FastChildWatcher.
If ProcessPoolExecutor or direct subprocess calls are used, the watcher is 
unsafe.
If some C extension spawns new processes on its own (e.g. in a separate 
thread), the watcher is unsafe.

I just think that this particular watcher is too dangerous.
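
For anyone wondering what "unsafe" means concretely, here is a small standalone 
illustration of the underlying conflict (plain os/subprocess, not asyncio, and 
Unix only): anything that reaps children with waitpid(-1), which is what 
FastChildWatcher does, can consume an exit status that the subprocess module 
was expecting to collect itself:

import os
import subprocess
import sys

# Start a child that exits with status 7 via the subprocess module.
proc = subprocess.Popen([sys.executable, "-c", "raise SystemExit(7)"])

# Simulate a waitpid(-1)-style watcher reaping *any* child of this process.
pid, status = os.waitpid(-1, 0)
print("reaped pid", pid, "with exit status", os.WEXITSTATUS(status))  # 7

# The real status is now gone: Popen.wait() hits ECHILD internally and
# falls back to a made-up return code of 0.
print("Popen.wait() reports", proc.wait())  # 0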

--




[issue38591] Deprecate Process Child Watchers

2019-11-14 Thread STINNER Victor


STINNER Victor  added the comment:

I measured the RSS memory per thread: it's around 13.2 kB/thread. Oh, that's 
way lower than what I expected.

Script:

# 10: 9636 kB => 9756 kB: +12 kB/thread
# 100: 9476 kB => 10796 kB: +13.2 kB/thread
# 1000: 9552 kB => 22776 kB: +13.2 kB/thread

import os
import threading

def display_rss():
    os.system(f"grep ^VmRSS /proc/{os.getpid()}/status")

def wait(event):
    event.wait()

class Thread(threading.Thread):
    def __init__(self):
        super().__init__()
        self.stop_event = threading.Event()
        self.started_event = threading.Event()

    def run(self):
        self.started_event.set()
        self.stop_event.wait()

    def stop(self):
        self.stop_event.set()
        self.join()

def main():
    nthread = 1000
    display_rss()
    threads = [Thread() for i in range(nthread)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.started_event.wait()
    display_rss()
    for thread in threads:
        thread.stop()

main()

--




[issue38591] Deprecate Process Child Watchers

2019-11-14 Thread STINNER Victor


STINNER Victor  added the comment:

> The 64 bytes was measured by `sys.getsizeof(threading.Thread())`, which only 
> provides a surface-level assessment. I believe this only covers the shallow 
> size of the thread object itself, not the objects it references.

You also have to account for the thread stack size. I suggest looking at the 
RSS memory instead.
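
For reference, the per-thread stack allocation can be inspected (and lowered) 
with threading.stack_size(); the usual Linux default is about 8 MiB of virtual 
memory, of which only the pages actually touched end up counted in RSS. A 
minimal sketch:

import threading

# 0 means "use the platform default" for newly created threads
# (commonly ~8 MiB of virtual memory on Linux).
print("stack size for new threads:", threading.stack_size())

# The allocation can be lowered if many threads are needed; 512 KiB here is
# just an example value, not a recommendation.
threading.stack_size(512 * 1024)
print("stack size for new threads:", threading.stack_size())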

--




[issue38591] Deprecate Process Child Watchers

2019-11-14 Thread Kyle Stanley


Kyle Stanley  added the comment:

> I understand that there's *some* overhead associated with spawning a new 
> thread, but from my impression it's not substantial enough to make a 
> significant impact in most cases.

Although I think this still stands to some degree, I will have to rescind the 
following:

> Each individual instance of threading.Thread is only 64 bytes.

The 64 bytes was measured by `sys.getsizeof(threading.Thread())`, which only 
provides a surface-level assessment. I believe this only covers the shallow 
size of the thread object itself, not the objects it references.

In order to get a better estimate, I implemented a custom get_size() function 
that recursively adds the size of the object and all unique objects returned 
by gc.get_referents() (ignoring several redundant and/or unnecessary types). 
For more details, see 
https://gist.github.com/aeros/632bd035b6f95e89cdf4bb29df970a2a. Feel free to 
critique it if there are any apparent issues (for the purpose of measuring the 
size of threads).
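
For readers who don't want to open the gist, the general approach looks roughly 
like the following. This is only a sketch of the technique, not the exact 
function from the gist; classes, modules and functions are skipped so that the 
traversal doesn't pull in most of the interpreter:

import gc
import sys

def get_size(obj):
    # Rough "deep" size: sys.getsizeof() summed over everything reachable
    # via gc.get_referents(), visiting each object at most once.
    skip = (type, type(sys), type(get_size))  # classes, modules, functions
    seen = set()
    total = 0
    stack = [obj]
    while stack:
        current = stack.pop()
        if id(current) in seen or isinstance(current, skip):
            continue
        seen.add(id(current))
        total += sys.getsizeof(current)
        stack.extend(gc.get_referents(current))
    return total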

Then, I used this function on three different threads to figure out how much 
memory was needed for each one:

Python 3.8.0+ (heads/3.8:1d2862a323, Nov  4 2019, 06:59:53) 
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import threading
>>> from get_size import get_size
>>> a = threading.Thread()
>>> b = threading.Thread()
>>> c = threading.Thread()
>>> get_size(a)
3995
>>> get_size(b)
1469
>>> get_size(c)
1469

1469 bytes seems to be roughly the amount of additional memory required for 
each new thread, at least on Linux kernel 5.3.8 and Python 3.8. I don't know if 
this is 100% accurate, but it at least provides an improved estimate over 
sys.getsizeof().

> But it spawns a new Python thread per process, which can be a blocker issue 
> if server memory is limited. What if you want to spawn 100 processes? Or 
> 1000 processes? What is the memory usage?

From my understanding, ~1.5KB/thread seems to be quite negligible for most 
modern equipment. The server's memory would have to be very limited for 
spawning an additional 1000 threads to be a bottleneck/blocker issue:

Python 3.8.0+ (heads/3.8:1d2862a323, Nov  4 2019, 06:59:53) 
[GCC 9.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import threading
>>> from get_size import get_size
>>> threads = []
>>> for _ in range(1000):
... th = threading.Thread()
... threads.append(th)
... 
>>> get_size(threads)
1482435

(~1.5MB)

Victor (or anyone else), in your experience, would the additional ~1.5KB per 
process be an issue for 99% of production servers? If not, it seems to me like 
the additional maintenance cost of keeping SafeChildWatcher and 
FastChildWatcher in asyncio's API wouldn't be worthwhile.

--




[issue38591] Deprecate Process Child Watchers

2019-11-04 Thread Benjamin Peterson


Benjamin Peterson  added the comment:

FWIW, I started implementing a pidfd-based child process watcher over on #38692.
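
For anyone unfamiliar with the mechanism: a pidfd is a file descriptor that 
becomes readable once the process it refers to has terminated, so a child can 
be watched by an event loop like any other fd, with no SIGCHLD handler and no 
extra thread. A rough standalone sketch (not the implementation from #38692; 
it assumes Linux 5.3+ and os.pidfd_open(), which is only available in newer 
Python versions):

import os
import selectors
import subprocess
import sys

proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1)"])
pidfd = os.pidfd_open(proc.pid)

# Wait for the pidfd to become readable, i.e. for the child to exit.
sel = selectors.DefaultSelector()
sel.register(pidfd, selectors.EVENT_READ)
sel.select()
os.close(pidfd)

print("exit status:", proc.wait())  # the child has already exited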

--
nosy: +benjamin.peterson




[issue38591] Deprecate Process Child Watchers

2019-10-26 Thread Kyle Stanley


Kyle Stanley  added the comment:

> But it spawns a new Python thread per process, which can be a blocker issue 
> if server memory is limited.

I understand that there's *some* overhead associated with spawning a new 
thread, but from my impression it's not substantial enough to make a 
significant impact in most cases. Each individual instance of threading.Thread 
is only 64 bytes. Have you seen any recent cases where server memory is 
limited enough that the memory cost of spawning an additional thread per 
subprocess becomes the limiting factor?

--




[issue38591] Deprecate Process Child Watchers

2019-10-26 Thread Kyle Stanley


Kyle Stanley  added the comment:

> > If asyncio is only run from the main thread, FastChildWatcher is safe, fast 
> > and has low memory footprint, no?

> Unfortunately, no. FastChildWatcher is safe if you can guarantee that no code 
> executed in asyncio main thread AND thread pools spawn subprocesses

Am I misunderstanding something here, or is this supposed to be 
"FastChildWatcher is safe if you can guarantee that no code executed *outside 
of* the asyncio main thread AND ..."? Alternatively, "FastChildWatcher is safe 
if you can guarantee that code is *only* executed in the asyncio main thread". 
Both of the above have the same functional meaning.

I think it was a typo, but I just wanted to make sure because the distinction 
from the original makes a functional difference in this case. 

Also, regarding the second part, "thread pools spawn subprocesses": is that to 
say that subprocesses can only be spawned within the thread pools? As in, 
FastChildWatcher becomes unsafe if subprocesses are spawned from anywhere else?

These answers may be fairly obvious to someone familiar with the internals of 
FastChildWatcher, but they may not be clear to someone such as myself who has 
mostly just read through the documentation and looked over the implementation 
briefly. I'm only familiar with the internals of ThreadedChildWatcher.

--




[issue38591] Deprecate Process Child Watchers

2019-10-26 Thread Andrew Svetlov


Andrew Svetlov  added the comment:

My non-LTS Ubuntu also has a 5.3 kernel, but I'm talking about the oldest 
supported RHEL/CentOS.

That's why pidfd_open() cannot be the single implementation. It's so new that 
my local man-pages installation has no entry for the API yet (but the web 
does: http://man7.org/linux/man-pages/man2/pidfd_open.2.html).

> If asyncio is only run from the main thread, FastChildWatcher is safe, fast 
> and has low memory footprint, no?

Unfortunately, no. FastChildWatcher is safe if you can guarantee that no code 
executed in asyncio main thread AND thread pools spawn subprocesses. Otherwise, 
the whole Python process becomes broken by the race condition between 
FastChildWatcher and any other wait()/waitpid()/waitid() call.

--




[issue38591] Deprecate Process Child Watchers

2019-10-25 Thread STINNER Victor


STINNER Victor  added the comment:

> Regarding pidfd and kqueue -- I'd love to see pull requests with proposals. 
> Right now nothing exists. The pidfd API is available starting from the 
> latest released Linux 5.3; we would need to wait a decade before all Linux 
> distros adopt it and we can drop all other implementations.

I hope that it will take less time to expose pidfd_open() in Python! It is 
*already* available in the Linux kernel 5.3. My laptop is already running Linux 
5.3! (Thanks Fedora 30.)

I would prefer to keep an API to choose the child watcher, since it seems that 
even in 2019 there are still new APIs (pidfd) for waiting on process 
completion. I expect each implementation to have its own advantages and 
drawbacks.

--




[issue38591] Deprecate Process Child Watchers

2019-10-25 Thread STINNER Victor


STINNER Victor  added the comment:

> ThreadedChildWatcher starts a thread per process but has O(1) complexity.

But it spawns a new Python thread per process, which can be a blocker issue if 
server memory is limited. What if you want to spawn 100 processes? Or 1000 
processes? What is the memory usage?

I like FastChildWatcher!

> ... but working with the asyncio subprocess API is still super complicated 
> if asyncio code is running from multiple threads

Well, I like the ability to choose the child watcher implementation depending 
on my use case. If asyncio is only run from the main thread, FastChildWatcher 
is safe, fast and has low memory footprint, no?

--




[issue38591] Deprecate Process Child Watchers

2019-10-25 Thread Kyle Stanley


Kyle Stanley  added the comment:

> I think FastChildWatcher and SafeChildWatcher should go, 
> ThreadedChildWatcher should be kept as the default, and 
> MultiLoopChildWatcher kept as an option for cases where ThreadedChildWatcher 
> is not satisfactory.

Okay, I think I can understand the reasoning here. Do you think that 
FastChildWatcher and SafeChildWatcher could be deprecated starting in 3.9 and 
removed in 3.11? If so, I'd be glad to start working on adding the deprecation 
warnings and the 3.9 What's New entries.

> MultiLoopChildWatcher's problems can and should be fixed; there is nothing 
> wrong with the idea, only a slightly imperfect implementation.

Yeah, my largest concern was just that the current issues seem especially 
complex to fix, and I interpreted your previous comment as "I'd like to 
deprecate all the process watchers in the near future". Thus, it seemed to 
make sense to me that we could start removing them rather than sinking time 
into fixing something that might be removed soon.

By "slightly imperfect implementation", do you have any ideas for particularly 
imperfect parts of it that could use improvement? 

I feel that I've developed a decent understanding of the implementation of 
ThreadedChildWatcher, but after looking at the race conditions for 
MultiLoopChildWatcher in https://bugs.python.org/issue38323, I'll admit that I 
felt a bit lost as to where to find a solution, primarily because my 
understanding of the signal module is quite limited in comparison to other 
areas; it's not something I've strongly focused on.

--




[issue38591] Deprecate Process Child Watchers

2019-10-25 Thread Andrew Svetlov


Andrew Svetlov  added the comment:

ThreadedChildWatcher starts a thread per process but has O(1) complexity.

MultiLoopChildWatcher doesn't spawn threads; it can be used safely with 
asyncio loops spawned in multiple threads. The complexity is O(N), plus no 
other code may contend for the SIGCHLD subscription.

FastChildWatcher has O(1) complexity; that's the good news. Everything else is 
bad: the watcher conflicts even with a blocking subprocess `wait()` call, even 
if the call is performed from another thread.

SafeChildWatcher is safer than FastChildWatcher, but working with the asyncio 
subprocess API is still super complicated if asyncio code is running from 
multiple threads. SafeChildWatcher works well only if asyncio is run from the 
main thread. Its complexity is O(N).

I think FastChildWatcher and SafeChildWatcher should go, ThreadedChildWatcher 
should be kept as the default, and MultiLoopChildWatcher kept as an option for 
cases where ThreadedChildWatcher is not satisfactory.

MultiLoopChildWatcher's problems can and should be fixed; there is nothing 
wrong with the idea, only a slightly imperfect implementation.

Regarding pidfd and kqueue -- I'd love to see pull requests with proposals. 
Right now nothing exists.
The pidfd API is available starting from the latest released Linux 5.3; we 
would need to wait a decade before all Linux distros adopt it and we can drop 
all other implementations.

--




[issue38591] Deprecate Process Child Watchers

2019-10-25 Thread Kyle Stanley


Change by Kyle Stanley :


--
title: Deprecating MultiLoopChildWatcher -> Deprecate Process Child Watchers
