[issue36867] Make semaphore_tracker track other system resources

2019-06-12 Thread STINNER Victor


STINNER Victor  added the comment:

The new test is not reliable, see: bpo-37244 "test_multiprocessing_forkserver: 
test_resource_tracker() failed on  x86 Gentoo Refleaks 3.8".

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-17 Thread STINNER Victor


STINNER Victor  added the comment:

It seems like all known bugs are fixed, I close again the issue. Thanks!

--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-17 Thread STINNER Victor


STINNER Victor  added the comment:


New changeset cbe72d842646ded2454784679231e3d1e6252e72 by Victor Stinner 
(Pierre Glaser) in branch 'master':
bpo-36867: _test_multiprocessing: avoid weak sync primitive (GH-13292)
https://github.com/python/cpython/commit/cbe72d842646ded2454784679231e3d1e6252e72


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread Antoine Pitrou


Antoine Pitrou  added the comment:


New changeset b1dfcad6f0d3a52c9ac31fb9763fc7962a84b27c by Antoine Pitrou 
(Pierre Glaser) in branch 'master':
bpo-36867: Create the resource_tracker before launching SharedMemoryManagers 
(GH-13276)
https://github.com/python/cpython/commit/b1dfcad6f0d3a52c9ac31fb9763fc7962a84b27c


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread Pierre Glaser


Change by Pierre Glaser :


--
pull_requests: +13203

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread Antoine Pitrou


Antoine Pitrou  added the comment:


New changeset 50466c66509de556a8579172f82d1abb1560d8e4 by Antoine Pitrou 
(Pierre Glaser) in branch 'master':
bpo-36867: DOC update multiprocessing.rst (GH-13289)
https://github.com/python/cpython/commit/50466c66509de556a8579172f82d1abb1560d8e4


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread Antoine Pitrou


Change by Antoine Pitrou :


--
pull_requests: +13200

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread STINNER Victor


STINNER Victor  added the comment:

I like Olivier's pattern. Maybe we can slowly increase the sleep to stop 
shortly if the resource goes away shortly.

deadline = time.monotonic() + 60.0
sleep = 0.010
while ...:
   if : break
   if time.monotonic() > deadline: ... assert error ...
   sleep = min(sleep * 2, 5.0)
   time.sleep(1.0)

It's kind of a common pattern. Maybe it should be an helper in test.support 
module.


> We can do that, or maybe we can try to wait on the `resource_tracker's` pid?

I prefer to make sure that the resource goes away without inspecting 
multiprocessing internals.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread Pierre Glaser


Pierre Glaser  added the comment:

We can do that, or maybe we can try to wait on the `resource_tracker's` pid?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread Olivier Grisel


Olivier Grisel  added the comment:

As Victor said, the `time.sleep(1.0)` might lead to Heisen failures. I am not 
sure how to write proper strong synchronization in this case but we could 
instead go for something intermediate such as the following pattern:


...
p.terminate()
p.wait()
for i in range(60):
try:
shared_memory.SharedMemory(name, create=False)
except FileNotFoundError:
# the OS successfully collected the segment as expected
break

time.sleep(1.0)  # wait for the OS to collect the segment

 else:
 raise AssertionError(f"Failed to collect shared_memory segment 
{name}")


What do you think?

--
nosy: +Olivier.Grisel

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread Pierre Glaser


Change by Pierre Glaser :


--
pull_requests: +13182

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread STINNER Victor


STINNER Victor  added the comment:

test_shared_memory_cleaned_after_process_termination() uses time as a weak 
synchronization primitive:


# killing abruptly processes holding reference to a shared memory
# segment should not leak the given memory segment.
p.terminate()
p.wait()
time.sleep(1.0)  # wait for the OS to collect the segment

with self.assertRaises(FileNotFoundError):
smm = shared_memory.SharedMemory(name, create=False)

Would it be possible to use a more reliable synchronization? Such test usually 
fail randomly.
https://pythondev.readthedocs.io/unstable_tests.html

--
nosy: +vstinner

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread Pierre Glaser


Pierre Glaser  added the comment:

Actually, I was properly unlinking the shared_memory segments. The warning 
messages are due to bad interactions between the ResourceTracker and the 
SharedMemoryManager object. In this particular case, it's easy to change a 
little bit the  problematic test to avoid the warnings. I will focus on solving 
those bad interactions right after.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread Pierre Glaser


Pierre Glaser  added the comment:

Yes, one test I wrote in an unrelated commit does not unlink a memory segment. 
Now the ResourceTracker complains. Fixing it now.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-12 Thread Karthikeyan Singaravelan


Karthikeyan Singaravelan  added the comment:

Since the commit there is a warning in CI and locally while running tests.

Travis CI : 

* https://travis-ci.org/python/cpython/jobs/531304357#L2099 
(test_multiprocessing_forkserver)
* https://travis-ci.org/python/cpython/jobs/531304357#L2296 
(test_multiprocessing_fork)

Before commit : 

./python.exe -X tracemalloc -m test --fail-env-changed 
test_multiprocessing_forkserver
Run tests sequentially
0:00:00 load avg: 4.71 [1/1] test_multiprocessing_forkserver
test_multiprocessing_forkserver passed in 2 min 21 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 2 min 21 sec
Tests result: SUCCESS

After commit f22cc69b012f52882d434a5c44a004bc3aa5c33c : 

./python.exe -X tracemalloc -m test --fail-env-changed 
test_multiprocessing_forkserver
Run tests sequentially
0:00:00 load avg: 4.08 [1/1] test_multiprocessing_forkserver
/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/multiprocessing/resource_tracker.py:198:
 UserWarning: resource_tracker: There appear to be 1 leaked shared_memory 
objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/unittest/case.py:683: 
ResourceWarning: unclosed file <_io.BufferedReader name=7>
  testMethod()
Object allocated at (most recent call last):
  File "/Users/karthikeyansingaravelan/stuff/python/cpython/Lib/subprocess.py", 
lineno 819
self.stdout = io.open(c2pread, 'rb', bufsize)
test_multiprocessing_forkserver passed in 2 min 20 sec

== Tests result: SUCCESS ==

1 test OK.

Total duration: 2 min 20 sec
Tests result: SUCCESS

== Tests result: SUCCESS ==

1 test OK.

Total duration: 2 min 26 sec
Tests result: SUCCESS

--
nosy: +xtreak

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-11 Thread Pierre Glaser


Pierre Glaser  added the comment:

Shared memory segments are now tracked by the brand new resource_tracker!
Thanks Antoine for the review.

Does anyone have an opinion on introducing a public API for users to make the
resource_tracker track resources of their choice?

What We have in mind is:
- making public the existing resource_tracker.register/unregister
- adding a new function, resource_tracker.make_trackable(resource_type,
  cleanup_func), where:

* resource_type is a string (an identifier that will be used each time a
  resource of the type resource_type needs tracking, via the call
  resource_tracker.register(resource_name, resource_type)
* cleanup_func must be a callable taking a single string as argument (the
  name of the resource that needs tracking). This function will be called
  after the end of a process for reach resource of type resource_type the
  process did not clean properly

Under the hood, make_trackable simply populates resource_tracker._CLEANUP_FUNCS
with new items.

Here is a simple example:
import os
import resource_tracker
import shutil
from multiprocessing import util

class ClassCreatingAFolder:
"""Class where each instance creates a temporary folder.

Each temporary folder is supposed to exist for the duration of the 
instance
that created it.
"""
def __init__(self, folder_name):
self.folder_name = folder_name
os.mkdir(folder_name)

# any instance normally garbage-collected should remove its 
folder, and
# notice the resource_tracker that its folder was correctly 
removed.
util.Finalize(self, ClassCreatingAFolder.cleanup, 
args=(folder_name,))

# If this session quits abruptly, the finalizer will not be 
called for
# the instances of ClassCreatingAFolder that were still alive
# before the shutdown. The resource_tracker comes into play, 
and removes
# the folders associated to each of these resources.
resource_tracker.register(
folder_name, # argument to shutil.rmtree
"ClassCreatingAFolder")

@staticmethod
def cleanup(folder_name):
resource_tracker.unregister(folder_name, "ClassCreatingAFolder")
shutil.rmtree(folder_name)

# Tell the resource_tracker how to cleanup resources created by
# ClassCreatingAFolder instances
resource_tracker.make_trackable("ClassCreatingAFolder", shutil.rmtree)


Typical resources that can be made trackable include memmaped objects,
temporary folders. Our use-case is joblib that has its own mmap type that we
would like to track using the semaphore_tracker.

Any thoughts?

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-10 Thread Antoine Pitrou


Antoine Pitrou  added the comment:


New changeset f22cc69b012f52882d434a5c44a004bc3aa5c33c by Antoine Pitrou 
(Pierre Glaser) in branch 'master':
bpo-36867: Make semaphore_tracker track other system resources (GH-13222)
https://github.com/python/cpython/commit/f22cc69b012f52882d434a5c44a004bc3aa5c33c


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-09 Thread Pierre Glaser


Change by Pierre Glaser :


--
keywords: +patch
pull_requests: +13132
stage:  -> patch review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36867] Make semaphore_tracker track other system resources

2019-05-09 Thread Pierre Glaser


New submission from Pierre Glaser :

Hi all,

Olivier Grisel, Thomas Moreau and myself are currently working on increasing
the range of action of the semaphore_tracker in Python.

multiprocessing.semaphore_tracker is a little known module, that launches a
server process used to track the life cycle of semaphores created in a python
session, and potentially cleanup those semaphores after all python processes of
the session terminated. Normally, python processes cleanup semaphores they
create. This is however not not guaranteed if the processes get violently
interrupted (using for example the bash command "killall python")

A note on why the semaphore_tracker was introduced: Cleaning up semaphores
after termination is important because the system only supports a limited
number of named semaphores, and they will not be automatically removed till the
next reboot.

Now, Python 3.8 introduces shared memory segments creation. Shared memory is
another sensitive global system resource. Currently, unexpected termination of
processes that created memory segments will result in leaking those memory
segments. This can be problematic for large compute clusters with many users
and that are rebooted rarely.

For this reason, we expanded the semaphore_tracker to also track shared memory
segments, and renamed it resource_tracker. Shared memory segments get
automatically tracked by the resource tracker when they are created. This is a
first, self-contained fix. (1)

Additionally, supporting shared memory tracking led to a more generic design
for the resource_tracker. The resource_tracker can be now easily extended
to track arbitrary resource types.
A public API could potentially be exposed for users willing to track other
types.  One for example may want to add tracking for temporary folders creating
during python sessions.  Another use case is the one of joblib, which
is a widely-used parallel-computing package, and also the backend of
scikit-learn. Joblib relies heavily on memmapping. A public API could extend
the resource_tracker to track memmap-ed objects with very little code.

Therefore, this issue serves two purposes:
- referencing the semaphore_tracker enhancement mentioned in (1)
- discussing a potentially public resource_tracker API.

--
components: Library (Lib)
messages: 341987
nosy: pablogsal, pierreglaser, pitrou
priority: normal
severity: normal
status: open
title: Make semaphore_tracker track other system resources
type: resource usage
versions: Python 3.8

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com