[issue17560] problem using multiprocessing with really big objects?

2013-08-19 Thread Olivier Grisel
Olivier Grisel added the comment: I have implemented a custom subclass of the multiprocessing Pool to be able to plug in a custom pickling strategy for this specific use case in joblib: https://github.com/joblib/joblib/blob/master/joblib/pool.py#L327 In particular it can: - detect mmap-backed numpy

[issue17560] problem using multiprocessing with really big objects?

2013-08-19 Thread Olivier Grisel
Olivier Grisel added the comment: I forgot to end a sentence in my last comment: - detect mmap-backed numpy should read: - detect mmap-backed numpy arrays and pickle only the filename and other buffer metadata to reconstruct a mmap-backed array in the worker processes instead of copying
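
The idea of pickling only a filename instead of the mapped data can be sketched with the standard library alone. This is a minimal illustration, not joblib's implementation: `FileBackedBuffer` is a hypothetical class, and it wraps a plain `mmap` object rather than a numpy array.

```python
import mmap


class FileBackedBuffer:
    """Hypothetical read-only buffer backed by a memory-mapped file."""

    def __init__(self, filename):
        self.filename = filename
        self._file = open(filename, "rb")
        # Length 0 maps the whole file; ACCESS_READ keeps the mapping read-only.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map)

    def __getitem__(self, index):
        return self._map[index]

    def close(self):
        self._map.close()
        self._file.close()

    def __reduce__(self):
        # Pickle only the filename: the receiving process re-opens and
        # re-maps the file instead of receiving a copy of its content.
        return (FileBackedBuffer, (self.filename,))
```

With this `__reduce__`, the size of the pickle depends on the length of the filename rather than on the size of the file, provided the worker process can open the same path.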

[issue17560] problem using multiprocessing with really big objects?

2013-08-19 Thread Olivier Grisel
Olivier Grisel added the comment: In 3.3 you can do `from multiprocessing.forking import ForkingPickler; ForkingPickler.register(MyType, reduce_MyType)`. Is this sufficient for your needs? This is private (and its definition has moved in 3.4) but it could be made public. Indeed I
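
A minimal sketch of the registration described above, assuming Python 3.4 or later, where `ForkingPickler` is importable from `multiprocessing.reduction`. `MyType` and `reduce_MyType` are the placeholder names from the comment; `rebuild_MyType` and the `payload` attribute are hypothetical additions for illustration.

```python
from multiprocessing.reduction import ForkingPickler


class MyType:
    """Hypothetical type whose pickling we want to customize."""

    def __init__(self, payload):
        self.payload = payload


def rebuild_MyType(payload):
    # Runs in the receiving process to reconstruct the object.
    return MyType(payload)


def reduce_MyType(obj):
    # Same contract as __reduce__: return (callable, args).
    return rebuild_MyType, (obj.payload,)


# Affects only pickling done through ForkingPickler (as multiprocessing
# does when sending objects to workers), not plain pickle.dumps.
ForkingPickler.register(MyType, reduce_MyType)
```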

[issue18999] Robustness issues in multiprocessing.{get, set}_start_method

2013-09-10 Thread Olivier Grisel
Olivier Grisel added the comment: Related question: is there any good reason not to allow passing a custom `start_method` kwarg to the `Pool` constructor to make it use an alternative `Popen` instance (that is, an instance different from the `multiprocessing._Popen` singleton

[issue18999] Robustness issues in multiprocessing.{get, set}_start_method

2013-09-11 Thread Olivier Grisel
Olivier Grisel added the comment: Maybe it would be better to have separate contexts for each start method. That way joblib could use the forkserver context without interfering with the rest of the user's program. Yes in general it would be great if libraries could customize
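
The per-start-method contexts discussed here exist in Python 3.4 and later as `multiprocessing.get_context`. A minimal sketch of how a library could use one without touching the process-wide default; `square` and `run_in_private_context` are hypothetical names, and `forkserver` is not available on every platform.

```python
import multiprocessing


def square(x):
    return x * x


def run_in_private_context(values, method="forkserver"):
    # get_context returns a context object bound to one start method;
    # it does not change the default used by the rest of the program.
    ctx = multiprocessing.get_context(method)
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, values)
```

As with any non-fork start method, callers should invoke `run_in_private_context` from code guarded by `if __name__ == "__main__":` so that worker processes can import the main module safely.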

[issue18999] Robustness issues in multiprocessing.{get, set}_start_method

2013-09-12 Thread Olivier Grisel
Olivier Grisel added the comment: The process pool executor [1] from the concurrent futures API would be suitable to explicitly start and stop the helper process for the `forkserver` mode. [1] http://docs.python.org/3.4/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor

[issue18999] Robustness issues in multiprocessing.{get, set}_start_method

2013-09-12 Thread Olivier Grisel
Olivier Grisel added the comment: Richard Oudkerk: thanks for the clarification, that makes sense. I don't have the time either in the coming month, maybe later. -- ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue18999

[issue19851] reload problem with submodule

2013-12-09 Thread Olivier Grisel
Olivier Grisel added the comment: I tested the patch on the current HEAD and it fixes a regression introduced between 3.3 and 3.4b1 that prevented building scipy from source with `pip install scipy`. -- nosy: +Olivier.Grisel

[issue19946] multiprocessing crash with forkserver or spawn when run from a non .py ending script

2013-12-10 Thread Olivier Grisel
New submission from Olivier Grisel: Here is a simple python program that uses the new forkserver feature introduced in 3.4b1:

name: checkforkserver.py

    import multiprocessing
    import os

    def do(i):
        print(i, os.getpid())

    def test_forkserver():
        mp = multiprocessing.get_context

[issue19946] multiprocessing crash with forkserver or spawn when run from a non .py ending script

2013-12-10 Thread Olivier Grisel
Changes by Olivier Grisel olivier.gri...@ensta.org: -- type: -> crash

[issue19946] multiprocessing crash with forkserver or spawn when run from a non .py ending script

2013-12-10 Thread Olivier Grisel
Olivier Grisel added the comment: So the question is exactly what module is being passed to importlib.find_spec() and why isn't it finding a spec/loader for that module. The module is the `nosetests` python script. module_name == 'nosetests' in this case. However, nosetests

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-11 Thread Olivier Grisel
Olivier Grisel added the comment: I agree that a failure to look up the module should raise an explicit exception. Second, there is no way that 'nosetests' will ever succeed as an import since, as Oliver pointed out, it doesn't end in '.py' or any other identifiable way for a finder to know

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-11 Thread Olivier Grisel
Olivier Grisel added the comment: what is sys.modules['__main__'] and sys.modules['__main__'].__file__ if you run under nose?

    $ cat check_stuff.py
    import sys

    def test_main():
        print("sys.modules['__main__']=%r" % sys.modules['__main__'])
        print("sys.modules['__main__

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-11 Thread Olivier Grisel
Olivier Grisel added the comment: Note however that the problem is not specific to nose. If I rename my initial 'check_forserver.py' script to 'check_forserver', add the '#!/usr/bin/env python' header and make it executable with 'chmod +x', I get the same crash. So the problem is related to the fact

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-11 Thread Olivier Grisel
Olivier Grisel added the comment: Here is a patch that uses `imp.load_source` when the first importlib name-based lookup fails. Apparently it fixes the issue on my box but I am not sure whether this is the correct way to do it. -- keywords: +patch Added file: http://bugs.python.org
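
Loading Python source from a path that has no `.py` extension can be sketched as follows. This is not the attached patch: it swaps `imp.load_source` (the `imp` module was later deprecated and removed) for the `importlib` equivalent, and `load_script_as_module` is a hypothetical helper name.

```python
import importlib.util
from importlib.machinery import SourceFileLoader


def load_script_as_module(name, path):
    """Load Python source from `path`, whatever its file extension."""
    # Naming the loader explicitly bypasses the suffix-based finder lookup
    # that fails for extension-less scripts such as 'nosetests'.
    loader = SourceFileLoader(name, path)
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```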

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-13 Thread Olivier Grisel
Olivier Grisel added the comment: Why has this issue been closed? Won't the spawn and forkserver modes work in Python 3.4 for Python programs started by a Python script (which is probably the majority of programs written in Python under Unix)? Is there any reason not to use the `imp.load_source

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-13 Thread Olivier Grisel
Olivier Grisel added the comment: The semantics are not going to change in python 3.4 and will just stay as they were in Python 3.3. Well the semantics do change: in Python 3.3 the spawn and forkserver modes did not exist at all. The spawn mode existed but only implicitly and only under

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-13 Thread Olivier Grisel
Olivier Grisel added the comment: I can wait (or monkey-patch the stuff I need as a temporary workaround in my code). My worry is that Python 3.4 will introduce a new feature that is very crash-prone. Take this simple program that uses the newly introduced `get_context` function (the same

[issue19946] Have multiprocessing raise ImportError when spawning a process that can't find the main module

2013-12-13 Thread Olivier Grisel
Olivier Grisel added the comment: For Python 3.4: Maybe rather than raising ImportError, we could issue a warning to notify the users that names from the __main__ namespace could not be loaded and make the init_module_attrs return early. This way a multiprocessing program that only calls

[issue19946] Handle a non-importable __main__ in multiprocessing

2013-12-16 Thread Olivier Grisel
Olivier Grisel added the comment: I applied issue19946_pep_451_multiprocessing_v2.diff and I confirm that it fixes the problem that I reported initially.

[issue21905] RuntimeError in pickle.whichmodule when sys.modules if mutated

2014-07-02 Thread Olivier Grisel
New submission from Olivier Grisel: `pickle.whichmodule` performs an iteration over `sys.modules` and tries to perform `getattr` calls on those modules. Unfortunately some modules such as those from the `six.moves` dynamic module can trigger imports when calling `getattr` on them, hence
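
The failure mode described here is the usual "dictionary changed size during iteration" error, and the usual remedy is to iterate over a snapshot. The sketch below illustrates that idea; it is not the code of `pickle.whichmodule` or of the attached patch, and `find_defining_module` is a hypothetical name.

```python
import sys


def find_defining_module(obj, name):
    """Return the name of a loaded module whose attribute `name` is `obj`."""
    # Iterate over a snapshot: getattr() on a lazy module may trigger an
    # import and thereby resize sys.modules while we are looping.
    for module_name, module in list(sys.modules.items()):
        if module is None or module_name == "__main__":
            continue
        try:
            if getattr(module, name, None) is obj:
                return module_name
        except Exception:
            # Some dynamic modules raise arbitrary errors on attribute access.
            continue
    return None
```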

[issue21905] RuntimeError in pickle.whichmodule when sys.modules if mutated

2014-07-03 Thread Olivier Grisel
Olivier Grisel added the comment: New version of the patch to add an inline comment. -- Added file: http://bugs.python.org/file35841/pickle_whichmodule_20140703.patch

[issue21905] RuntimeError in pickle.whichmodule when sys.modules if mutated

2014-10-06 Thread Olivier Grisel
Olivier Grisel added the comment: No problem. Thanks Antoine for the review!

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: In my last comment, I also reported the user times (not spent in OS-level disk access): the code of the PR is on the order of 300-400ms while master is around 800ms o

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: I have pushed a new version of the code that now has a 10% overhead for small bytes (instead of 40% previously). It could be possible to optimize further but I think that would render the code much less readable so I

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Actually, I think this can still be improved while keeping it readable. Let me try again :)

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Alright, the last version has now ~4% overhead for small bytes.

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-11 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Alright, I found the source of my refcounting bug. I updated the PR to include the C version of the dump for PyBytes. I ran Serhiy's microbenchmarks on the C version and I could not detect any overhead on small bytes objects w

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: BTW, I am looking at the C implementation at the moment. I think I can do it.

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment:
> While we are here, wouldn't be worth to flush the buffer in the C implementation to the disk always after committing a frame? This will save a memory when dump a lot of small objects.
I think it's a good idea

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Flushing the buffer at each frame commit will cause a medium-sized write every 64kB on average (instead of one big write at the end). So that might actually cause a performance regression for some users if the individual file-
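
To see how many writes a given pickler issues and how large they are, one can pass `pickle.dump` a file-like object that records each call. This is a minimal observation sketch; `WriteRecorder` and `record_writes` are hypothetical names, and the number and size of the recorded chunks depend on the Python version and on whether the C or the Python pickler is used.

```python
import pickle


class WriteRecorder:
    """Hypothetical file-like object that records every write call."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        # bytes() copies the data, so this is for observation only.
        self.chunks.append(bytes(data))
        return len(data)


def record_writes(obj, protocol=4):
    """Pickle `obj` and return the list of chunks passed to write()."""
    recorder = WriteRecorder()
    pickle.dump(obj, recorder, protocol=protocol)
    return recorder.chunks
```

Printing `[len(chunk) for chunk in record_writes(obj)]` for a mix of small and large objects shows how the output is split into writes on a given interpreter.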

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-10 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: I have tried to implement the direct write bypass for the C version of the pickler but I get a segfault in a Py_INCREF on obj during the call to memo_put(self, obj) after the call to _Pickler_write_large_bytes. Here is th

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-12 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Thanks Antoine, I updated my code to what you suggested.

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel
New submission from Olivier Grisel <olivier.gri...@ensta.org>: I noticed that both pickle.Pickler (C version) and pickle._Pickler (Python version) make unnecessary memory copies when dumping large str, bytes and bytearray objects. This is caused by unnecessary concatenation of the

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: More benchmarks with the unix time command: ``` (py37) ogrisel@ici:~/code/cpython$ git checkout master Switched to branch 'master' Your branch is up-to-date with 'origin/master'. (py37) ogrisel@ici:~/code/cpython$ time python

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: I wrote a script to monitor the memory when dumping 2GB of data with python master (C pickler and Python pickler): ``` (py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py Allocating source data... => pe
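
The script referred to above is not reproduced in this excerpt. As a rough stand-in, the following sketch measures the peak of Python-level allocations made while dumping an object, using `tracemalloc` rather than process-level memory monitoring; `peak_memory_during_dump` is a hypothetical helper and the numbers it reports are not comparable to the ones quoted in the issue.

```python
import os
import pickle
import tracemalloc


def peak_memory_during_dump(obj, protocol=4):
    """Return the peak traced allocation (in bytes) while pickling `obj`."""
    # Write to the null device so the output itself is not kept in memory.
    with open(os.devnull, "wb") as sink:
        tracemalloc.start()
        try:
            pickle.dump(obj, sink, protocol=protocol)
            _, peak = tracemalloc.get_traced_memory()
        finally:
            tracemalloc.stop()
    return peak
```

Because tracing starts after `obj` already exists, the reported peak covers temporaries created by the pickler, which is the quantity of interest in this issue.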

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2017-11-09 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Note that the time difference is not significant. I reran the last command and got: ``` (py37) ogrisel@ici:~/code/cpython$ python ~/tmp/large_pickle_dump.py --use-pypickle Allocating source data... => peak memory usage:

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Shall we close this issue now that the PR has been merged to master?

[issue31993] pickle.dump allocates unnecessary temporary bytes / str

2018-01-06 Thread Olivier Grisel
Olivier Grisel <olivier.gri...@ensta.org> added the comment: Thanks for the very helpful feedback and guidance during the review.

[issue35900] Add pickler hook for the user to customize the serialization of user defined functions and types.

2019-02-13 Thread Olivier Grisel
Olivier Grisel added the comment: Adding such a hook would make it possible to reimplement cloudpickle.CloudPickler by deriving from the fast _pickle.Pickler class (instead of the slow pickle._Pickler as done currently). This would mean rewriting most of the CloudPickler methods to only rely
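
A hook of this kind is available since Python 3.8 as the `reducer_override` method of `pickle.Pickler`. The sketch below shows its general shape on a toy case and is not cloudpickle's implementation: `make_local_class`, `ByValueClassPickler` and `dumps_by_value` are hypothetical names, and only simple class attributes are carried over.

```python
import io
import pickle


def make_local_class():
    # A class defined inside a function cannot be pickled by reference.
    class Local:
        answer = 42
    return Local


class ByValueClassPickler(pickle.Pickler):
    def reducer_override(self, obj):
        # Called for every object, including classes and functions.
        if isinstance(obj, type) and "<locals>" in obj.__qualname__:
            attrs = {k: v for k, v in vars(obj).items()
                     if not k.startswith("__")}
            # Rebuild the class with type(name, bases, namespace).
            return type, (obj.__name__, obj.__bases__, attrs)
        # Fall back to the default pickling behaviour.
        return NotImplemented


def dumps_by_value(obj):
    buf = io.BytesIO()
    ByValueClassPickler(buf, protocol=pickle.HIGHEST_PROTOCOL).dump(obj)
    return buf.getvalue()
```

Returning `NotImplemented` keeps the fast default path for everything the subclass does not handle, which is the property the comment is after.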

[issue36867] Make semaphore_tracker track other system resources

2019-05-13 Thread Olivier Grisel
Olivier Grisel added the comment: As Victor said, the `time.sleep(1.0)` might lead to Heisen failures. I am not sure how to write proper strong synchronization in this case but we could instead go for something intermediate such as the following pattern: ... p.terminate
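
The pattern in the comment is cut off in this excerpt. A common way to replace a fixed `time.sleep(1.0)` in tests is to poll a condition until a deadline; the sketch below shows that generic technique under the assumption that the condition can be checked cheaply, and `wait_until` is a hypothetical helper, not code from the issue.

```python
import time


def wait_until(predicate, timeout=5.0, interval=0.05):
    """Poll `predicate` until it returns true or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test then asserts on the return value, so a slow machine only makes the test take longer (up to `timeout`) instead of making it fail spuriously.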