Thanks, Holger. I appreciate the hint about where to do the testing and waiting in pytest_runtest_setup, and I think the atomic file rename idea is an interesting way to set up a signal.
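For concreteness, a hook along those lines might look roughly like the sketch below. The @pytest.mark.mpi(np=...) hint and the acquire_cores/release_cores helpers are placeholders invented for illustration (one possible shape for them is sketched at the end of this thread); neither is provided by py.test or xdist, and the marker-lookup API differs between py.test versions.

    # conftest.py -- rough sketch only, not a drop-in implementation
    import time

    # Hypothetical helper module, sketched at the end of this thread.
    from mpi_cores import acquire_cores, release_cores

    def pytest_runtest_setup(item):
        # Hypothetical hint such as @pytest.mark.mpi(np=8) on the test;
        # older py.test versions would use item.get_marker("mpi") instead.
        marker = item.get_closest_marker("mpi")
        if marker is None:
            return
        np = marker.kwargs.get("np", 1)
        # Politely busy-wait until enough cores are free, then check them out.
        while not acquire_cores(np):
            time.sleep(1)

    def pytest_runtest_teardown(item):
        marker = item.get_closest_marker("mpi")
        if marker is not None:
            release_cores(marker.kwargs.get("np", 1))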
I may not fully understand xdist. I certainly agree that efficient use of mpirun is the crux of doing the job right ... so probably any load balancing offered by xdist is going to be wasted on me. The main service I was looking for from xdist was the ability to run tests concurrently. As I think you realize, if I have a pool of 16 processors and the first four tests collected require 8, 4, 8 and 4 processors, I would want this behavior:

1. the first test starts immediately
2. the second test starts immediately, without the first finishing
3. the third test either waits, or starts in a Python sense but "sleeps" before launching mpi
4. the fourth test starts immediately

Is vanilla py.test able to do this kind of concurrent testing? Or would I need to tweak it to launch tests in threads according to my criterion for readiness?

I think we have settled how I would allocate resources, but your idea implies I might have all the test hints in one place. If I had full control over all the test launches, this might allow me to do some sort of knapsack-problem-ish reorganization to keep everything fully utilized, rather than taking the tests in the order they were collected. For instance, if I had 16 processors and the first four tests take 12, 12, 4 and 4 processors, I could run them in the order (12+4 concurrently) then (12+4 concurrently). Do I have this level of control?

Thanks,
Eli

________________________________________
From: holger krekel [hol...@merlinux.eu]
Sent: Friday, January 20, 2012 12:50 AM
To: Ateljevich, Eli
Cc: py-dev@codespeak.net
Subject: Re: [py-dev] xdist and thread-safe resource counting

Hi Eli,

interesting problem.

On Wed, Jan 18, 2012 at 20:55 -0800, Ateljevich, Eli wrote:
> I have a question about managing resources in a threadsafe way across xdist -n.
>
> My group is using py.test as a high-level driver for testing an mpi-based numerical code. Many of our system-level tests wrap a system call to mpirun and then postprocess results. I have a decorator for the tests that hints at the number of processors needed (usually something like 1, 2, 8).
>
> I would like to launch as much as I can at once given the available processors. For instance, if 16 processors are available there is no reason I couldn't be doing a 12- and a 4-processor test. I was thinking of using xdist with some modest number of processes representing the maximum number of concurrent tests. The xdist test processes would launch mpi jobs when enough processors become available to satisfy the np hint for that test. This would be managed by having the tests "check out" cores and sleep if they aren't available yet.
>
> This design requires a threadsafe method to query, acquire and lock the count of available mpi cores. I could use some sort of lock or semaphore from threading, but I thought it would be good to run this by the xdist cognoscenti and find out if there might be a preferred way of doing it, given how xdist itself distributes its work or manages threads.

pytest-xdist itself does not provide or use a method to query the number of available processors.

Quick background on xdist: the master process starts a number of node processes which collect tests (see the output of "py.test --collectonly"), and the master sees the test ids of all those collections. It then decides the scheduling (Each or Load at the moment; "-n5" implies load-balancing) and sends test ids to the nodes to execute. It pre-loads each node with test ids and then waits for completions before sending more test ids to each node.
There is no node-to-node communication for co-ordination.

It might be easiest not to try to extend the xdist mechanisms, but to implement an independent method which co-ordinates the number of running MPI tests / used processors via a file or so. For example, on posix you can read/write a file with some meta-information and use the atomic os.rename operation. I am not sure about the exact semantics, but this should be doable and testable without any xdist involvement.

If you have such a method which helps to restrict the number of MPI processes, you can then use it from a pytest_runtest_setup hook which reads your decorator attributes/markers and then decides whether to wait or to run the test. This method also makes you rather independent from the number of worker processes started with "-nNUM".

HTH,
holger
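A minimal sketch of the file-based counter described above, assuming a fixed count file under /tmp and a known total core count; the module name, file layout and helper names are illustrative only, not anything prescribed by py.test or xdist. The os.rename step is atomic on POSIX, but the read-modify-write cycle as written is not race-free on its own, so a real version would also need an exclusive claim around the update (for example a lock file created with os.O_EXCL).

    # mpi_cores.py -- illustrative helper module; names and layout are made up here.
    import os
    import tempfile

    COUNT_FILE = "/tmp/mpi_free_cores"  # assumed location of the shared count
    TOTAL_CORES = 16                    # assumed size of the processor pool

    def _read_free(path=COUNT_FILE):
        try:
            with open(path) as f:
                return int(f.read().strip())
        except (IOError, ValueError):
            return TOTAL_CORES          # no file yet: treat all cores as free

    def _write_free(n, path=COUNT_FILE):
        # Write a temp file on the same filesystem, then atomically rename it
        # into place so readers never see a half-written count.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        with os.fdopen(fd, "w") as f:
            f.write("%d\n" % n)
        os.rename(tmp, path)

    def acquire_cores(np, path=COUNT_FILE):
        """Try to check out np cores; return True on success, False if busy."""
        free = _read_free(path)
        if free < np:
            return False
        _write_free(free - np, path)
        return True

    def release_cores(np, path=COUNT_FILE):
        _write_free(_read_free(path) + np, path)

The acquire_cores/release_cores pair is what the conftest.py sketch earlier in the thread would import and call from pytest_runtest_setup and pytest_runtest_teardown.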