Thanks, Holger. I appreciate the hint about doing the testing and waiting in 
pytest_runtest_setup, and I think the atomic file-rename idea is an 
interesting way to set up a signal.

I may not fully understand xdist. I certainly agree that efficient use of 
mpirun is the crux of doing the job right, so any load balancing offered by 
xdist is probably going to be wasted on me. 

The main service I was looking for out of xdist was the ability to run tests 
concurrently. As I think you realize, if I have a pool of 16 processors and the 
first four tests collected require 8, 4, 8, 4 processors, I would want this 
behavior:
1.  the first test to start immediately
2.  the second test to start immediately without the first finishing
3.  the third test to either wait, or start in a Python sense but "sleep" 
before launching MPI
4.  the fourth test to start immediately
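To make the in-order behavior above concrete, here is a toy sketch (names and shapes entirely made up, not tied to any pytest API) of which tests get to start immediately given a core pool:

```python
def immediate_starts(capacity, requests):
    """Walk the collected tests in order; a test starts immediately iff
    its core count still fits in what's free. Tests that don't fit
    would sleep (they are simply skipped in this toy version)."""
    free = capacity
    started = []
    for i, need in enumerate(requests):
        if need <= free:
            free -= need
            started.append(i)
    return started

# With a pool of 16 and tests needing 8, 4, 8, 4:
# tests 0, 1 and 3 start immediately; test 2 must wait.
print(immediate_starts(16, [8, 4, 8, 4]))  # -> [0, 1, 3]
```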

Is vanilla py.test able to do this kind of concurrent testing? Or would I need 
to tweak it to launch tests in threads according to my criterion for readiness? 

I think we have settled how I would allocate resources, but your idea implies I 
might have all the test hints in one place. If I have full control of all the 
test launches, this might allow some sort of knapsack-problem-ish 
reorganization to keep everything fully utilized, rather than taking the tests 
in the order they were collected. For instance, if I had 16 processors and the 
first four tests needed 12, 12, 4, and 4, I could run them as (12+4 
concurrently) then (12+4 concurrently). Do I have this level of control?
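Something like a greedy first-fit-decreasing pass would recover that packing; this is just a sketch of the reorganization I have in mind, not any existing pytest or xdist facility:

```python
def batch_tests(capacity, needs):
    """Pack tests into sequential batches (first-fit decreasing) so
    each batch's total core demand fits the pool; batches then run
    one after another, with every test in a batch running concurrently."""
    order = sorted(range(len(needs)), key=lambda i: -needs[i])
    batches = []  # each entry: [list of test indices, free cores left]
    for i in order:
        for batch in batches:
            if needs[i] <= batch[1]:
                batch[0].append(i)
                batch[1] -= needs[i]
                break
        else:
            batches.append([[i], capacity - needs[i]])
    return [indices for indices, _free in batches]

# 16 cores, tests needing 12, 12, 4, 4:
print(batch_tests(16, [12, 12, 4, 4]))  # -> [[0, 2], [1, 3]]
```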

Thanks,
Eli


________________________________________
From: holger krekel [hol...@merlinux.eu]
Sent: Friday, January 20, 2012 12:50 AM
To: Ateljevich, Eli
Cc: py-dev@codespeak.net
Subject: Re: [py-dev] xdist and thread-safe resource counting

Hi Eli,

interesting problem.

On Wed, Jan 18, 2012 at 20:55 -0800, Ateljevich, Eli wrote:
> I have a question about managing resources in a threadsafe way across xdist 
> -n.
>
> My group is using py.test as a high-level driver for testing an mpi-based 
> numerical code. Many of our system-level tests wrap a system call to mpirun 
> then postprocess results. I have a decorator for the tests that hints at the 
> number of processors needed (usually something like 1,2,8).
>
> I would like to launch as much as I can at once given the available 
> processors. For instance, if 16 processors are available there is no reason I 
> couldn't be doing a 12 and a 4 processor test. I was thinking of using xdist 
> with some modest number of processors representing the maximum number of 
> concurrent tests. The xdist test processors would launch mpi jobs when enough 
> processors become available to satisfy the np hint for that test. This would 
> be managed by having the tests "check out" cores and sleep if they aren't 
> available yet.
>
> This design requires a threadsafe method to query, acquire and lock the count 
> of available mpi cores. I could use some sort of lock or semaphore from 
> threading, but I thought it would be good to run this by the xdist 
> cognoscenti and find out if there might be a preferred way of doing this 
> given how xdist itself distributes its work or manages threads.

pytest-xdist itself does not provide or use a method to query the number
of available processors.  Quick background of xdist: Master process starts
a number of processes which collect tests (see output of py.test --collectonly)
and the master sees the test ids of all those collections.  It then decides
the scheduling (Each or Load at the moment, "-n5" implies load-balancing) and
sends test ids to the nodes to execute.  It pre-loads each node with a few
test ids and then waits for completions before sending more test ids to that node.
There is no node-to-node communication for co-ordination.

It might be easiest to not try to extend the xdist-mechanisms
but to implement an independent method which co-ordinates the number of running
MPI tests / used processors via a file or so.  For example, on posix you
can read/write a file with some meta-information and use the
atomic os.rename operation.  Not sure about the exact semantics but
this should be doable and testable without any xdist involvement.
If you have such a method which helps to restrict the number
of MPI-processes you can then use it from a pytest_runtest_setup which
can read your decorator-attributes/markers and then make the decision
whether to wait or run the test.  This method also makes you rather
independent of the number of worker processes started with "-nNUM".
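A very rough, untested sketch of that scheme — all paths, the "procs" marker
name, and the core total are placeholders. One caveat: plain os.rename
silently replaces an existing target on POSIX, so this sketch uses rename only
for atomically rewriting the count file, and adds an O_CREAT|O_EXCL lock file
for the check-and-increment step (the marker-lookup API may also differ
between pytest versions):

```python
import errno
import os
import time

def _acquire_lock(lock_path, poll=0.05):
    """os.open with O_CREAT|O_EXCL is atomic on POSIX: exactly one
    process creates the lock file; everyone else spins until it goes away."""
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except OSError as err:
            if err.errno != errno.EEXIST:
                raise
            time.sleep(poll)

def _release_lock(lock_path):
    os.unlink(lock_path)

def _read_used(count_path):
    try:
        with open(count_path) as f:
            return int(f.read())
    except (IOError, OSError, ValueError):
        return 0  # no count file yet means no cores checked out

def _write_used(count_path, used):
    # Write a temp file, then atomically rename it into place, so a
    # concurrent reader never observes a half-written count.
    tmp = "%s.%d" % (count_path, os.getpid())
    with open(tmp, "w") as f:
        f.write(str(used))
    os.rename(tmp, count_path)

def try_checkout(n, total, count_path, lock_path):
    """Reserve n cores out of total; return False if not enough are free."""
    _acquire_lock(lock_path)
    try:
        used = _read_used(count_path)
        if used + n > total:
            return False
        _write_used(count_path, used + n)
        return True
    finally:
        _release_lock(lock_path)

def checkin(n, count_path, lock_path):
    """Return n cores to the pool (call from test teardown)."""
    _acquire_lock(lock_path)
    try:
        _write_used(count_path, _read_used(count_path) - n)
    finally:
        _release_lock(lock_path)

def pytest_runtest_setup(item):
    # "procs" is the hypothetical marker carrying the np hint.
    marker = item.get_closest_marker("procs")
    needed = marker.args[0] if marker else 1
    while not try_checkout(needed, 16, "/tmp/mpi_cores", "/tmp/mpi_cores.lock"):
        time.sleep(1.0)
```

The matching checkin would go in a teardown hook so cores are released
even when a test fails.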

HTH,
holger
_______________________________________________
py-dev mailing list
py-dev@codespeak.net
http://codespeak.net/mailman/listinfo/py-dev
