On 29/11/13 13:08, Reuti wrote:
Hi,

On 29.11.2013 at 12:41, Txema Heredia wrote:

Hi all,

We are having some problems with jobs using a C++ binary program that, simply 
put, ignores all slot allocations.

The C code in question uses a call to "sysconf(_SC_NPROCESSORS_ONLN);" to 
determine the number of threads it can open, and pthreads to parallelize.
The problem is that this retrieves all online cores, not just the ones 
assigned by either SGE or core binding. So we end up with 12 jobs on a node, 
each with 1 slot assigned by SGE and the whole job core-bound to that core, 
but each job spawning 12 threads that fight for CPU cycles inside that single 
core. Load average then skyrockets and the node is no longer usable until the 
CPU storm passes.

I have been investigating a little and I haven't found any "out-of-the-box" method to 
have C report the "granted" number of cores. All the direct methods (single-function 
call) I have tested report the total number of cores in the system:
sysconf(_SC_NPROCESSORS_ONLN);
sysconf(_SC_NPROCESSORS_CONF);
get_nprocs_conf ();
get_nprocs ();
Why not just use the $NSLOTS environment variable via getenv("NSLOTS")? Best 
would be to test whether the result is NULL, meaning the program is running 
outside of SGE, and to fall back to the originally used functions in that case.

In case your applications are dynamically linked, you could load a prepared 
library with LD_PRELOAD that replaces the "sysconf(_SC_NPROCESSORS_ONLN);" 
calls with the result of the environment variable and forwards all other 
cases to the default libc.

-- Reuti
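
Reuti's fallback idea might be sketched like this (a minimal, untested illustration; the function name get_thread_count is chosen here and does not appear in the thread):

```c
#include <stdlib.h>   /* getenv, atoi */
#include <unistd.h>   /* sysconf */

/* Prefer $NSLOTS when running under SGE; otherwise fall back to the
 * usual online-processor probe. */
int get_thread_count(void)
{
        const char *env = getenv("NSLOTS");   /* set by SGE for each job */
        if (env != NULL) {
                int n = atoi(env);
                if (n > 0)
                        return n;             /* inside SGE: trust the slot count */
        }
        /* outside SGE: behave exactly as before */
        return (int)sysconf(_SC_NPROCESSORS_ONLN);
}
```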

Using the NSLOTS variable has implications when the user is, for instance, testing things on the front-end node. We could reach a compromise using:

#define _GNU_SOURCE         /* for sched_getaffinity() and cpu_set_t */
#include <pthread.h>
#include <sched.h>          /* cpu_set_t, CPU_ZERO, CPU_ISSET, sched_getaffinity */
#include <unistd.h>         /* sysconf */
#include <cstdlib>          /* getenv, atoi */
#include <algorithm>        /* std::min */


int ImprovedGetCPUCount()
{
        cpu_set_t cs;
        CPU_ZERO(&cs);
        /* affinity mask of the calling process: reflects SGE core binding */
        sched_getaffinity(0, sizeof(cs), &cs);
        int count_binding = 0;
        int nprocs = sysconf(_SC_NPROCESSORS_ONLN);
        for (int i = 0; i < nprocs; i++)
        {
                if (CPU_ISSET(i, &cs))
                        count_binding++;
        }

        char *env = getenv("NSLOTS");
        if (env == NULL)                      /* not running under SGE */
                return count_binding;
        int nslots = atoi(env);
        return std::min(nslots, count_binding);
}


This returns either the minimum of the core-binding count and the value of NSLOTS, or the core-binding count alone if NSLOTS is not set. But this raises new questions. What do we want to do on the login node? Allow all threads, or shield the system against users' misbehaviour?

I took a look at the LD_PRELOAD hack, and even fiddled a little with it, but it is extremely messy: anything done with it would have to be applied not just system-wide but cluster-wide, and the chances of screwing up something important are too high. It's a really cool idea, but too dangerous to use in an uncontrolled scenario.

Maybe it could be used strictly for the problematic binaries, by replacing each binary with a wrapper that exports LD_PRELOAD and then calls the real binary, but this relies on users making good use of it. I'll see if I can make a functional nslots.so and report back.
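
Such an nslots.so is not shown in the thread, but a minimal, untested sketch of the interposer could look like this: it intercepts sysconf(_SC_NPROCESSORS_ONLN), returns $NSLOTS when set, and forwards everything else to the real libc sysconf via dlsym(RTLD_NEXT, ...):

```c
/* nslots.c — hypothetical LD_PRELOAD shim (sketch only).
 * Build: gcc -shared -fPIC -o nslots.so nslots.c -ldl
 * Use:   LD_PRELOAD=./nslots.so ./your_binary
 */
#define _GNU_SOURCE
#include <dlfcn.h>    /* dlsym, RTLD_NEXT */
#include <stdlib.h>   /* getenv, atol */
#include <unistd.h>   /* _SC_NPROCESSORS_ONLN */

long sysconf(int name)
{
        /* locate the real sysconf, i.e. the next one in the search order */
        long (*real_sysconf)(int) = (long (*)(int))dlsym(RTLD_NEXT, "sysconf");

        if (name == _SC_NPROCESSORS_ONLN) {
                const char *env = getenv("NSLOTS");
                if (env != NULL) {
                        long n = atol(env);
                        if (n > 0)
                                return n;   /* report the SGE slot count */
                }
        }
        /* all other queries (and NSLOTS unset) go to the default libc */
        return real_sysconf(name);
}
```

A thin wrapper script along the lines of "export LD_PRELOAD=/path/to/nslots.so; exec /path/to/real_binary \"$@\"" would then confine the hack to the binaries that need it.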

As for other methods that neither modify the source nor mess with dynamic libraries, the only thing I have found is:

echo 0 > /sys/devices/system/cpu/cpuX/online

which fully disables a CPU and, obviously, is not an option.


The only method I have found ( 
http://stackoverflow.com/questions/4586405/get-number-of-cpus-in-linux-using-c 
) that reports the proper number of assigned cores requires writing a function 
that loops over all cores and checks the job's affinity for each one.
This method (apparently) works: at least it reports the number of core-bound 
cores.

For reference, this is the code I tested:

#define _GNU_SOURCE         /* for sched_getaffinity() and cpu_set_t */
#include <pthread.h>
#include <sched.h>          /* cpu_set_t, CPU_ZERO, CPU_ISSET, sched_getaffinity */
#include <unistd.h>
#include <sys/sysinfo.h>
#include <stdio.h>


int GetCPUCount()
{
        cpu_set_t cs;
        CPU_ZERO(&cs);
        sched_getaffinity(0, sizeof(cs), &cs);

        /* count the cores present in this process's affinity mask */
        int count = 0;
        for (int i = 0; i < get_nprocs(); i++)
        {
                if (CPU_ISSET(i, &cs))
                        count++;
        }
        return count;
}


int main(int argc, char* argv[]){
        long sc = sysconf(_SC_NPROCESSORS_ONLN);
        long sc_conf = sysconf(_SC_NPROCESSORS_CONF);
        long nprocs_conf = get_nprocs_conf();
        long nprocs = get_nprocs();
        long sched = GetCPUCount();

        printf("sysconf(_SC_NPROCESSORS_ONLN) = %ld\n", sc);
        printf("sysconf(_SC_NPROCESSORS_CONF) = %ld\n", sc_conf);
        printf("get_nprocs_conf() = %ld\n", nprocs_conf);
        printf("get_nprocs() =  %ld\n", nprocs);
        printf("sched_getaffinity = %ld\n", sched);
        return 0;
}


After submitting it as a job, these are the results:

#1-slot, core binding=1
qsub -cwd -l h_vmem=500M -binding linear:1 -b y ./test_n_procs

sysconf(_SC_NPROCESSORS_ONLN) = 12
sysconf(_SC_NPROCESSORS_CONF) = 12
get_nprocs_conf() = 12
get_nprocs() =  12
sched_getaffinity = 1

#3-slots, core binding=1
qsub -cwd -l h_vmem=500M -pe threaded 3 -binding linear:1 -b y ./test_n_procs

sysconf(_SC_NPROCESSORS_ONLN) = 12
sysconf(_SC_NPROCESSORS_CONF) = 12
get_nprocs_conf() = 12
get_nprocs() =  12
sched_getaffinity = 1

#3-slots, core binding=3
qsub -cwd -l h_vmem=500M -pe threaded 3 -binding linear:3 -b y ./test_n_procs

sysconf(_SC_NPROCESSORS_ONLN) = 12
sysconf(_SC_NPROCESSORS_CONF) = 12
get_nprocs_conf() = 12
get_nprocs() =  12
sched_getaffinity = 3

#3-to-6-slots, core binding=6
qsub -cwd -l h_vmem=500M -pe threaded 3-6 -binding linear:6 -b y ./test_n_procs

sysconf(_SC_NPROCESSORS_ONLN) = 12
sysconf(_SC_NPROCESSORS_CONF) = 12
get_nprocs_conf() = 12
get_nprocs() =  12
sched_getaffinity = 6



Has anyone encountered this problem before? Is there a more elegant solution? 
Is there a way that doesn't require reprogramming all the software that faces 
this problem?

Thanks in advance,

Txema
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
