Re: [hwloc-users] Problem getting cpuset of MPI task
Hendryk Bockelmann wrote on Thu 10 Feb 2011 11:00:25 +0100:
> btw: are there any plans to fully support POWER6 and/or POWER7 running
> AIX 6.1 in the future? Currently we can get the topology right, but
> cache sizes are missing.

   obj->attr->cache.size = 0; /* TODO: ? */

:)

I don't know which AIX API could provide it.

Samuel
Re: [hwloc-users] Problem getting cpuset of MPI task
Hello Samuel,

thanks for the hint ... now I start my program with:

   hwloc_topology_init(&topology);
   hwloc_topology_set_flags(topology, HWLOC_TOPOLOGY_FLAG_WHOLE_SYSTEM);
   hwloc_topology_load(topology);

and can access all the information I need to rebind my MPI tasks or to
rearrange the MPI communicators.

btw: are there any plans to fully support POWER6 and/or POWER7 running
AIX 6.1 in the future? Currently we can get the topology right, but
cache sizes are missing.

Hendryk

On 10/02/11 10:40, Samuel Thibault wrote:
> Hello,
>
> Hendryk Bockelmann wrote on Thu 10 Feb 2011 09:08:11 +0100:
>> On our clusters the job scheduler binds the MPI tasks, but it is not
>> always clear to which resources. So for us it would be great to know
>> where a task runs so that we might adapt the MPI communicators to
>> increase performance.
>
> Ok, so get_cpubind should be enough to know what binding the job
> scheduler applies.
>
>> Maybe just a note on the hwloc output on the cluster: while on my
>> local machine all MPI tasks can explore the whole topology, on the
>> cluster each task only sees itself, e.g. for task 7:
>>
>> 7: Machine#0(Backend=AIX OSName=AIX OSRelease=1 OSVersion=6 HostName=p191 Architecture=00C83AC24C00), cpuset: 0xc000
>> 7:   NUMANode#0, cpuset: 0xc000
>> 7:     L2Cache#0(0KB line=0), cpuset: 0xc000
>> 7:       Core#0, cpuset: 0xc000
>> 7:         PU, cpuset: 0x4000
>> 7:         PU#0, cpuset: 0x8000
>> 7: --> root_cpuset of process 7 is 0xc000
>
> Yes, because by default hwloc restricts itself to what you are allowed
> to use anyway. To see more, use --whole-system.
>
>> Nevertheless, all MPI tasks have different cpusets, and since the
>> nodes are homogeneous one can guess the whole binding using the
>> information from lstopo and the HostName of each task. Perhaps you
>> can tell me whether such a restricted topology is due to hwloc or due
>> to the fixed binding by the job scheduler?
>
> It's because by default hwloc follows the fixed binding :)
>
> Samuel
Re: [hwloc-users] Problem getting cpuset of MPI task
Hello,

Hendryk Bockelmann wrote on Thu 10 Feb 2011 09:08:11 +0100:
> On our clusters the job scheduler binds the MPI tasks, but it is not
> always clear to which resources. So for us it would be great to know
> where a task runs so that we might adapt the MPI communicators to
> increase performance.

Ok, so get_cpubind should be enough to know what binding the job
scheduler applies.

> Maybe just a note on the hwloc output on the cluster: while on my local
> machine all MPI tasks can explore the whole topology, on the cluster
> each task only sees itself, e.g. for task 7:
>
> 7: Machine#0(Backend=AIX OSName=AIX OSRelease=1 OSVersion=6 HostName=p191 Architecture=00C83AC24C00), cpuset: 0xc000
> 7:   NUMANode#0, cpuset: 0xc000
> 7:     L2Cache#0(0KB line=0), cpuset: 0xc000
> 7:       Core#0, cpuset: 0xc000
> 7:         PU, cpuset: 0x4000
> 7:         PU#0, cpuset: 0x8000
> 7: --> root_cpuset of process 7 is 0xc000

Yes, because by default hwloc restricts itself to what you are allowed
to use anyway. To see more, use --whole-system.

> Nevertheless, all MPI tasks have different cpusets, and since the nodes
> are homogeneous one can guess the whole binding using the information
> from lstopo and the HostName of each task. Perhaps you can tell me
> whether such a restricted topology is due to hwloc or due to the fixed
> binding by the job scheduler?

It's because by default hwloc follows the fixed binding :)

Samuel
Re: [hwloc-users] Problem getting cpuset of MPI task
Hey Brice,

I already thought so, but thank you for the explanation. On our clusters
the job scheduler binds the MPI tasks, but it is not always clear to
which resources. So for us it would be great to know where a task runs
so that we might adapt the MPI communicators to increase performance.

Maybe just a note on the hwloc output on the cluster: while on my local
machine all MPI tasks can explore the whole topology, on the cluster
each task only sees itself, e.g. for task 7:

7: Machine#0(Backend=AIX OSName=AIX OSRelease=1 OSVersion=6 HostName=p191 Architecture=00C83AC24C00), cpuset: 0xc000
7:   NUMANode#0, cpuset: 0xc000
7:     L2Cache#0(0KB line=0), cpuset: 0xc000
7:       Core#0, cpuset: 0xc000
7:         PU, cpuset: 0x4000
7:         PU#0, cpuset: 0x8000
7: --> root_cpuset of process 7 is 0xc000

Nevertheless, all MPI tasks have different cpusets, and since the nodes
are homogeneous one can guess the whole binding using the information
from lstopo and the HostName of each task. Perhaps you can tell me
whether such a restricted topology is due to hwloc or due to the fixed
binding by the job scheduler?

Greetings,
Hendryk

On 09/02/11 17:12, Brice Goglin wrote:
> On 09/02/2011 16:53, Hendryk Bockelmann wrote:
>> Since I am new to hwloc there might be a misunderstanding on my side,
>> but I have a problem getting the cpuset of MPI tasks. I just want to
>> run a simple MPI program to see on which cores (or CPUs in case of
>> hyperthreading or SMT) the tasks run, so that I can arrange my MPI
>> communicators.
>>
>> For the program below I get the following output:
>>
>> Process 0 of 2 on tide
>> Process 1 of 2 on tide
>> --> cpuset of process 0 is 0x000f
>> --> cpuset of process 0 after singlify is 0x0001
>> --> cpuset of process 1 is 0x000f
>> --> cpuset of process 1 after singlify is 0x0001
>>
>> So why do both MPI tasks report the same cpuset?
>
> Hello Hendryk,
>
> Your processes are not bound, so they may run anywhere they want.
> hwloc_get_cpubind() tells you where they are bound. That's why the
> cpuset is 0xf at first (all the existing logical processors in the
> machine).
>
> You want to know where they actually run. That is different from where
> they are bound: the former is included in the latter. The former is a
> single processor, while the latter may be any combination of
> processors.
>
> hwloc cannot tell you where a task runs yet, but I am looking at
> implementing it. I actually sent a patch to hwloc-devel about it
> yesterday [1]. You would just have to replace get_cpubind with
> get_cpuexec (or whatever the final function name is). Note that such a
> function cannot guarantee that what it returns is still true, since
> the process may migrate to another processor in the meantime.
>
> Also note that hwloc_bitmap_singlify is usually used to "simplify" a
> cpuset (to avoid migration between multiple SMT PUs for instance)
> before binding a task (calling set_cpubind). It's useless in your code
> above.
>
> Brice
>
> [1] http://www.open-mpi.org/community/lists/hwloc-devel/2011/02/1915.php
>
>> Here is the program (attached you find the output of
>> hwloc-gather-topology.sh):
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include "hwloc.h"
>> #include "mpi.h"
>>
>> int main(int argc, char* argv[]) {
>>    hwloc_topology_t topology;
>>    hwloc_bitmap_t cpuset;
>>    char *str = NULL;
>>    int myid, numprocs, namelen;
>>    char procname[MPI_MAX_PROCESSOR_NAME];
>>
>>    MPI_Init(&argc, &argv);
>>    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>>    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
>>    MPI_Get_processor_name(procname, &namelen);
>>
>>    printf("Process %d of %d on %s\n", myid, numprocs, procname);
>>
>>    hwloc_topology_init(&topology);
>>    hwloc_topology_load(topology);
>>
>>    /* get native cpuset of this process */
>>    cpuset = hwloc_bitmap_alloc();
>>    hwloc_get_cpubind(topology, cpuset, 0);
>>    hwloc_bitmap_asprintf(&str, cpuset);
>>    printf("--> cpuset of process %d is %s\n", myid, str);
>>    free(str);
>>    hwloc_bitmap_singlify(cpuset);
>>    hwloc_bitmap_asprintf(&str, cpuset);
>>    printf("--> cpuset of process %d after singlify is %s\n", myid, str);
>>    free(str);
>>
>>    hwloc_bitmap_free(cpuset);
>>    hwloc_topology_destroy(topology);
>>
>>    MPI_Finalize();
>>    return 0;
>> }
Re: [hwloc-users] Problem getting cpuset of MPI task
On 09/02/2011 16:53, Hendryk Bockelmann wrote:
> Since I am new to hwloc there might be a misunderstanding on my side,
> but I have a problem getting the cpuset of MPI tasks. I just want to
> run a simple MPI program to see on which cores (or CPUs in case of
> hyperthreading or SMT) the tasks run, so that I can arrange my MPI
> communicators.
>
> For the program below I get the following output:
>
> Process 0 of 2 on tide
> Process 1 of 2 on tide
> --> cpuset of process 0 is 0x000f
> --> cpuset of process 0 after singlify is 0x0001
> --> cpuset of process 1 is 0x000f
> --> cpuset of process 1 after singlify is 0x0001
>
> So why do both MPI tasks report the same cpuset?

Hello Hendryk,

Your processes are not bound, so they may run anywhere they want.
hwloc_get_cpubind() tells you where they are bound. That's why the
cpuset is 0xf at first (all the existing logical processors in the
machine).

You want to know where they actually run. That is different from where
they are bound: the former is included in the latter. The former is a
single processor, while the latter may be any combination of processors.

hwloc cannot tell you where a task runs yet, but I am looking at
implementing it. I actually sent a patch to hwloc-devel about it
yesterday [1]. You would just have to replace get_cpubind with
get_cpuexec (or whatever the final function name is). Note that such a
function cannot guarantee that what it returns is still true, since the
process may migrate to another processor in the meantime.

Also note that hwloc_bitmap_singlify is usually used to "simplify" a
cpuset (to avoid migration between multiple SMT PUs for instance) before
binding a task (calling set_cpubind). It's useless in your code above.

Brice

[1] http://www.open-mpi.org/community/lists/hwloc-devel/2011/02/1915.php

> Here is the program (attached you find the output of
> hwloc-gather-topology.sh):
>
> #include <stdio.h>
> #include <stdlib.h>
> #include "hwloc.h"
> #include "mpi.h"
>
> int main(int argc, char* argv[]) {
>    hwloc_topology_t topology;
>    hwloc_bitmap_t cpuset;
>    char *str = NULL;
>    int myid, numprocs, namelen;
>    char procname[MPI_MAX_PROCESSOR_NAME];
>
>    MPI_Init(&argc, &argv);
>    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
>    MPI_Get_processor_name(procname, &namelen);
>
>    printf("Process %d of %d on %s\n", myid, numprocs, procname);
>
>    hwloc_topology_init(&topology);
>    hwloc_topology_load(topology);
>
>    /* get native cpuset of this process */
>    cpuset = hwloc_bitmap_alloc();
>    hwloc_get_cpubind(topology, cpuset, 0);
>    hwloc_bitmap_asprintf(&str, cpuset);
>    printf("--> cpuset of process %d is %s\n", myid, str);
>    free(str);
>    hwloc_bitmap_singlify(cpuset);
>    hwloc_bitmap_asprintf(&str, cpuset);
>    printf("--> cpuset of process %d after singlify is %s\n", myid, str);
>    free(str);
>
>    hwloc_bitmap_free(cpuset);
>    hwloc_topology_destroy(topology);
>
>    MPI_Finalize();
>    return 0;
> }
Re: [hwloc-users] Problem getting cpuset of MPI task
Hendryk Bockelmann wrote on Wed 09 Feb 2011 16:57:43 +0100:
> Since I am new to hwloc there might be a misunderstanding on my side,
> but I have a problem getting the cpuset of MPI tasks.

>    /* get native cpuset of this process */
>    cpuset = hwloc_bitmap_alloc();
>    hwloc_get_cpubind(topology, cpuset, 0);

get_cpubind gives where the threads are bound, not where they are
actually running. If you haven't bound them yourself, the default is no
binding, i.e. all CPUs are allowed, and thus a full mask; that's why you
get 0xf for all of them.

>    hwloc_bitmap_singlify(cpuset);

Singlify is just an operation on the resulting cpu mask, keeping only
the first bit of it. That's why you end up with just 0x1.

Adding a function that returns where threads are actually running is on
the TODO list for hwloc 1.2.

Samuel