[hwloc-users] glibc struggling with get_nprocs and get_nprocs_conf

2022-02-07 Thread Samuel Thibault
Hello,

For information, glibc is struggling with the question of the
precise meaning of get_nprocs, get_nprocs_conf, _SC_NPROCESSORS_CONF,
and _SC_NPROCESSORS_ONLN:

https://sourceware.org/pipermail/libc-alpha/2022-February/136177.html
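
For readers who have not met these interfaces, here is a minimal sketch of how the two sysconf() values are queried. The helper names configured_cpus() and online_cpus() are made up for illustration; on glibc, get_nprocs_conf() and get_nprocs() are documented as reporting the same two quantities. Neither value says how many CPUs the calling process is actually allowed to run on, which is part of why the precise meaning is debated.

```c
#include <unistd.h>

/* Number of processors the system is configured with
 * (may include processors that are currently offline). */
long configured_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}

/* Number of processors currently online. This can still be larger
 * than what the calling process may use (affinity masks, cgroups). */
long online_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

Printing both values next to what lstopo reports is a quick way to see whether they disagree on a given machine.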

Samuel
___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/hwloc-users


Re: [hwloc-users] hwloc cant detect hardware topology error.

2020-07-18 Thread Samuel Thibault
Yogesh Sharma, on Sat 18 Jul 2020 15:59:57 +0530, wrote:
> i am new to ubuntu. can you give me a moment and help me get through command
> lines here

It is really just the same as you tried, but with the libhwloc-dev
package instead of hwloc.

Samuel


Re: [hwloc-users] hwloc cant detect hardware topology error.

2020-07-18 Thread Samuel Thibault
Hello,

Yogesh Sharma, on Sat 18 Jul 2020 15:02:30 +0530, wrote:
>  i tried  sudo apt -get hwloc=1.11.3

It is the libhwloc-dev package that you need to downgrade.

But 1.11 is old, and the software really needs to be ported to the hwloc
2 API.

Samuel

Re: [hwloc-users] One more silly warning squash

2020-06-02 Thread Samuel Thibault
Balaji, Pavan, on Tue 02 Jun 2020 09:31:29 +, wrote:
> > On Jun 1, 2020, at 4:11 AM, Balaji, Pavan via hwloc-users wrote:
> >> On Jun 1, 2020, at 4:10 AM, Balaji, Pavan wrote:
> >>> On Jun 1, 2020, at 4:06 AM, Samuel Thibault wrote:
> >>> could you check whether the attached patch avoids the warning?
> >>> (we should really not need a cast to const char*)
> >> 
> >> The attached patch is basically the same as what we are using, isn't it?  
> >> It does avoid the warning.
> > 
> > Oh, sorry, I see now that you skipped the extra cast in that case.  Let me 
> > try it out and get back to you.
> 
> I've verified that the patch works.

Ok, I pushed the fix to master, thanks!

Samuel


Re: [hwloc-users] One more silly warning squash

2020-06-01 Thread Samuel Thibault
Balaji, Pavan, on Mon 01 Jun 2020 09:10:21 +, wrote:
> > On Jun 1, 2020, at 4:06 AM, Samuel Thibault wrote:
> > could you check whether the attached patch avoids the warning?
> > (we should really not need a cast to const char*)
> 
> The attached patch is basically the same as what we are using, isn't it?

Yes, but without the cast, which a compiler should really not require :)

> It does avoid the warning.

Ok, thanks.

Samuel


Re: [hwloc-users] One more silly warning squash

2020-06-01 Thread Samuel Thibault
Hello,

Balaji, Pavan via hwloc-users, on Mon 01 Jun 2020 03:39:02 +, wrote:
> We are seeing some warnings with the Intel compiler with hwloc (listed 
> below).  The warnings seem to be somewhat silly because there already is a 
> cast to "char *" from the string literal,

Well, I'd agree with icc that casting a string literal to (char*) is in
general a bad idea :)

> but it seems to expect a cast to "const char *" before casting to "char *".  
> We are maintaining the below patch to workaround it.  Can you either 
> integrate this or a better fix for the warning?
> 
> https://github.com/pmodels/hwloc/commit/fb27dc6e21bac14754d1b50b57f752e37d475704

I fixed them except:

>   CC   topology-hardwired.lo
> traversal.c(598): warning #3179: deprecated conversion of string literal to 
> char* (should be const char*)
> const char *quote = strchr(info->value, ' ') ? "\"" : "";
> ^

Which is more silly (these are already const char* in essence) and just
seems to me like icc's limited analysis.  Our version of icc doesn't get
these, could you check whether the attached patch avoids the warning?
(we should really not need a cast to const char*)
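
As a standalone illustration of the change being tested, here is a sketch of the same logic pulled out into a small function. The name quote_for() is invented for this example and is not part of hwloc. It replaces the ?: expression with an if/else so that each branch is a plain assignment of a string literal to a const char *, with no cast anywhere.

```c
#include <string.h>

/* Return the quote string to put around a value: a double quote if the
 * value contains a space, an empty string otherwise. Written with
 * if/else instead of ?: so that each branch is a direct assignment of
 * a string literal to a const char *. */
const char *quote_for(const char *value)
{
    const char *quote;

    if (strchr(value, ' '))
        quote = "\"";
    else
        quote = "";
    return quote;
}
```

Whether this form silences warning #3179 depends on the icc version, which is exactly what is being asked above.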

Samuel
diff --git a/hwloc/traversal.c b/hwloc/traversal.c
index 4062a19d..14549422 100644
--- a/hwloc/traversal.c
+++ b/hwloc/traversal.c
@@ -654,7 +654,11 @@ hwloc_obj_attr_snprintf(char * __hwloc_restrict string, size_t size, hwloc_obj_t
 unsigned i;
 for(i=0; i<obj->infos_count; i++) {
   struct hwloc_info_s *info = &obj->infos[i];
-  const char *quote = strchr(info->value, ' ') ? "\"" : "";
+  const char *quote;
+  if (strchr(info->value, ' '))
+quote = "\"";
+  else
+quote = "";
   res = hwloc_snprintf(tmp, tmplen, "%s%s=%s%s%s",
 prefix,
 info->name,

Re: [hwloc-users] question about hwloc_set_area_membind_nodeset

2017-11-12 Thread Samuel Thibault
Brice Goglin, on Sun 12 Nov 2017 05:19:37 +0100, wrote:
> That's likely what's happening. Each set_area() may be creating a new "virtual
> memory area". The kernel tries to merge them with neighbors if they go to the
> same NUMA node. Otherwise it creates a new VMA.

Mmmm, that sucks. Ideally we'd have a way to ask the kernel not to
strictly bind the memory, but just to allocate on a given memory
node, and just hope that the allocation will not go away (e.g. due to
swapping), which thus doesn't need a VMA to record the information. As
you describe below, first-touch achieves that but it's not necessarily
so convenient.

> I can't find the exact limit but it's something like 64k so I guess
> you're exhausting that.

It's sysctl vm.max_map_count
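
On Linux that sysctl is also exposed as the file /proc/sys/vm/max_map_count, so a program can check the limit before creating many small bindings. A minimal sketch, assuming a hypothetical helper name read_sysctl_long():

```c
#include <stdio.h>

/* Read a single integer from a sysctl-style file such as
 * /proc/sys/vm/max_map_count. Returns -1 if the file cannot be
 * opened or does not start with a number. */
long read_sysctl_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

Calling read_sysctl_long("/proc/sys/vm/max_map_count") gives the maximum number of VMAs per process, to be compared against the number of separately bound areas.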

> Question 2 : Is there a better way of achieving the result I'm looking for
> (such as a call to membind with a stride of some kind to say put N pages 
> in
> a row on each domain in alternation).
> 
> 
> Unfortunately, the interleave policy doesn't have a stride argument. It's one
> page on node 0, one page on node 1, etc.
> 
> The only idea I have is to use the first-touch policy: Make sure your buffer
> isn't is physical memory yet, and have a thread on node 0 read the "0" pages,
> and another thread on node 1 read the "1" page.

Or "next-touch" if that was to ever get merged into mainline Linux :)
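
For the first-touch approach, each thread needs to know which pages are "its" pages. Here is a sketch of the arithmetic for the strided layout asked about in question 2 (N pages in a row on each node, in alternation). The function name node_for_page() and its parameters are made up for this example.

```c
#include <stddef.h>

/* With `stride` consecutive pages per node, laid out round-robin over
 * `nnodes` nodes, page number `page` belongs to node
 * (page / stride) % nnodes. stride and nnodes must be non-zero. */
unsigned node_for_page(size_t page, size_t stride, unsigned nnodes)
{
    return (unsigned)((page / stride) % nnodes);
}
```

A thread running on node n would then touch only the pages for which node_for_page() returns n, before anybody else accesses the buffer.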

Samuel


Re: [hwloc-users] linkspeed in hwloc_obj_attr_u::hwloc_pcidev_attr_s struct while traversing topology

2017-10-13 Thread Samuel Thibault
Hello,

TEJASWI k, on Fri 13 Oct 2017 14:44:53 +0530, wrote:
> Thanks I could get the linkspeed when i tried with root user.
> But is there no other way?

See Brice's answer :)

> And what is the reason behind this limitation?

Ask Linux people, not us :)

I can only guess that they are afraid of exposing too much config
information, and thus only whitelist the first part.

Samuel


Re: [hwloc-users] linkspeed in hwloc_obj_attr_u::hwloc_pcidev_attr_s struct while traversing topology

2017-10-13 Thread Samuel Thibault
Hello,

TEJASWI k, on Fri 13 Oct 2017 14:23:00 +0530, wrote:
> All the other details I am able to query but linkspeed (pciObj->attr->
> bridge.upstream.pci.linkspeed) is always 0.
> Do I need to enable any other flag to get linkspeed or am I going wrong
> somewhere?

You need to run as root for hwloc to be able to read the linkspeed from
Linux.

Samuel


[hwloc-users] process names in lstopo --ps

2017-08-23 Thread Samuel Thibault
Hello,

The other day I modified the output of lstopo --ps to show the end of
the cmdline instead of the beginning, because with module systems,
spack, etc. the path to application binaries gets longer and longer, and
eventually the actual name of the binary gets cut off on the right.

But conversely, it is now the end of the options passed to the
application which shows up, and when there are a lot of them, the
application name gets cut off again, on the left.

Would people be fine with just getting rid of the path leading to the
application binary, and only showing the file name of the binary in the
output of lstopo --ps?

(Basically, taking /proc/$pid/comm instead of trying to find the right
part of /proc/$pid/cmdline to be displayed.)
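
To make the proposal concrete, here is a sketch of the "file name only" idea applied to a path string. The function name binary_name() is invented for this example, and it is only an approximation of what /proc/$pid/comm contains.

```c
#include <string.h>

/* Return the part of `path` after the last '/', or the whole string
 * if it contains no '/'. The result points into `path`. */
const char *binary_name(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}
```

For instance binary_name("/opt/spack/linux/bin/myapp") yields "myapp", whatever the length of the directory part.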

Samuel


Re: [OMPI users] Message reception not getting pipelined with TCP

2017-07-20 Thread Samuel Thibault
Gilles Gouaillardet, on Fri 21 Jul 2017 10:57:36 +0900, wrote:
> if you are fine with using more memory, and your application should not
> generate too much unexpected messages, then you can bump the eager_limit
> for example
> 
> mpirun --mca btl_tcp_eager_limit $((8*1024*1024+128)) ...

Thanks for the workaround!  Normally we shouldn't have many unexpected
messages, but the memory consumption would be a concern.

Samuel
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Message reception not getting pipelined with TCP

2017-07-20 Thread Samuel Thibault
Hello,

George Bosilca, on Thu 20 Jul 2017 19:05:34 -0500, wrote:
> Can you reproduce the same behavior after the first batch of messages ?

Yes, putting a loop around the whole series of communications, even
with a 1-second pause in between, gets the same behavior repeated.

> Assuming the times showed on the left of your messages are correct, the first
> MPI seems to deliver the entire set of messages significantly faster than the
> second.

The second log was with mpich2.

Samuel


[OMPI users] Message reception not getting pipelined with TCP

2017-07-20 Thread Samuel Thibault
Hello,

We are seeing a severe performance issue, which is due to missing
pipelining in OpenMPI when running over TCP. I have attached
a test case. Basically what it does is

if (myrank == 0) {
    for (i = 0; i < N; i++)
        MPI_Isend(...);
} else {
    for (i = 0; i < N; i++)
        MPI_Irecv(...);
}
for (i = 0; i < N; i++)
    MPI_Wait(...);

with corresponding printfs. And the result is:

0.182620: Isend 0 begin
0.182761: Isend 0 end
0.182766: Isend 1 begin
0.182782: Isend 1 end
...
0.183911: Isend 49 begin
0.183915: Isend 49 end
0.199028: Irecv 0 begin
0.199068: Irecv 0 end
0.199070: Irecv 1 begin
0.199072: Irecv 1 end
...
0.199187: Irecv 49 begin
0.199188: Irecv 49 end
0.233948: Isend 0 done!
0.269895: Isend 1 done!
...
1.982475: Isend 49 done!
1.984065: Irecv 0 done!
1.984078: Irecv 1 done!
...
1.984131: Irecv 49 done!

i.e. almost two seconds elapse between the start of the application and
the completion of the first Irecv, and then all the other Irecvs complete
immediately as well, i.e. it seems the communications were all grouped
together.

This is really bad, because in our real use case, we trigger a
computation after each MPI_Wait call, and we use several messages so
as to pipeline things: the first computation can start as soon as one
message gets received, and thus overlaps with further receptions.

This problem only occurs with openmpi on TCP: I'm not getting this behavior
with openmpi on IB, nor with mpich or madmpi:

0.182168: Isend 0 begin
0.182235: Isend 0 end
0.182237: Isend 1 begin
0.182242: Isend 1 end
...
0.182842: Isend 49 begin
0.182844: Isend 49 end
0.200505: Irecv 0 begin
0.200564: Irecv 0 end
0.200567: Irecv 1 begin
0.200569: Irecv 1 end
...
0.201233: Irecv 49 begin
0.201234: Irecv 49 end
0.269511: Isend 0 done!
0.273154: Irecv 0 done!
0.341054: Isend 1 done!
0.344507: Irecv 1 done!
...
3.767726: Isend 49 done!
3.770637: Irecv 49 done!

There we do have pipelined reception.

Is there a way to get the second, pipelined behavior with openmpi on
TCP?

Samuel
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <sys/utsname.h>
#include <mpi.h>

/* run with mpirun --map-by node */

#define SIZE (8*1024*1024)
#define N 50

//#define DEBUG

int main(int argc, char *argv[]) {
	char *c[N];
	int rank;
	int i, repeat, flag;
	MPI_Request request[N];
	MPI_Status status;
	int done[N] = { 0 };
	char *actions[2] = { "Isend", "Irecv" };
	int ret;
	double start;
	struct utsname name;

	uname(&name);
	MPI_Init(&argc, &argv);
	MPI_Comm_rank(MPI_COMM_WORLD, &rank);
	fprintf(stderr,"I'm %d on %s\n", rank, name.nodename);
	MPI_Barrier(MPI_COMM_WORLD);
	start = MPI_Wtime();

	for (i = 0; i < N; i++)
	{
		c[i] = calloc(1,SIZE);
		c[i][0] = i;
		c[i][SIZE-1] = i;
	}

	if (rank == 0) {
		for (i = 0; i < N; i++)
		{
			fprintf(stderr,"%f: Isend %d begin\n", MPI_Wtime() - start, i);
			ret = MPI_Isend(c[i], SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &request[i]);
			//ret = MPI_Issend(c[i], SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &request[i]);
			//ret = MPI_Send(c[i], SIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
			assert(ret == MPI_SUCCESS);
			fprintf(stderr,"%f: Isend %d end\n", MPI_Wtime() - start, i);
		}
	} else {
		for (i = 0; i < N; i++)
		{
			fprintf(stderr,"%f: Irecv %d begin\n", MPI_Wtime() - start, i);
			ret = MPI_Irecv(c[i], SIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &request[i]);
			assert(ret == MPI_SUCCESS);
			fprintf(stderr,"%f: Irecv %d end\n", MPI_Wtime() - start, i);
		}
	}

//if (rank)
{
#if 0
	do {
		repeat = 0;
		for (i = 0; i < N; i++)
		{
			if (!done[i])
			{
				repeat = 1;
#ifdef DEBUG
				fprintf(stderr,"%f: %s Test %d begin\n", MPI_Wtime() - start, actions[rank], i);
#endif
				ret = MPI_Test(&request[i], &done[i], &status);
				assert(ret == MPI_SUCCESS);
#ifdef DEBUG
				fprintf(stderr,"%f: %s Test %d end\n", MPI_Wtime() - start, actions[rank], i);
#endif
				if (done[i])
				{
					fprintf(stderr,"%f: %s %d done!\n", MPI_Wtime() - start, actions[rank], i);
					if (rank)
					{
						assert(c[i][0] == i);
						assert(c[i][SIZE-1] == i);
					}
				}
			}
		}
	} while(repeat);
#elif 0
	repeat = N;
	do {
		ret = MPI_Testany(N, request, &i, &flag, &status);
		assert(ret == MPI_SUCCESS);
		if (flag)
		{
			fprintf(stderr,"%f: %s %d done!\n", MPI_Wtime() - start, actions[rank], i);
			if (rank)
			{
				assert(c[i][0] == i);
				assert(c[i][SIZE-1] == i);
			}
			repeat--;
		}
	} while (repeat);
#elif 0
	for (i = 0; i < N; i++)
	{
		do
		{
#ifdef DEBUG
			fprintf(stderr,"%f: %s Test %d begin\n", MPI_Wtime() - start, actions[rank], i);
#endif
			ret = MPI_Test(&request[i], &flag, &status);
			assert(ret == MPI_SUCCESS);
#ifdef DEBUG
			fprintf(stderr,"%f: %s Test %d end\n", MPI_Wtime() - start, actions[rank], i);
#endif
		} while(!flag);
		fprintf(stderr,"%f: %s %d done!\n", MPI_Wtime() - start, actions[rank], i);
		if (rank)
		{
			assert(c[i][0] == i);
			assert(c[i][SIZE-1] == i);
		}
	}
#else
	for (i = 0; i < N; i++)
	{
		ret = MPI_Wait(&request[i], &status);
		assert(ret == MPI_SUCCESS);
		fprintf(stderr,"%f: %s %d done!\n", MPI_Wtime() - start, actions[rank], i);
		if (rank)
		{
			

Re: [hwloc-users] ? Finding cache & pci info on SPARC/Solaris 11.3

2017-06-08 Thread Samuel Thibault
Hello,

Maureen Chew, on Thu 08 Jun 2017 10:51:56 -0400, wrote:
> Should finding cache & pci info work?

AFAWK, there is no user-available way to get cache information on
Solaris, so it's not implemented in hwloc.

Concerning pci, you need libpciaccess to get PCI information.

Samuel
___
hwloc-users mailing list
hwloc-users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/hwloc-users


Re: [hwloc-users] Building hwloc for X11 on Mac OS X

2017-05-08 Thread Samuel Thibault
Hello,

Gunter, David O, on Thu 04 May 2017 20:44:16 +, wrote:
> launching lstopo always produces the text-based output. I cannot seem
> to get the X-display features to work. And yes, I am able to launch
> xterms and other X11-based apps correctly.

Do you have the DISPLAY environment variable set?  lstopo uses it to
determine whether it should run the X11 output or not.
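
The kind of check involved can be sketched as follows. This is not lstopo's actual code: the function name have_display() is made up, and treating an empty DISPLAY as "not set" is an assumption of this example.

```c
#include <stdlib.h>

/* Return non-zero if the DISPLAY environment variable is set to a
 * non-empty value, i.e. if trying an X11 output makes sense at all. */
int have_display(void)
{
    const char *display = getenv("DISPLAY");

    return display != NULL && display[0] != '\0';
}
```

Running echo $DISPLAY in the same shell that launches lstopo gives the same information without writing any code.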

Samuel


Re: [hwloc-users] Building hwloc for a Cray/KNL system

2017-01-27 Thread Samuel Thibault
Hello,

Gunter, David O, on Fri 27 Jan 2017 18:05:44 +, wrote:
> $ aprun -n 1 -L 193 ~hwloc-tt/bin/lstopo-no-graphics

Does aprun give you allocation of all cores?  By default lstopo only
shows the allocated cores.  To see all of them, use the --whole-system
option.

Samuel


Re: [hwloc-users] Issue running hwloc on Xeon-Phi Coprocessor uOS

2017-01-16 Thread Samuel Thibault
Hello,

Jacob Peter Caswell, on Mon 16 Jan 2017 11:53:56 -0600, wrote:
> x86_64-k1om-linux-ld: i386:x86-64 architecture of input file `.libs/support.o'
> is incompatible with k1om output

Did you make clean before reconfiguring+making?

Samuel


Re: [hwloc-users] hwloc on Zynq

2016-12-12 Thread Samuel Thibault
Hello,

Alberto Ortiz, on Mon 12 Dec 2016 18:03:23 +0100, wrote:
> These gpios are included to the PS by looking into the device tree, and 
> located
> in /sys/class.

> I know hwloc is able to find PCI devices, but i would like to know if hwloc is
> able to detect other type of I/O like the ones i've just mentioned

hwloc currently doesn't have support for gpios, but we could add it if
there is enough information about it in /sys/class.  What does it look
like?  On my LIME2 box, I only have

/sys/class/gpio/gpiochip0

without much information since it's an integrated device. Could you send
us a tarball of your /sys/class/gpio?

Samuel


Re: [hwloc-users] [hwloc-announce] Hardware Locality (hwloc) v1.11.3 released

2016-04-27 Thread Samuel Thibault
Brice Goglin, on Tue 26 Apr 2016 15:45:49 +0200, wrote:
> The Hardware Locality (hwloc) team is pleased to announce the release
> of v1.11.3:

I'm getting one testsuite issue:

FAIL: 16-2gr2gr2n2c+misc.xml

(gdb) bt
#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x77346d8e in __GI___strdup (s=0x0) at strdup.c:41
#2  0x004032ee in hwloc_utils_userdata_import_cb (topology=0x62a520, 
obj=0x639c00, name=0x639330 "normal:MyName0", 
buffer=0x0, length=0) at ../../utils/hwloc/misc.h:312
#3  0x77bb48e1 in hwloc__xml_import_userdata (topology=0x62a520, 
obj=0x639c00, state=0x7fffd2f0)
at topology-xml.c:624
#4  0x77bb519e in hwloc__xml_import_object (topology=0x62a520, 
data=0x6399d0, obj=0x639c00, state=0x7fffd3e0)
at topology-xml.c:766
#5  0x77bb5b27 in hwloc_look_xml (backend=0x6398e0) at 
topology-xml.c:1021
#6  0x77b9d962 in hwloc_discover (topology=0x62a520) at topology.c:2499
#7  0x77b9e974 in hwloc_topology_load (topology=0x62a520) at 
topology.c:2994
#8  0x004054e7 in main (argc=0, argv=0x7fffd728) at lstopo.c:734

312   u->buffer = strdup(buffer);
(gdb) p buffer
$1 = (const void *) 0x0

624   topology->userdata_import_cb(topology, obj, fakename,
buffer, length);
(gdb) p buffer
$2 = 0x0

so it looks like

617   ret = state->global->get_content(state, &buffer, reallength);

didn't actually fill buffer, but

(gdb) p name
$13 = 0x64ff4c "MyName0"
(gdb) p encoded
$10 = 0
(gdb) p length
$11 = 0
(gdb) p reallength
$12 = 0

so maybe that's "expected" :)

I'll be using the attached patch in Debian.
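
The crash itself is simply strdup() being handed a NULL pointer. The attached patch avoids it by initializing buffer to an empty string before get_content() is called. A different way to get the same effect, shown here only as a sketch with a made-up helper name dup_or_empty(), is to make the duplication itself NULL-safe:

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate the string s, treating a NULL pointer as the empty
 * string. Returns NULL only if the allocation fails. The caller
 * frees the result. */
char *dup_or_empty(const char *s)
{
    size_t len;
    char *copy;

    if (!s)
        s = "";
    len = strlen(s) + 1;
    copy = malloc(len);
    if (copy)
        memcpy(copy, s, len);
    return copy;
}
```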

Samuel
diff --git a/src/topology-xml.c b/src/topology-xml.c
index 220afd1..35fb19e 100644
--- a/src/topology-xml.c
+++ b/src/topology-xml.c
@@ -612,7 +612,7 @@ hwloc__xml_import_userdata(hwloc_topology_t topology __hwloc_attribute_unused, h
   return -1;

   } else if (topology->userdata_not_decoded) {
-  char *buffer, *fakename;
+  char *buffer = "", *fakename;
   size_t reallength = encoded ? BASE64_ENCODED_LENGTH(length) : length;
   ret = state->global->get_content(state, &buffer, reallength);
   if (ret < 0)


Re: [hwloc-users] Selecting real cores vs HT cores

2014-12-11 Thread Samuel Thibault
Jeff Squyres (jsquyres), on Thu 11 Dec 2014 21:12:27 +, wrote:
> When the BIOS is set to enable hyper threading, then several resources on the 
> core are split when the machine is booted up (e.g., some of the queue depths 
> for various processing units in the core are half the length that they are 
> when hyperthreading is disabled in the BIOS).

Perhaps some queues get divided, but most of the resources (such as
cache, TLB, etc.) are completely available when using only one
hyperthread, like they would be with HT disabled.

Samuel


Re: [hwloc-users] Processor numbering in Ivy-bridge

2014-09-29 Thread Samuel Thibault
Vishwanath Venkatesan, on Mon 29 Sep 2014 13:38:35 -0700, wrote:
> I was trying to use HWLOC on Ivybridge. I found that there is some
> inconsistency in the core numbering.
> 
> In the attached image (generated from running lstopo (hwloc - 1.9.1), we can
> see that cores 6,7 do not exist although, PU#6 and PU#7 does exist.

I am not very surprised. Those are physical numbers, which BIOS & such determine
in various ways, which may not be contiguous.  If you are looking for a
contiguous numbering, you need to have a look at the logical numbers, obtained
from lstopo -l.
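
A toy illustration of the difference, with made-up numbers rather than the ones from the attached image: if the physical core numbers present are 0-5 and 8-9, the logical index is just the position in that list, so it is contiguous even though the physical numbers have a gap. hwloc derives its logical order from the topology tree rather than from a flat list, so this is only a sketch of the idea, and logical_index_of() is a hypothetical helper.

```c
#include <stddef.h>

/* Return the position of physical id `id` in the list of physical ids
 * that actually exist, or -1 if there is no such id. The position
 * plays the role of a contiguous, logical index. */
int logical_index_of(const unsigned *physical_ids, size_t count, unsigned id)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (physical_ids[i] == id)
            return (int)i;
    return -1;
}
```

With the list {0, 1, 2, 3, 4, 5, 8, 9}, physical core 8 gets logical index 6, and asking for physical core 6 reports that it does not exist.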

Samuel


Re: [hwloc-users] hwloc-ls graphical output

2014-09-24 Thread Samuel Thibault
Dennis Jacobfeuerborn, on Thu 25 Sep 2014 02:01:48 +0200, wrote:
> The question I guess is how does the command determine the availability
> of png as an output? Both cairo and libpng are installed.

It depends on the backends which were built into cairo.

Samuel


Re: [hwloc-users] BGQ question.

2014-03-25 Thread Samuel Thibault
Biddiscombe, John A., on Tue 25 Mar 2014 08:56:02 +, wrote:
> Looking at /proc/cpuinfo on the io node itself, I see only 60 cores listed. I
> wonder if they’ve reserved one socket of 4 cores for IO purposes

That's possible, yes.

> and in fact hwloc is seeing the correct information.

At least it provides the correct information according to the content of
/proc and /sys.

Samuel


Re: [hwloc-users] [hwloc-announce] Hardware Locality (hwloc) v1.8.1 released

2014-02-13 Thread Samuel Thibault
Brice Goglin, on Thu 13 Feb 2014 23:18:04 +0100, wrote:
> IIRC, Windows warnings are function pointer casts that should be OK.

IIRC too.

Samuel


Re: [hwloc-users] Using hwloc to map GPU layout on system

2014-02-06 Thread Samuel Thibault
Brock Palen, on Thu 06 Feb 2014 21:31:42 +0100, wrote:
>   GPU L#3 "nvml2"
>   GPU L#5 "nvml3"
>   GPU L#7 "nvml0"
>   GPU L#9 "nvml1"
> 
> Is the L# always going to be in the oder I would expect?  Because then I 
> already have my map then. 

No, L# just follows the machine topology. CUDA numbering does not
necessarily follow that (e.g. if a slow GPU is somewhere in the middle).

Samuel


Re: [hwloc-users] Having trouble getting CPU Model string on Windows 7 x64

2014-01-28 Thread Samuel Thibault
Brice Goglin, on Tue 28 Jan 2014 12:46:24 +0100, wrote:
>   42: xchg %ebx,%rbx
> 
> I guess having both ebx and rbx on these lines isn't OK. On Linux, I get
> rsi instead of ebx, no problem.
> 
> Samuel, any idea?

Mmm, IIRC, "unsigned long" on windows may not be 64bit but 32bit?
Perhaps we could rather include stdint.h and use uintptr_t or uint64_t
there (so any other unix with 32bit unsigned long is fixed), and in the
case of windows, include windows.h and use DWORDLONG.
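
A small sketch to check the sizes on a given platform. The three function names are made up for this example. On LP64 Unix systems unsigned long is 64 bits wide, while 64-bit Windows uses a 32-bit unsigned long, which is why the fixed-width and pointer-sized types from stdint.h are the safer choice for a register-sized operand.

```c
#include <limits.h>
#include <stdint.h>

/* Width in bits of unsigned long on the platform being compiled for. */
unsigned ulong_bits(void)
{
    return (unsigned)(sizeof(unsigned long) * CHAR_BIT);
}

/* Width in bits of uintptr_t: always wide enough to hold a pointer. */
unsigned uintptr_bits(void)
{
    return (unsigned)(sizeof(uintptr_t) * CHAR_BIT);
}

/* Width in bits of uint64_t: exactly 64 wherever the type exists. */
unsigned uint64_bits(void)
{
    return (unsigned)(sizeof(uint64_t) * CHAR_BIT);
}
```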

Samuel


Re: [hwloc-users] How to build hwloc static to link into a shared lib on Linux

2014-01-18 Thread Samuel Thibault
Erik Schnetter, on Sat 18 Jan 2014 07:29:37 +0100, wrote:
> You probably need to set CFLAGS in addition to CXXFLAGS.

Yes, CXXFLAGS is for C++ files.  hwloc doesn't have any :)
It's CFLAGS which is for C.

That being said, I wonder how much you will gain: all the probing
functions will still get pulled in, and for Linux those make up most
of hwloc. Be sure to explicitly disable PCI and such at configure
time to at least avoid including these probing functions.

Samuel


Re: [hwloc-users] [windows] hwloc_get_proc_cpubind issue, even with current process handle as 2nd parameter

2014-01-06 Thread Samuel Thibault
Samuel Thibault, on Mon 06 Jan 2014 18:07:59 +0100, wrote:
> Eloi Gaudry, on Mon 06 Jan 2014 17:16:53 +0100, wrote:
> > the PID of the process. I was assuming that casting this member to a HANDLE 
> > object would allow me to use hwloc_get_proc_cpubind,

Let me fix my typos:

No, PIDs are mere numbers, they have nothing to do with HANDLEs. More
interestingly, PID values are valid across the whole system, while
HANDLE values are only valid within a given process. You have to use
OpenProcess() to create a HANDLE from a PID value.

Samuel


Re: [hwloc-users] [windows] hwloc_get_proc_cpubind issue, even with current process handle as 2nd parameter

2014-01-06 Thread Samuel Thibault
Eloi Gaudry, on Mon 06 Jan 2014 17:16:53 +0100, wrote:
> the PID of the process. I was assuming that casting this member to a HANDLE 
> object would allow me to use hwloc_get_proc_cpubind,

No, PIDs are mere numbers, they have nothing to do with HANDLES. More
interestingly, PID values are valid along the whole systems, while
HANDLE values are only valid with a given process. You have to use
OpenProcess, to create a HANDLE from a PID.

Samuel


Re: [hwloc-users] [windows] hwloc_get_proc_cpubind issue, even with current process handle as 2nd parameter

2014-01-06 Thread Samuel Thibault
Eloi Gaudry, on Mon 06 Jan 2014 16:37:55 +0100, wrote:
> AFAIK, the issue seems related to the GetAffinityMask call inside
> hwloc_win_get_proc_cpubind : it always returns 0.

So it's really the win32 layer which does not like seeing
GetAffinityMask called.  Just to make sure: you are using at least
Windows XP, right?

Samuel


Re: [hwloc-users] [windows] hwloc_get_proc_cpubind issue, even with current process handle as 2nd parameter

2014-01-06 Thread Samuel Thibault
Eloi Gaudry, on Mon 06 Jan 2014 16:04:27 +0100, wrote:
> On Windows, hwloc_get_cpubind and hwloc_set_cpubind works correctly but I
> cannot use hwloc_get_proc_cpubind or hwloc_set_proc_cpubind using the current
> process handle as 2^nd parameter (no matter what the last one is).
> 
> Any clue on this ?

Not really, it should just work. Do GetProcessAffinityMask() or
SetProcessAffinityMask() work if you call them the same way?

Do you perhaps have more than 64 processors? We still don't have
access to such a system in order to implement the use of
Get/SetProcessGroupAffinity.

Samuel


Re: [hwloc-users] Hwloc and Electric Fence (libefence).

2013-01-29 Thread Samuel Thibault
cesse...@free.fr, on Tue 29 Jan 2013 19:12:32 +0100, wrote:
> It was a very stupid question indeed !

Well, no it's not stupid :)
Zero-allocs can indeed be frowned upon. Some algorithms like doing them,
but some others actually bug out when they end up allocating 0 bytes.
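
The underlying portability issue is that malloc(0) is allowed to return either NULL or a unique pointer that may only be passed to free(), so code that tests the result for NULL behaves differently from one allocator to another. A sketch of a wrapper that makes the zero case explicit (the name alloc_nonzero() is made up for this example):

```c
#include <stdlib.h>

/* Allocate `size` bytes, but never forward a zero-byte request to
 * malloc(): a zero size always yields NULL, so callers must treat
 * NULL together with size == 0 as "nothing to allocate", not as an
 * out-of-memory error. */
void *alloc_nonzero(size_t size)
{
    if (size == 0)
        return NULL;
    return malloc(size);
}
```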

Samuel


Re: [hwloc-users] hwloc tutorial material

2013-01-22 Thread Samuel Thibault
Kenneth A. Lloyd, on Mon 21 Jan 2013 22:46:37 +0100, wrote:
> Thanks for making this tutorial available.  Using hwloc 1.7, how far down
> into, say, NVIDIA cards can the architecture be reflected?  Global memory
> size? SMX cores? None of the above?

None of the above for now.  Both are available in the cuda svn branch,
however.

Samuel


[hwloc-users] AIX test? Re: Hardware locality (hwloc) v1.6rc2 released

2012-11-21 Thread Samuel Thibault
Hello,

Brice Goglin, on Tue 20 Nov 2012 15:26:37 +0100, wrote:
> I just released 1.6rc2 (mirrors will update soon).

It seems fine in my tests, can somebody test on AIX?

Samuel


Re: [hwloc-users] Windows api threading functions equivalent to hwloc?

2012-11-19 Thread Samuel Thibault
Andrew Somorjai, on Tue 20 Nov 2012 01:39:47 +0100, wrote:
> "CreateThread() and WaitForMultipleObjects() are not in hwloc since they have 
> nothing to do with topologies."
> 
> I thought hwloc was also for threading?

It can bind your threads, yes, but the way to create the thread is
yours, it can be CreateThread, or OpenMP, etc...

> "DWORD_PTR m_id = 0;
> DWORD_PTR m_mask = 1 << i;
> 
> m_threads[i] = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)threadMain, 
> (LPVOID)i, NULL, _id);
> SetThreadAffinityMask(m_threads[i], m_mask);
> 
> This will likely be something such as:
> 
> hwloc_bitmap_t bitmap = hwloc_bitmap_alloc();
> hwloc_bitmap_set_only(bitmap, i);
> hwloc_set_thread_cpubind(topology, m_threads[i], bitmap, 0);
> hwloc_bitmap_free(bitmap);"
> 
> How would I pass a function like threadMain in the above CreateThread 
> function into the thread itself. Someone told me to use this library for this 
> purpose so I wasn't sure what it was made for. 

You should indeed use hwloc to replace the SetThreadAffinityMask, but
keep your CreateThread.

> How would I create an array m_threads and pass it
> into hwloc_set_thread_cpubind. I would still need this part then correct?
> 
> m_threads[i] = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)threadMain, 
> (LPVOID)i, NULL, _id); 

Yes, something like:

m_threads[i] = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)threadMain, 
(LPVOID)i, NULL, &m_id);

hwloc_bitmap_t bitmap = hwloc_bitmap_alloc();
hwloc_bitmap_set_only(bitmap, i);
hwloc_set_thread_cpubind(topology, m_threads[i], bitmap, 0);
hwloc_bitmap_free(bitmap);

> I would like to be independent of windows.h   by the way, not using windows 
> api calls is the motivation for all of this.

Ah, then you may want to also use the pthread-win32 package, which is
meant to replace CreateThread, and use pthread_getw32threadhandle_np in
the windows case to convert from pthread-win32's pthread_t into a HANDLE
for hwloc.

Samuel


Re: [hwloc-users] Windows api threading functions equivalent to hwloc?

2012-11-19 Thread Samuel Thibault
Brice Goglin, on Mon 19 Nov 2012 21:09:33 +0100, wrote:
> hwloc_bitmap_t bitmap = hwloc_bitmap_alloc();
> hwloc_bitmap_set_only(bitmap, i);
> hwloc_set_thread_cpubind(topology, m_threads[i], bitmap, 0);
> hwloc_bitmap_free(bitmap);

Or perhaps 

hwloc_set_thread_cpubind(topology, m_threads[i],
    hwloc_get_obj_by_type(topology, HWLOC_OBJ_CORE, i)->cpuset,
    0);

if you want to number the cores in logical order rather than physical
order (or use HWLOC_OBJ_PU if it's the hardware threads you want to
get).

> To get the number of processors with hwloc, use something like:
>   hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_CORE);
> or
>   hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_PU);
> Then it depends if you want real cores (the former or hardware threads (the
> latter).

Samuel


Re: [hwloc-users] [hwloc-announce] Hardware locality (hwloc) v1.6rc1 released

2012-11-15 Thread Samuel Thibault
Hello,

Brice Goglin, on Tue 13 Nov 2012 13:45:28 +0100, wrote:
> The Hardware Locality (hwloc) team is pleased to announce the first
> release candidate for v1.6:

I'm getting an odd failure in hwloc_pci_backend:

lt-hwloc_pci_backend: hwloc-1.6rc1/tests/hwloc_pci_backend.c:68: main: 
Assertion `!nb' failed.

It seems that even with flags == 0, pci stuff gets loaded from the xml
output. It happens on only one of our machines, hannibal. I wonder what
is special there.

Samuel


Re: [hwloc-users] Strange binding issue on 40 core nodes and cgroups

2012-11-05 Thread Samuel Thibault
Brice Goglin, on Mon 05 Nov 2012 23:23:42 +0100, wrote:
> top can also sort by the last used CPU. Type f to enter the config menu,
> hilight the "last cpu" line, and hit 's' to make it the sort column.

With older versions of top, type F, then j, then space.

Samuel


Re: [hwloc-users] How do I access CPUModel info string

2012-10-29 Thread Samuel Thibault
Olivier Cessenat, on Sat 27 Oct 2012 19:10:55 +0200, wrote:
> Just in case, I also provide the output of sysctl hw:

Thanks. There is indeed no package information (hw.packages), that's why
hwloc does not include any socket object.

Brice wrote:
> One way to solve this problem (which may also occur on old Linux 
> distribs) would be to store the CPU model in the machine object. But 
> we'll have to make sure all processors in the machine are indeed of the 
> same model. On MacOSX, it looks like sysctl reports a single socket 
> description anyway, so no problem. 

So we have to resort to that, now committed.

Samuel


Re: [hwloc-users] How do I access CPUModel info string

2012-10-25 Thread Samuel Thibault
Robin Scher, le Thu 25 Oct 2012 23:57:38 +0200, a écrit :
> ; eax = 0x80000002 --> eax, ebx, ecx, edx: get processor name string
> (part 1)
> mov eax,0x80000002
> cpuid

Oh, this is indeed *exactly* the model name string. I only knew about
the vendor_id string.

> I don't know if that would work on Win64, though.

It should: cpuid is not a privileged instruction.

> Do you think those could be added to hwloc?

Yes: we already use cpuid for the x86 backend. That will only work on
x86 hosts of course.
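
To make the cpuid approach concrete, here is a minimal sketch of reading the model name string from leaves 0x80000002 to 0x80000004. It assumes GCC or Clang, whose `<cpuid.h>` provides `__get_cpuid`; the helper name `read_cpu_brand` is made up for illustration and is not an hwloc function. On non-x86 hosts it simply reports failure.

```c
#include <assert.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: fill buf (49 bytes) with the processor name
 * string taken from cpuid leaves 0x80000002..0x80000004.
 * Returns 1 on success, 0 if the leaves are unavailable (buf is then
 * left empty). */
int read_cpu_brand(char buf[49])
{
    assert(buf != NULL);
    memset(buf, 0, 49);
#if defined(__x86_64__) || defined(__i386__)
    unsigned int regs[4];
    for (unsigned int i = 0; i < 3; i++) {
        /* __get_cpuid returns 0 when the requested leaf is not supported */
        if (!__get_cpuid(0x80000002u + i,
                         &regs[0], &regs[1], &regs[2], &regs[3])) {
            memset(buf, 0, 49);
            return 0;
        }
        /* each leaf yields 16 bytes of the string, in eax, ebx, ecx, edx */
        memcpy(buf + 16 * i, regs, sizeof(regs));
    }
    return 1;
#else
    return 0;
#endif
}
```

The buffer is zeroed first, so the 48 bytes copied from the three leaves are always followed by a terminating NUL.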

Brice, that actually brings another piece to the plugin engine: on
Windows ideally we should still get the topology from the OS, but take
the cpu string from the x86 backend...

Samuel


Re: [hwloc-users] How do I access CPUModel info string

2012-10-25 Thread Samuel Thibault
Robin Scher, le Thu 25 Oct 2012 23:39:46 +0200, a écrit :
> Is there a way to get this string (e.g. "Intel(R) Core(TM) i7 CPU M 620 @
> 2.67GHz") consistently on Windows, Linux, OS-X and Solaris?

Currently, no.

hwloc itself does not have a table of such strings, and each OS has its
own table.

Samuel


Re: [hwloc-users] hwloc 1.5, freebsd and linux output on the same hardware

2012-10-05 Thread Samuel Thibault
Sebastian Kuzminsky, le Sat 06 Oct 2012 00:55:57 +0200, a écrit :
> binding to CPU0
> could not bind to CPU0: Resource deadlock avoided

Mmm, from what I read in the freebsd kernel:

/*
 * Create a set in the space provided in 'set' with the provided parameters.
 * The set is returned with a single ref.  May return EDEADLK if the set
 * will have no valid cpu based on restrictions from the parent.
 */

_cpuset_create(struct cpuset *set, struct cpuset *parent, const cpuset_t *mask,
cpusetid_t id)
{

if (!CPU_OVERLAP(&parent->cs_mask, mask))
return (EDEADLK);

Could it be that due to administration rules lstopo is not allowed to
bind on cpu 0-9 ? In that case the x86 backend can not detect anything
there.
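
As an illustration of that check, here is a simplified model in plain C. It assumes a CPU set fits in one 64-bit mask (bit n stands for CPU n), which is not how the FreeBSD kernel stores its sets; `try_create_set` is a made-up name, not a kernel function.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified model of the overlap test quoted above: the requested mask
 * must share at least one CPU with the parent set, otherwise the call
 * fails with EDEADLK ("Resource deadlock avoided"). */
int try_create_set(uint64_t parent_mask, uint64_t requested_mask)
{
    if ((parent_mask & requested_mask) == 0)
        return EDEADLK;
    return 0;
}
```

With a parent restricted to CPUs 10-19 (mask 0xFFC00), a request for CPU 0 (mask 0x1) has no overlap and would fail exactly like the "could not bind to CPU0" message.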

Samuel


Re: [hwloc-users] hwloc 1.5, freebsd and linux output on the same hardware

2012-10-04 Thread Samuel Thibault
Sebastian Kuzminsky, le Wed 03 Oct 2012 17:24:55 +0200, a écrit :
> So that's an improvement over the svn trunk
> yesterday, but it's not all the way fixed yet!

Ok.  Apparently hwloc can't bind itself to procs 0-9 for some reason.
I have added debug to the trunk, could you try it again (no need for the
config.log any more, but I still need --enable-debug).

Samuel


Re: [hwloc-users] hwloc 1.5, freebsd and linux output on the same hardware

2012-10-02 Thread Samuel Thibault
Hello,

Sebastian Kuzminsky, le Wed 03 Oct 2012 01:08:46 +0200, a écrit :
> Here you go (the list server rejected it because it was too big, but this
> compressed version should make it through).

Thanks!

There were two bugs which resulted in cpuid not being properly
compiled. I have fixed them in the trunk, could you try again?

Samuel


Re: [hwloc-users] hwloc 1.5, freebsd and linux output on the same hardware

2012-10-02 Thread Samuel Thibault
Hello,

Sebastian Kuzminsky, le Tue 02 Oct 2012 23:47:05 +0200, a écrit :
> I've attached the output from both platforms.

On freebsd, could you pass --enable-debug to ./configure and rerun
lstopo, to get more debugging information?

Samuel


Re: [hwloc-users] Solaris and hwloc

2012-09-14 Thread Samuel Thibault
Jeff Squyres, le Thu 13 Sep 2012 17:10:00 +0200, a écrit :
> After a little more thought, I'm also thinking that having a "it's ok if 
> binding fails" CLI flag is a bad idea.  If the user really wants something to 
> run without binding, then you can just do that in the shell:
> 
> -
> hwloc-bind ...whatever... my_executable
> if test "$?" != "0"; then
>   # run without binding
>   my_executable
> fi

Well, I find this a bit tedious.

Other than that, I agree.

Samuel


Re: [hwloc-users] Solaris and hwloc

2012-09-12 Thread Samuel Thibault
Jeff Squyres, le Thu 13 Sep 2012 00:46:33 +0200, a écrit :
> On Sep 12, 2012, at 6:44 PM, Samuel Thibault wrote:
> 
> >> Anyone have an opinion?  I'm 60/40 in favor of not letting it run, under 
> >> the rationale that the user asked for something that we can't deliver, so 
> >> we shouldn't continue.
> > 
> > Well, it depends on the situation. The binding might only be an
> > optimization, and failing just because of that is not nice. When it's an
> > administration decision, it's different, but then one would use cgroups
> > & such instead.
> 
> 
> How about adding a flag to make it fail if it doesn't bind?

Now I understand Brice's --strict flag mentioning :)

Samuel


Re: [hwloc-users] Solaris and hwloc

2012-09-12 Thread Samuel Thibault
Jeff Squyres, le Thu 13 Sep 2012 00:45:56 +0200, a écrit :
> On Sep 12, 2012, at 6:42 PM, Samuel Thibault wrote:
> > No, we have it, but not all solaris systems have it.
> 
> 
> Ah, I see.  So if Siegmar had done "hwloc-bind socket:0 ..." -- assuming his 
> system has lgrp support -- that should work.  right?

Rather node:0, but yes.

Samuel


Re: [hwloc-users] Solaris and hwloc

2012-09-12 Thread Samuel Thibault
I forgot to answer this:

Jeff Squyres, le Wed 12 Sep 2012 16:16:57 +0200, a écrit :
> Sidenote: if hwloc-bind fails to bind, should we still launch the child 
> process?

Well, it's up to you to decide :)

Samuel


Re: [hwloc-users] Solaris and hwloc

2012-09-12 Thread Samuel Thibault
Jeff Squyres, le Wed 12 Sep 2012 16:16:57 +0200, a écrit :
> He seems to get an hwloc error any time he tries to bind to more than 1 PU.  
> Is that expected on Solaris?

Without lgrp support, unfortunately yes: the processor_bind Solaris interface 
only permits binding to one processor.

With lgrp support, one should be able to bind oneself to sets of whole
NUMA nodes. I don't know any interface which would provide a granularity
between one processor and one NUMA node.

Samuel


Re: [hwloc-users] lstopo and GPus

2012-08-28 Thread Samuel Thibault
Brice Goglin, le Tue 28 Aug 2012 14:43:53 +0200, a écrit :
> > $ lstopo
> >   Socket #0
> >   Socket #1
> > PCI...
> > (connected to socket #1)
> >
> > vs
> >
> > $ lstopo
> >   Socket #0
> >   Socket #1
> >   PCI...
> > (connected to both sockets)
> 
> Fortunately, this won't occur in most cases (including Gabriele's
> machines) because there's a NUMAnode object above each socket.

Oops, I actually meant NUMAnode above

> Both the socket and the PCI bus are drawn inside the NUMA box, so
> things appear OK in graphics too.

Indeed, if the PCI bus was connected to one NUMAnode/socket only, it
would be drawn inside, which is not the case.

> Gabriele, assuming you have a dual Xeon X56xx Westmere machine, there
> are plenty of such platforms where the GPU is indeed connected to both
> sockets. Or it could be a buggy BIOS.

Agreed.

Samuel


Re: [hwloc-users] lstopo and GPus

2012-08-28 Thread Samuel Thibault
Gabriele Fatigati, le Tue 28 Aug 2012 14:19:44 +0200, a écrit :
> I'm using hwloc 1.5. I would like to see how GPUs are connected with the processor
> socket using lstopo command. 

About connection with the socket, there is indeed no real graphical
difference between "connected to socket #1" and "connected to all
sockets". You can use the text output for that:

$ lstopo
  Socket #0
  Socket #1
PCI...
(connected to socket #1)

vs

$ lstopo
  Socket #0
  Socket #1
  PCI...
(connected to both sockets)

Samuel


Re: [hwloc-users] possible concurrency issue with reading /proc data on Linux

2012-04-21 Thread Samuel Thibault
Vlad, le Sat 21 Apr 2012 23:37:11 +0200, a écrit :
> 433  /* take the number of links as a good estimate for the number of tids */
> 434  if (fstat(dirfd(taskdir), &sb) == 0)
> 435    max_tids = sb.st_nlink;
> 
> "taskdir" here is /proc/<pid>/task, correct? In which case the threads will be
> doing readdir() on the same DIR stream...

No, each thread opens its own DIR in hwloc_linux_foreach_proc_tid.
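
A minimal sketch of that pattern, assuming a Linux-style task directory such as /proc/self/task: every caller opens and closes its own DIR stream, so no readdir() position is shared between threads. `count_tids` is a hypothetical helper, not the hwloc function named above.

```c
#include <dirent.h>
#include <stddef.h>
#include <sys/types.h>

/* Count the thread entries of a task directory using a DIR stream that
 * is private to this call.  Returns -1 if the directory cannot be
 * opened (for instance on a system without /proc). */
int count_tids(const char *taskdir_path)
{
    DIR *taskdir = opendir(taskdir_path);
    struct dirent *entry;
    int count = 0;

    if (taskdir == NULL)
        return -1;
    while ((entry = readdir(taskdir)) != NULL) {
        if (entry->d_name[0] == '.')
            continue; /* skip "." and ".." */
        count++;
    }
    closedir(taskdir);
    return count;
}
```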

Samuel


Re: [hwloc-users] Problems on SMP with 48 cores

2012-03-15 Thread Samuel Thibault
Samuel Thibault, le Thu 15 Mar 2012 07:42:40 +0100, a écrit :
> Brice Goglin, le Wed 14 Mar 2012 22:32:07 +0100, a écrit :
> > We debugged this in private emails with Hartmut. His 48-core platform is
> > now detected properly. Everything got fixed with a patch
> > functionally identical to what Samuel sent earlier.
> 
> Is the 32bit-on-64bit build fixed too?

It'd also be good to test 32-on-32, where there would be two groups,
because binding on groups has not been implemented at all due to lacking
access to a machine with several groups.

Samuel


Re: [hwloc-users] Problems on SMP with 48 cores

2012-03-14 Thread Samuel Thibault
Hartmut Kaiser, le Wed 14 Mar 2012 08:52:59 -0500, a écrit :
> 
> > Le 14/03/2012 09:39, Brice Goglin a écrit :
> > > Le 13/03/2012 19:08, Hartmut Kaiser a écrit :
> > >>> -  hwloc_bitmap_from_ith_ulong(obj->cpuset,
> > GroupMask[i].Group,
> > >>> GroupMask[i].Mask);
> > >>> +  hwloc_bitmap_from_ith_ulong(obj->cpuset,
> > 2*GroupMask[i].Group,
> > >>> GroupMask[i].Mask & 0xfffffff);
> > > There's a missing 'f' above.
> > > Here's another almost untested patch, with additional debug printf.
> > > Please remove the previous one and apply this one instead.
> > 
> > Grrr, I failed to fix the missing f. New patch attached.
> 
> Your patch relies on two symbols which I'm not able to resolve:
> hwloc_debug_bitmap_2args and hwloc_debug_2args. If I comment those the
> picture has changed (see attached), but still no overall luck

Here is a fixed patch concerning the debugging statements.

Samuel
Index: src/topology-windows.c
===================================================================
--- src/topology-windows.c  (révision 4385)
+++ src/topology-windows.c  (copie de travail)
@@ -532,7 +532,9 @@
obj = hwloc_alloc_setup_object(type, id);
 obj->cpuset = hwloc_bitmap_alloc();
hwloc_debug("%s#%u mask %lx\n", hwloc_obj_type_string(type), id, 
procInfo[i].ProcessorMask);
-   hwloc_bitmap_from_ulong(obj->cpuset, procInfo[i].ProcessorMask);
+   hwloc_bitmap_from_ulong(obj->cpuset, procInfo[i].ProcessorMask & 
0xffffffff);
+   hwloc_bitmap_from_ith_ulong(obj->cpuset, i, procInfo[i].ProcessorMask 
>> 32);
+   hwloc_debug_2args_bitmap("%s#%u bitmap %s\n", 
hwloc_obj_type_string(type), id, obj->cpuset);

switch (type) {
  case HWLOC_OBJ_NODE:
@@ -634,7 +636,9 @@
  mask = procInfo->Group.GroupInfo[id].ActiveProcessorMask;
  hwloc_debug("group %u %d cpus mask %lx\n", id,
   procInfo->Group.GroupInfo[id].ActiveProcessorCount, mask);
- hwloc_bitmap_from_ith_ulong(obj->cpuset, id, mask);
+ hwloc_bitmap_from_ith_ulong(obj->cpuset, 2*id, mask & 0xffffffff);
+ hwloc_bitmap_from_ith_ulong(obj->cpuset, 2*id+1, mask >> 32);
+ hwloc_debug_2args_bitmap("group %u %d bitmap %s\n", id, 
procInfo->Group.GroupInfo[id].ActiveProcessorCount, obj->cpuset);
  hwloc_insert_object_by_cpuset(topology, obj);
}
continue;
@@ -648,8 +652,10 @@
 obj->cpuset = hwloc_bitmap_alloc();
 for (i = 0; i < num; i++) {
   hwloc_debug("%s#%u %d: mask %d:%lx\n", hwloc_obj_type_string(type), 
id, i, GroupMask[i].Group, GroupMask[i].Mask);
-  hwloc_bitmap_from_ith_ulong(obj->cpuset, GroupMask[i].Group, 
GroupMask[i].Mask);
+  hwloc_bitmap_from_ith_ulong(obj->cpuset, 2*GroupMask[i].Group, 
GroupMask[i].Mask & 0xffffffff);
+  hwloc_bitmap_from_ith_ulong(obj->cpuset, 2*GroupMask[i].Group+1, 
GroupMask[i].Mask >> 32);
 }
+   hwloc_debug("%s#%u bitmap %lx\n", hwloc_obj_type_string(type), id, 
obj->cpuset);

switch (type) {
  case HWLOC_OBJ_NODE:


Re: [hwloc-users] V1.4.1: Windows x64 import library broken

2012-03-14 Thread Samuel Thibault
Hartmut Kaiser, le Mon 12 Mar 2012 23:05:44 +0100, a écrit :
> The import library libhwloc.lib distributed with the Windows x64 binaries is
> broken in V1.4.1 (even if it was ok in V1.4). The library internally refers
> to libhwloc-4.dll (instead of libhwloc-5.dll). While it is not a problem to
> generate a correct import library from the supplied definition file, it
> would be good to be able to use the supplied binaries as is.

I've uploaded 1.4.1.1 windows builds, whose only change is that.

Thanks for the report,
Samuel


Re: [hwloc-users] Problems on SMP with 48 cores

2012-03-13 Thread Samuel Thibault
Brice Goglin, le Tue 13 Mar 2012 18:55:29 +0100, a écrit :
> Le 13/03/2012 17:04, Hartmut Kaiser a écrit :
> >>> But the problems I was seeing were not MSVC specific. It's a
> >>> proliferation of arcane (non-POSIX) function use (like strcasecmp,
> >>> etc.) missing use of HAVE_UNISTD_H, HAVE_STRINGS_H to wrap
> >>> non-standard headers, unsafe mixing of
> >>> int32<->int64 data types, reliance on int (and other types) having a
> >>> certain bit-size, totally unsafe shift operations, wide use of
> >>> (non-C-standard) gcc extensions, etc. Should I go on?
> > More investigation shows that the code currently assumes group (and
> > processor) masks to be 32 bit, which is not true on 64 bit systems. For
> > instance this (topology-windows.c: line 643):
> >
> > hwloc_bitmap_from_ith_ulong(obj->cpuset, GroupMask[i].Group,
> > GroupMask[i].Mask);
> 
> Try applying something like the patch below. Totally untested obviously,
> but we'll see if that starts improving lstopo.

That won't work on 32bit systems, where the mask is only 32 bits wide and
thus shifting it by 32 is undefined. He will probably be able to provide
me with an account on such a Windows system, let's just wait for that.
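
For reference, one way to sidestep the undefined shift is to widen the mask before splitting it. This is only a sketch in plain C under that assumption, not the patch that ended up in hwloc; `split_mask` is a made-up name.

```c
#include <stdint.h>

/* Split a processor mask into two 32-bit halves.  The mask is widened to
 * uint64_t first, so ">> 32" stays well defined even where the native
 * mask type (e.g. unsigned long on a 32-bit build) is only 32 bits wide;
 * in that case the high half is simply 0. */
void split_mask(unsigned long mask, uint32_t *low, uint32_t *high)
{
    uint64_t wide = (uint64_t)mask;

    *low = (uint32_t)(wide & 0xffffffffu);
    *high = (uint32_t)(wide >> 32);
}
```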

Samuel


Re: [hwloc-users] Problems on SMP with 48 cores

2012-03-13 Thread Samuel Thibault
Samuel Thibault, le Tue 13 Mar 2012 13:33:05 +0100, a écrit :
> > I tried to recompile the library using MSVC which would allow me to debug
> > the issue, but after several hours of tweaking I gave up. As it turns out
> > the code base is everything but portable, which is really unfortunate for a
> > library which is supposed to be cross platform. 
> 
> I'm afraid to have to answer that MSVC does everything but respect
> standards, even when they are more than 10 years old. The hwloc code
> compiles as such on a variety of unix compilers, and we didn't need many
> tweaks for that. The mingw toolchain saves a lot of such concerns, so I
> can only advise to use it.

Just to make it clear: patches for making hwloc compile with MSVC are
welcome and will be happily applied, I'm just very reluctant to spend
time on writing them while the mingw build just works.

Samuel


Re: [hwloc-users] creation and destruction of bound threads

2012-01-30 Thread Samuel Thibault
Albert Solernou, le Mon 30 Jan 2012 12:37:31 +0100, a écrit :
> I am working on a threaded code, and want to bind threads to cores. However,
> the process creates and destroys the threads, so here is the question:
>   What happens if I enter on a threaded part of the code, bind "thread X" to
> a core, return to a serial part and then thread again? Can I expect to find
> thread X bound to the core I bound it previously?

It depends on what actually creates the threads. For instance, most
implementations of OpenMP reuse the same kernel threads, without
actually destroying them. But nothing in the standard asserts that, so
you'd probably prefer to re-bind just to be sure.

Samuel


Re: [hwloc-users] Bogus files in 64bit Windows binary distribution (1.4rc1)

2012-01-19 Thread Samuel Thibault
Hartmut Kaiser, le Fri 20 Jan 2012 00:43:32 +0100, a écrit :
> > Hartmut Kaiser, le Thu 19 Jan 2012 22:48:50 +0100, a écrit :
> > > We are using hwloc with VS2010 and were happy to realize that after
> > > the (for
> > > us) totally broken Windows binary distribution in V1.3
> > 
> > Broken?  How so?  It worked for me.
> 
> Try it, the autoconf/config.h has settings not compatible with VC++, for
> instance:
> 
> /* Maybe before gcc 2.95 too */
> #if !defined(HWLOC_HAVE_ATTRIBUTE_UNUSED) && defined(__GNUC__)
> # define HWLOC_HAVE_ATTRIBUTE_UNUSED 1
> #else
> # define HWLOC_HAVE_ATTRIBUTE_UNUSED 1
> #endif
> #if HWLOC_HAVE_ATTRIBUTE_UNUSED
> # define __hwloc_attribute_unused __attribute__((__unused__))
> #else
> # define __hwloc_attribute_unused
> #endif
> 
> etc. This essentially always defines __hwloc_attribute_unused to expand to
> the __attribute__() (from hwloc-win64-build-1.3.1.zip).

Ok, so the problem is not actually in the binaries, but the headers :)

This was also reported in another case and already fixed for the next
1.3 release.
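
For illustration, a guard of the following shape keeps the attribute away from compilers that do not understand it. This is a sketch with made-up `MY_` names, not the actual change made in the hwloc headers.

```c
/* Claim support for __attribute__((__unused__)) only on GNU-compatible
 * compilers; elsewhere (e.g. MSVC) the macro expands to nothing. */
#if defined(__GNUC__)
# define MY_HAVE_ATTRIBUTE_UNUSED 1
#else
# define MY_HAVE_ATTRIBUTE_UNUSED 0
#endif

#if MY_HAVE_ATTRIBUTE_UNUSED
# define my_attribute_unused __attribute__((__unused__))
#else
# define my_attribute_unused
#endif

/* The second parameter is deliberately ignored; the macro silences the
 * unused-parameter warning where the compiler supports the attribute. */
int answer(int used, int ignored my_attribute_unused)
{
    return used + 1;
}
```

The point is that the #else branch must define the feature macro to 0, whereas the header quoted above set it to 1 in both branches.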

Samuel


Re: [hwloc-users] hwloc_get_last_cpu_location and hwloc_get_cpubind

2012-01-17 Thread Samuel Thibault
Marc-André Hermanns, le Tue 17 Jan 2012 11:47:43 +0100, a écrit :
> It seems now that it has the whole system in the cpuset. How can I 
> really infer the PU this process was run on? I would have expected the 
> cpuset to have only 1 element per level to indicate the path from 
> machine to PU.

That is what is expected, yes (though only at the PU level, since
only that one is completely included in the cpuset, you would need
"intersects" to get the path). And that's what I get on my machine:

€ ./test 
This system has 7 levels
Cpuset: 0x0040
Number of objects at depth 0: 0
Number of objects at depth 1: 0
Number of objects at depth 2: 0
Number of objects at depth 3: 0
Number of objects at depth 4: 0
Number of objects at depth 5: 0
Number of objects at depth 6: 1

> Evidently my understanding of this functionality is still
> not correct.

No, it's completely correct, it just seems there's an odd thing
somewhere. Could you run through strace so we can check what the kernel
returns?
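
To spell out the "included" versus "intersects" distinction, here is a sketch on plain 64-bit masks. It is only a model: real hwloc bitmaps are not limited to 64 bits, and the two helper names are invented here.

```c
#include <stdint.h>

/* 1 if every CPU of sub is also in super */
int mask_is_included(uint64_t sub, uint64_t super)
{
    return (sub & ~super) == 0;
}

/* 1 if the two masks share at least one CPU */
int mask_intersects(uint64_t a, uint64_t b)
{
    return (a & b) != 0;
}
```

Take the last location 0x40 (PU 6). A core holding PUs 6 and 7 has cpuset 0xC0: it is not included in 0x40, but it does intersect it, so walking the path from machine to PU needs the intersection test. With actual hwloc bitmaps, hwloc_bitmap_isincluded() and hwloc_bitmap_intersects() make the same two tests.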

Samuel


Re: [hwloc-users] Compiling hwloc into a static library on Windows and Linux

2012-01-13 Thread Samuel Thibault
Andrew Helwer, le Fri 13 Jan 2012 18:16:16 +0100, a écrit :
> libhwloc.lib(traversal.o) : error LNK2019: unresolved external symbol __ms_vsnprintf referenced in function snprintf

Do you also link msvcrt in? mingw needs it for almost everything.

Samuel


Re: [hwloc-users] Compiling hwloc into a static library on Windows and Linux

2012-01-12 Thread Samuel Thibault
Andrew Helwer, le Fri 13 Jan 2012 01:35:27 +0100, a écrit :
> It fails with the following:
> 
> *** Warning: linker path does not have real file for library -lgdi32.

Ah, that's a dark bug in libtool.

> gcc -I/cygdrive/c/hwloc-asdf/include -I/cygdrive/c/hwloc-asdf/include 
> -I/cygdriv
> e/c/hwloc-asdf/includedolib.c   -o dolib
> ./dolib "/cygdrive/c/Program Files (x86)/Microsoft Visual Studio 
> 10.0/VC/bin/lib
> " X86 .libs/libhwloc.def libhwloc- .libs/libhwloc.lib
> The system cannot find the path specified.
> "/cygdrive/c/Program Files (x86)/Microsoft Visual Studio 10.0/VC/bin/lib" 
> /machi
> ne:X86 /def:.libs/libhwloc.def /name:libhwloc- /out:.libs/libhwloc.lib failed
> Makefile:758: recipe for target `.libs/libhwloc.lib' failed

Well, AIUI, you don't actually need the shared version, so you can as
well pass --disable-shared to ./configure to just get rid of this bug.

That said, isn't the just-uploaded-to-hwloc-website win64 build enough
for you?  It contains the libhwloc.a static build in lib/

Samuel


Re: [hwloc-users] Compiling hwloc into a static library on Windows and Linux

2012-01-12 Thread Samuel Thibault
Andrew Helwer, le Tue 10 Jan 2012 02:08:46 +0100, a écrit :
> the Visual Studio compiler runs into a lot of issues.

What kind of issues for instance?

Samuel


Re: [hwloc-users] Compiling hwloc into a static library on Windows and Linux

2012-01-12 Thread Samuel Thibault
Hello,

Andrew Helwer, le Thu 12 Jan 2012 02:11:58 +0100, a écrit :
> If I run the command manually, it can't find the libhwloc.def file. Which is 
> reasonable, as it does not appear to exist in the .lib directory. Am I 
> missing something?

In principle the .def file is generated by the linker. Could you run

make V=1

to get the command lines, and check that HWLOC_HAVE_WINDOWS is 1 in

./include/hwloc/autogen/config.h

? At worst, I believe you can just copy the libhwloc.def contained
in the 32bit build of the exact same version of hwloc, it should be
compatible.

Thanks,
Samuel


Re: [hwloc-users] Compiling hwloc into a static library on Windows and Linux

2012-01-09 Thread Samuel Thibault
Andrew Helwer, le Tue 10 Jan 2012 02:08:46 +0100, a écrit :
> First of all, is Windows 64-bit supported? There is only a 32-bit release on
> the downloads page.

I have never tried to build a 64bit binary, but there is little reason
it should fail.

> However, when I specify the --enable-embedded-mode flag in configure in Linux,
> no libraries are built at all - the specified prefix directory contains only
> empty directories.

But the library is built, it's just not installed because projects often
prefer to link the library in, or something similar. If you want to
install libhwloc.a, simply fetch it from src/.libs/

> I've managed to compile a working static library on Linux using the headers
> generated by configure,

I'm not sure I understand. Doesn't passing --enable-static to
./configure already generate a static library?

> but am having a lot of difficulty doing the same on Windows - the
> Visual Studio compiler runs into a lot of issues. Is there a simple
> way to do this?

I have to say I know basically nothing about what Visual Studio expects
from a static library.

Samuel


Re: [hwloc-users] GPU/NIC/CPU locality

2011-11-29 Thread Samuel Thibault
Stefan Eilemann, le Tue 29 Nov 2011 11:40:18 +0100, a écrit :
> Maybe I'm missing something, but I don't see any PCI-related output with 
> lstopo.

You are probably missing the libpci-devel package.

Samuel


Re: [hwloc-users] Process and thread binding

2011-09-12 Thread Samuel Thibault
Gabriele Fatigati, le Mon 12 Sep 2011 15:50:45 +0200, a écrit :
> thanks very much for your explanations. But I don't understand why a process
> inherits core bound of his threads

On Linux, there is no such thing as "process binding", only "thread
binding". hwloc emulates the former by using the latter.

Samuel


Re: [hwloc-users] Re : Re : hwloc topology check initializing

2011-09-03 Thread Samuel Thibault
Gabriele Fatigati, le Sat 03 Sep 2011 16:09:11 +0200, a écrit :
> What about hwloc_topology_check()?
> 
> What types of check does?

Mostly that the hwloc library itself didn't do anything wrong.

Samuel


Re: [hwloc-users] Numa availability

2011-08-28 Thread Samuel Thibault
Brice Goglin, le Sun 28 Aug 2011 12:36:31 +0200, a écrit :
> >  Is there a hwloc routine to check this?
> 
> get_nbobjs_by_type(topology, HWLOC_OBJ_NODE) tells how many NUMA node
> objects exist.
> If you get >1, the machine is NUMA.
> In the non-NUMA case, I think you can get 0 or 1 depending on whether
> the OS is NUMA-aware or not (not sure we should remove this possible
> difference).

The useful difference is that 0 means we don't know, while 1 means we do
know there is only one node.

Samuel


Re: [hwloc-users] [hwloc-announce] Hardware Locality (hwloc) v1.2.1rc3 released

2011-08-16 Thread Samuel Thibault
Brice Goglin, le Tue 16 Aug 2011 19:49:10 +0200, a écrit :
> hwloc 1.2.1 *rc3* is out (web mirrors will update shortly). It fixes
> hwloc_get_last_cpu_location() for Linux threads. Apart from that,
> nothing important. Let's hope this one will become the final 1.2.1
> within a couple days.

Since the Bordeaux university machines are mostly down, I won't be able
to perform all the usual tests before Thursday afternoon.

Samuel


Re: [hwloc-users] Magny Cours L3 cache issue

2011-08-16 Thread Samuel Thibault
Wheeler, Kyle Bruce, le Tue 16 Aug 2011 16:52:54 +0200, a écrit :
> hwloc-gather-topology doesn't seem to work on my compute nodes... not sure 
> why. It doesn't report any failures, but it doesn't create the tarball either 
> (just spits out more lstopo output).

Maybe try to replace /bin/sh with /bin/bash in the script?

Samuel


Re: [hwloc-users] Get CPU associated to a thread

2011-08-12 Thread Samuel Thibault
Hello,

PULVERAIL Sébastien, le Fri 12 Aug 2011 13:59:46 +0200, a écrit :
> Does a such function exist ?

See hwloc_get_last_cpu_location()

Samuel


Re: [hwloc-users] hwloc get cpubind function

2011-08-11 Thread Samuel Thibault
Gabriele Fatigati, le Thu 11 Aug 2011 18:26:28 +0200, a écrit :
> Gabriele Fatigati, le Thu 11 Aug 2011 18:05:25 +0200, a écrit :
> > char* bitmap_string=(char*)malloc(256);
> >
> > hwloc_bitmap_t set = hwloc_bitmap_alloc();
> >
> > hwloc_linux_get_tid_cpubind(, tid, set);
>
> with gettid() works well.

Well in that case you can use the more portable

hwloc_get_cpubind(topology, set, HWLOC_CPUBIND_THREAD);

which will also work on non-Linux.

Samuel


Re: [hwloc-users] hwloc get cpubind function

2011-08-11 Thread Samuel Thibault
Gabriele Fatigati, le Thu 11 Aug 2011 18:05:25 +0200, a écrit :
> char* bitmap_string=(char*)malloc(256);
> 
> hwloc_bitmap_t set = hwloc_bitmap_alloc();
> 
> hwloc_linux_get_tid_cpubind(, tid, set);

Where does "tid" come from? hwloc_linux_get_tid_cpubind() only takes
Linux tids (as in gettid()), not OpenMP thread IDs.

Samuel


Re: [hwloc-users] hwloc get cpubind function

2011-08-11 Thread Samuel Thibault
Gabriele Fatigati, le Thu 11 Aug 2011 10:32:23 +0200, a écrit :
> I'm using hwloc-1.3a1r3606.  Now hwloc_get_last_cpu_location() works well:
> 
> thread 0  bind:  0x0008   as core number 3
> thread 1 bind: 0x0800 as core number 11

Good.

> but hwloc_linux_get_tid_cpubind() has still some problems because after 
> binding
> one thread on just one core it give me:
> 
> thread 0 bind:  0x0008   as core number 3
> thread 1 bind: "0x00ff"  as all available cores!!

How do you use it exactly?

Samuel


Re: [hwloc-users] hwloc get cpubind function

2011-08-10 Thread Samuel Thibault
Samuel Thibault, le Wed 10 Aug 2011 16:24:39 +0200, a écrit :
> Gabriele Fatigati, le Wed 10 Aug 2011 16:13:27 +0200, a écrit :
> > there is something wrong. I'm using two thread, the first one is bound on
> > HWLOC_OBJ_PU number 2, the second one on  HWLOC_OBJ_PU number 10,
> 
> It seems that hwloc_linux_get_tid_last_cpu_location erroneously assumes
> that /proc/self/stat points to its own thread state. Indeed, we need to
> fix that.

This should now be fixed in the trunk and the v1.2 branch. You can
either upgrade from svn, or wait for this night's snapshot.

Samuel


Re: [hwloc-users] hwloc get cpubind function

2011-08-10 Thread Samuel Thibault
Gabriele Fatigati, le Wed 10 Aug 2011 16:13:27 +0200, a écrit :
> there is something wrong. I'm using two thread, the first one is bound on
> HWLOC_OBJ_PU number 2, the second one on  HWLOC_OBJ_PU number 10,

It seems that hwloc_linux_get_tid_last_cpu_location erroneously assumes
that /proc/self/stat points to its own thread state. Indeed, we need to
fix that.

Samuel


Re: [hwloc-users] hwloc get cpubind function

2011-08-10 Thread Samuel Thibault
Gabriele Fatigati, le Wed 10 Aug 2011 15:41:19 +0200, a écrit :
> hwloc_cpuset_t set = hwloc_bitmap_alloc();
> 
> int return_value = hwloc_get_last_cpu_location(topology, set,
>  HWLOC_CPUBIND_THREAD);
> 
> printf( " bitmap_string: %s \n", bitmap_string[0]);
> 
> give me:
> 
> 0x0800
> 
> converted in binary:
> 
> 100000000000
> 
> So, CPU 0 I suppose,

Do you mean linear 0 or physical 0?

cpusets are always physical, 0x800 means CPU with physical number 11.
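
Reading the physical number out of such a mask just means finding which bit is set. A minimal sketch in plain C, assuming the cpuset fits in 64 bits (`first_cpu_in_mask` is a made-up name, not an hwloc function):

```c
#include <stdint.h>

/* Return the index of the lowest set bit of mask, i.e. the lowest
 * physical CPU number in the set, or -1 if the mask is empty. */
int first_cpu_in_mask(uint64_t mask)
{
    int index = 0;

    if (mask == 0)
        return -1;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        index++;
    }
    return index;
}
```

So 0x800 gives 11 and 0x8 gives 3. On a real hwloc bitmap, hwloc_bitmap_first() returns the same information.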

Samuel


Re: [hwloc-users] hwloc get cpubind function

2011-08-10 Thread Samuel Thibault
Gabriele Fatigati, le Wed 10 Aug 2011 15:29:43 +0200, a écrit :
> hwloc_obj_t core = hwloc_get_obj_by_type(topology, HWLOC_OBJ_MACHINE, 0);
> 
> int return_value = hwloc_get_last_cpu_location(topology, core->cpuset,
> HWLOC_CPUBIND_THREAD);
> 
> and now in "core->cpuset" I get the new cpuset bitmap, where process/threads
> runs. Is it right?

Err, yes, but why use core->cpuset? Giving it as a parameter to
hwloc_get_last_cpu_location will only overwrite its content with the
content returned by hwloc_get_last_cpu_location (which is forbidden, see
the documentation of the cpuset field).

Samuel


Re: [hwloc-users] hwloc get cpubind function

2011-08-10 Thread Samuel Thibault
Gabriele Fatigati, le Wed 10 Aug 2011 09:35:19 +0200, a écrit :
> these lines, doesn't works:
> 
> set = hwloc_bitmap_alloc();
> hwloc_get_cpubind(topology, &set, 0);
> 
> hwloc_get_cpubind() crash, because I have to pass set, not &set i suppose.

Right, of course.

> I think hwloc_get_last_cpu_location() is used coupled with 
> hwloc_get_cpubind()?

Well, they don't _have_ to. They provide different information. It
just happens that get_last_cpu_location usually returns an index within
what get_cpubind returns ("always", if the binding is strict).

Samuel


Re: [hwloc-users] hwloc get cpubind function

2011-08-09 Thread Samuel Thibault
Gabriele Fatigati, le Tue 09 Aug 2011 18:14:55 +0200, a écrit :
> hwloc_get_cpubind() function, return, according to the manual, "current 
> process
> or thread binding". What does it means?

The cpuset to which the current process or thread (according to flags)
was last bound to. That is, the converse of set_cpubind().

> It return cpu index where process/ thread runs?

No, hwloc_get_last_cpu_location() does that.

> If yes, which cpuset  I have to use in function arguments?

get_cpubind returns a cpuset, you just provide one you have allocated
the way you prefer.

> Could you give me a little example to use it? 

It is really just the converse of hwloc_set_cpubind(), so for instance:

set = hwloc_bitmap_alloc();
hwloc_get_cpubind(topology, set, 0);

Samuel


Re: [hwloc-users] Difference between HWLOC_OBJ_CORE and HWLOC_OBJ_PU

2011-08-09 Thread Samuel Thibault
Gabriele Fatigati, le Tue 09 Aug 2011 17:04:04 +0200, a écrit :
> >There is no difference concerning the cpuset.
> 
> It means they have the same logical index?

Since there is exactly one pu per core and they'll be sorted the same,
yes, by construction they will have the same logical index.

Samuel


Re: [hwloc-users] Thread core affinity

2011-08-04 Thread Samuel Thibault
Gabriele Fatigati, le Thu 04 Aug 2011 16:56:22 +0200, a écrit :
> L#0 and L#1 are physically near because hwloc consider shared caches map when
> build topology?

Yes. That's the whole point of sorting objects topologically first, and
numbering them afterwards. See the glossary entry for "logical index":

“The ordering is based on topology first, and then on OS CPU numbers”

I.e. OS CPU numbers are only used when no topology information (shared
cache etc.) provides any better sorting.

> Because if not, i don't know how hwloc understand the physical
> proximity of cores :(

Physical proximity of cores does not mean logical proximity. Cores can
be right next to each other, and still share no cache at all. Forget the
expression "physical proximity", it does not provide any interesting
information. What matters is logical proximity. And that's *precisely*
what logical indexes express.
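
To illustrate "topology first, then OS CPU numbers", here is a toy model in plain C. It assumes the only topology information is a per-PU core identifier, which is far simpler than what hwloc really does with its object tree; the struct and function names are invented for this sketch.

```c
#include <stdlib.h>

struct pu {
    unsigned os_index; /* P#: number chosen by the OS/BIOS */
    unsigned core_id;  /* assumed topology information: owning core */
    unsigned logical;  /* L#: filled in by assign_logical_indexes() */
};

/* Order by topology (core) first, and by OS index only as a tie-break. */
static int compare_pu(const void *a, const void *b)
{
    const struct pu *x = a, *y = b;

    if (x->core_id != y->core_id)
        return x->core_id < y->core_id ? -1 : 1;
    if (x->os_index != y->os_index)
        return x->os_index < y->os_index ? -1 : 1;
    return 0;
}

/* Sort the PUs, then number them linearly in the sorted order. */
void assign_logical_indexes(struct pu *pus, size_t n)
{
    qsort(pus, n, sizeof(*pus), compare_pu);
    for (size_t i = 0; i < n; i++)
        pus[i].logical = (unsigned)i;
}
```

With P#0 and P#2 on one core and P#1 and P#3 on another, the result is L#0=P#0, L#1=P#2, L#2=P#1, L#3=P#3: consecutive logical indexes share a core even though consecutive OS indexes do not.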

Samuel


Re: [hwloc-users] Thread core affinity

2011-08-04 Thread Samuel Thibault
Gabriele Fatigati, le Thu 04 Aug 2011 16:35:36 +0200, a écrit :
> so physical OS index 0 and 1 are not true are physically near on the die.

They quite often aren't. See the updated glossary of the documentation:

"The index that the operating system (OS) uses to identify the object.
This may be completely arbitrary, non-unique, non-contiguous, not
representative of proximity, and may depend on the BIOS configuration."

> Considering that, how I can use cache locality and cache sharing by cores if I
> don't know where my threads will physically bound?

By using logical indexes, not physical indexes. And almost all hwloc
functions use logical indexes, not physical indexes.

> If L#0 and L#1  where I bind my threads are physically far, may give me bad
> performance.

L#0 and L#1 are physically near, that's precisely the whole point of
hwloc: it provides you with *logical* indexes which express proximity,
instead of the P#0 and P#1 physical/OS indexes, which are quite often
simply arbitrary.

Samuel


Re: [hwloc-users] Thread core affinity

2011-08-04 Thread Samuel Thibault
Gabriele Fatigati, le Thu 04 Aug 2011 15:52:09 +0200, a écrit :
> how is the topology given by lstopo built? In particular, how are the logical
> indexes P# initialized?

P# are not logical indexes, they are physical indexes, as displayed in
/proc/cpuinfo & such.

The logical indexes, L#, displayed when passing the -l option to lstopo,
are numbered simply linearly, after having sorted the PUs according to
topology.
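
As a rough illustration of that numbering, here is a toy sketch. It is not
hwloc's actual code: struct pu, its socket and core fields, and
assign_logical_indexes() are made up for the example, and real hwloc sorts
according to the whole topology tree rather than two integers. The idea is
only that PUs are sorted by topology first, with the OS index as a
tie-breaker, and then numbered linearly:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical description of a PU, for illustration only. */
struct pu {
    int socket;         /* which socket the PU sits in */
    int core;           /* core number within that socket */
    int os_index;       /* P#, as given by the OS */
    int logical_index;  /* L#, filled in by assign_logical_indexes() */
};

/* Order by topology first (socket, then core), then by OS index. */
static int compare_pu(const void *a, const void *b)
{
    const struct pu *x = a;
    const struct pu *y = b;

    if (x->socket != y->socket)
        return (x->socket > y->socket) - (x->socket < y->socket);
    if (x->core != y->core)
        return (x->core > y->core) - (x->core < y->core);
    return (x->os_index > y->os_index) - (x->os_index < y->os_index);
}

/* Sort the PUs, then number them 0, 1, 2, ... in sorted order. */
void assign_logical_indexes(struct pu *pus, size_t n)
{
    size_t i;

    assert(pus != NULL || n == 0);
    qsort(pus, n, sizeof(*pus), compare_pu);
    for (i = 0; i < n; i++)
        pus[i].logical_index = (int)i;
}
```

With two sockets whose PUs the OS numbers alternately (P#0 and P#2 on one
socket, P#1 and P#3 on the other), this puts P#0 and P#2 at L#0 and L#1, so
that adjacent logical indexes end up on the same socket.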

Samuel


Re: [hwloc-users] Thread core affinity

2011-08-04 Thread Samuel Thibault
Hello,

Gabriele Fatigati, le Mon 01 Aug 2011 12:32:44 +0200, a écrit :
> So, they are not physically near. I expect that with Hyperthreading, and 2 hardware
> threads per core, PU P#0 and PU P#1 are on the same core.

Since these are P#0 and P#1, i.e. physical indexes, they may indeed not
be on the same core. That's the whole problem with the indexes provided
by operating systems.

Fortunately,

> If is it not true,
> using in a OMP PARALLEL region with 2 software threads:
> 
> #pragma omp parallel num_threads(2)
> 
> tid= omp_get_thread_num();
> 
> hwloc_obj_t core = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);
> hwloc_cpuset_t set = hwloc_bitmap_dup(core->cpuset);
> hwloc_bitmap_singlify(set);
> 
> hwloc_set_cpubind(topology, set,  HWLOC_CPUBIND_THREAD);
> 
> 
> 
> i would bind thread 0 on PU P#0 and thread 1 on PU P#1, supposing are
> physically near.

No, because hwloc functions do not use physical, but logical indexes,
which it computes according to the topology. Use lstopo --top to check
the actual binding being used.

Samuel


Re: [hwloc-users] Multiple thread binding

2011-08-02 Thread Samuel Thibault
Gabriele Fatigati, le Tue 02 Aug 2011 17:22:31 +0200, a écrit :
> and in this way are equivalent?
> 
> #pragma omp parallel num_threads(1)
> {
> hwloc_obj_t core = hwloc_get_obj_by_type(*topology, HWLOC_OBJ_PU, 0);
> hwloc_cpuset_t set = hwloc_bitmap_dup(core->cpuset);
> hwloc_set_cpubind(*topology, set,  HWLOC_CPUBIND_THREAD | 
> HWLOC_CPUBIND_STRICT);
> hwloc_set_cpubind(*topology, set,  HWLOC_CPUBIND_THREAD | 
> HWLOC_CPUBIND_NOMEMBIND);
> }

Since the first call does not have NOMEMBIND, it might bind the memory
on some OSes, and since the second call does not have the strict flag,
the thread will in the end not be strictly bound.

Samuel


Re: [hwloc-users] Multiple thread binding

2011-08-02 Thread Samuel Thibault
Gabriele Fatigati, le Tue 02 Aug 2011 17:13:15 +0200, a écrit :
> #pragma omp parallel num_threads(1)
> {
> hwloc_set_cpubind(*topology, set,  HWLOC_CPUBIND_THREAD | 
> HWLOC_CPUBIND_STRICT 
> |   HWLOC_CPUBIND_NOMEMBIND);
> }
> 
> is equivalent to?
> 
> #pragma omp parallel num_threads(1)
> {
> hwloc_set_cpubind(*topology, set,  HWLOC_CPUBIND_THREAD);
> hwloc_set_cpubind(*topology, set, HWLOC_CPUBIND_STRICT);
> hwloc_set_cpubind(*topology, set, HWLOC_CPUBIND_NOMEMBIND);
> 
> }

As I said, no. The latter will perform the three operations one after
the other, accumulating the effect of each of them, which is different
from specifying all the flags at the same time. For instance, in the
first case, only the current thread will be bound, while in the second
case, the second and third calls will bind the whole process! (since
there is no THREAD flag).

> You said HWLOC_CPUBIND_STRICT bind process and memory.

I should have said "potentially memory too". And it's not the STRICT
flag which does this, it's the absence of NOMEMBIND which does this.

> Why also the memory?

Because some OSes do this too.

Samuel


Re: [hwloc-users] Multiple thread binding

2011-08-02 Thread Samuel Thibault
Gabriele Fatigati, le Tue 02 Aug 2011 16:23:12 +0200, a écrit :
> hwloc_set_cpubind(*topology, set,  HWLOC_CPUBIND_THREAD | HWLOC_CPUBIND_STRICT
> |   HWLOC_CPUBIND_NOMEMBIND);
> 
> is it possible do multiple call to hwloc_set_cpubind passing each flag per
> time? 
> 
> hwloc_set_cpubind(*topology, set,  HWLOC_CPUBIND_THREAD);
> hwloc_set_cpubind(*topology, set, HWLOC_CPUBIND_STRICT);
> hwloc_set_cpubind(*topology, set, HWLOC_CPUBIND_NOMEMBIND);
> 
> or only the last have effect?

Err, it will simply do the three operations, i.e. first bind the current
thread and memory, then strictly bind the whole process and memory, and
finally bind the process but not memory (but memory will still be bound,
since the second call bound it).

Samuel


Re: [hwloc-users] [hwloc-announce] Hardware Locality (hwloc) v1.2.1rc1 released

2011-08-02 Thread Samuel Thibault
Hello,

Hendryk Bockelmann, le Tue 02 Aug 2011 10:54:54 +0200, a écrit :
> I will test hwloc-1.2.1rc1r3567.tar.gz in the next days on our POWER6
> cluster running AIX6.1 and report the results to you resp. to the list

Maybe rather wait for the next nightly snapshot, as I've just fixed a bug
in the xml test which would probably hit you.

Samuel


Re: [hwloc-users] Thread core affinity

2011-08-01 Thread Samuel Thibault
Gabriele Fatigati, le Mon 01 Aug 2011 14:48:11 +0200, a écrit :
> so, if I understand well, PU P# numbers are not the same as those specified
> with the HWLOC_OBJ_PU flag?

They are, in the os_index (aka physical index) field.

Samuel


Re: [hwloc-users] Thread core affinity

2011-07-29 Thread Samuel Thibault
Gabriele Fatigati, le Fri 29 Jul 2011 13:34:29 +0200, a écrit :
> I forgot to tell you these code block is inside a parallel OpenMP region. This
> is the complete code:
> 
> #pragma omp parallel num_threads(6)
> {
> int tid = omp_get_thread_num();
> 
> hwloc_obj_t core = hwloc_get_obj_by_type(topology, HWLOC_OBJ_CORE, tid);
> 
> and other code block is:
> 
> #pragma omp parallel num_threads(6)
> {
> int tid = omp_get_thread_num();
> 
> hwloc_obj_t core = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);

Ok, so it depends whether you want to put your OpenMP threads on
separate cores (then the first code, which distributes among cores), or
if you're ok with letting them share a core (then the second code, which
distributes among threads).
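
As a back-of-the-envelope sketch of the two placements (not hwloc code: the
function names are made up, and it assumes every core has the same number of
PUs, with consecutive logical indexes), one can compute which logical core
OpenMP thread number tid lands on in each case:

```c
#include <assert.h>

/* First code: thread tid is bound inside logical core number tid. */
int core_when_binding_by_core(int tid)
{
    return tid;
}

/* Second code: thread tid is bound to logical PU number tid.
 * Assumes each core has pus_per_core PUs with consecutive logical indexes. */
int core_when_binding_by_pu(int tid, int pus_per_core)
{
    assert(pus_per_core > 0);
    return tid / pus_per_core;
}

/* Number of distinct cores used by nthreads threads with the second code. */
int cores_used_by_pu_binding(int nthreads, int pus_per_core)
{
    assert(pus_per_core > 0);
    return (nthreads + pus_per_core - 1) / pus_per_core;
}
```

With 6 threads and 2 PUs per core, the first code spreads over 6 cores while
the second one packs the threads onto 3 cores. Without SMT (1 PU per core)
both give the same placement.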

Maybe try and run lstopo --top to see the result.

Samuel


Re: [hwloc-users] Thread core affinity

2011-07-29 Thread Samuel Thibault
Gabriele Fatigati, le Fri 29 Jul 2011 13:24:17 +0200, a écrit :
> thanks for your quick reply!
> 
> But I have a little doubt. On a non-SMT machine, is it better to use this:
> 
> hwloc_obj_t core = hwloc_get_obj_by_type(topology, HWLOC_OBJ_CORE, tid);
> hwloc_cpuset_t set = hwloc_bitmap_dup(core->cpuset);
> hwloc_bitmap_singlify(set);
> hwloc_set_cpubind(topology, set,  HWLOC_CPUBIND_THREAD);
> 
> or:
> 
> hwloc_obj_t core = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, tid);
> hwloc_cpuset_t set = hwloc_bitmap_dup(core->cpuset);
> hwloc_bitmap_singlify(set);
> hwloc_set_cpubind(topology, set,  HWLOC_CPUBIND_THREAD);
> 
> because work in the same way( i suppose).

They'll both work about the same way on SMT too, since in the end each
will pick only one hardware thread. Whether you want to assign your
threads to cores or to hardware threads then depends on your
application: do you want to let its threads share a core or not?

Samuel


Re: [hwloc-users] Thread core affinity

2011-07-29 Thread Samuel Thibault
Hello,

Gabriele Fatigati, le Fri 29 Jul 2011 12:43:47 +0200, a écrit :
> I'm so confused. I see couples of cores with the same core id! ( Core#8 for
> example)  How is it possible? 

That's because they are on different sockets. These are physical IDs
(not logical IDs), and are thus not guaranteed to be unique.

> 2) logical Core id and Physical core id maybe differents. If i want to be sure
> that id 0 and id 1 are physically near, i have to use core id or PU id? PU ids
> are ever physically near?

Using core or thread ID does not matter. What matters is that you take
the proper ID. Physical IDs will in general never bring you any
proximity indication. What you want is logical IDs, which hwloc makes
sure express proximity. Using adjacent logical IDs (be it for cores or
threads) will bring you adjacent cores/threads.

> 3) Binding a thread on a core, what's the difference between hwloc_set_cpubind
> () and hwloc_set_thread_cpubind()? More in depth, my code example works well
> with:
> 
> hwloc_set_cpubind(topology, set,  HWLOC_CPUBIND_THREAD);
> 
> and crash with:
> 
> hwloc_set_thread_cpubind(topology, tid, set,  HWLOC_CPUBIND_THREAD);

Note that tid is hwloc_thread_t, i.e. pthread_t on unixes.
It is not a (Linux-specific) tid. If what you have is a (Linux-specific)
tid, use the Linux-specific function, hwloc_linux_set_tid_cpubind.

Samuel


Re: [hwloc-users] hwloc 1.2 compilation problems

2011-07-12 Thread Samuel Thibault
Carl Smith, le Tue 12 Jul 2011 02:46:27 +0200, a écrit :
> > is it perhaps the presence of -L/usr/local/lib which makes the linking
> > fail? I've committed something that might help.
> 
>   Perhaps.  Your latest change does work on this AIX system.  Thanks
> for persisting.

Great!

I've backported to the 1.2 branch.

Samuel


Re: [hwloc-users] hwloc 1.2 compilation problems

2011-07-09 Thread Samuel Thibault
Carl Smith, le Fri 08 Jul 2011 03:51:07 +0200, a écrit :
> > Alright, I give up trying to use autoconf high-level macros, here is
> > another, low-level try.
> 
>   Alas, I think this one comes full circle:  it's deciding on ncurses,
> then failing the link step.

Uh. That's not consistent:

checking curses support using curses.h and -lncurses... yes

means that ./configure was able to compile & link with -lncurses the
following:

#include <curses.h>
#include <term.h>
int main(void) {
NULL, 0, 0, 0, 0, 0, 0, 0, 0, 0);
}

but then it fails at lstopo-text link, which does the same?!

Is it perhaps the presence of -L/usr/local/lib which makes the linking
fail? I've committed something that might help.

Samuel


Re: [hwloc-users] hwloc 1.2 compilation problems

2011-07-07 Thread Samuel Thibault
Carl Smith, le Fri 08 Jul 2011 01:01:53 +0200, a écrit :
> > Oops, I hadn't realized that AC_CHECK_HEADERS checks for all of them.
> > I've rewritten it quite a bit, in an actually more straightforward way,
> > could you test it?
> 
>   Sure - still no joy.  It's still selecting ncurses.

Ow, AC_SEARCH_LIBS is actually not using ac_includes_default.

Alright, I give up trying to use autoconf high-level macros, here is
another, low-level try.

Samuel


Re: [hwloc-users] hwloc 1.2 compilation problems

2011-07-03 Thread Samuel Thibault
Samuel Thibault, le Tue 21 Jun 2011 02:10:22 +0200, a écrit :
> Carl Smith, le Tue 21 Jun 2011 02:07:09 +0200, a écrit :
> > > Ah, ok. So what fails to link is
> > > 
> > > /* cc test.c -o test -lncurses */
> > > #include 
> > > #include 
> > > int main(void) {
> > > }
> > > 
> > > is that right?
> > 
> > Yes, and
> > 
> > > /* cc test.c -I/usr/include/ncurses -o test -lncurses */
> > 
> > does not fail.
> 
> Ok, then good, I'll simply include term.h when checking -lfoocurses, to
> make it fail with ncurses on your AIX box (but succeed with curses right
> after that)

I've done so in svn, could you check?

Samuel


Re: [hwloc-users] Patch to disable GCC __builtin_ operations

2011-06-09 Thread Samuel Thibault
Josh Hursey, le Thu 09 Jun 2011 14:52:39 +0200, a écrit :
> The odd thing about this environment is that the head node seems to
> have a slightly different setup than the compute nodes (not sure why
> exactly, but that's what it is). So hwloc is configured and runs
> correctly on the head node, but when it is asked to run on the compute
> nodes it segvs at the call site of the __builtin_ functions.

Could you post a disassembly of the site?

> I suspect that the ABI compatibility of the libc interface is what is
> enabling the remainder of the code to work in both environments, and
> that the __builtin_ functions bypass that ABI to put in system
> specific code that (for whatever reason) does not match on the compute
> nodes.

But the odd thing is that there shouldn't be any ABI involved here: the
builtins are meant to be inlined.

Samuel

