Re: [hwloc-devel] hwloc-bind syntax

2009-12-04 Thread Jeff Squyres
On Dec 4, 2009, at 5:36 AM, Brice Goglin wrote:

> > It might be good to safely ignore 0x if it's present, but that's a small 
> > feature enhancement that can be done at any time (I filed a future ticket).
> 
> It seems to work actually :)

Hmm -- I don't think so...?  "0x1" can't pass this test in 
hwloc_mask_process_arg():

  } else if (strlen(arg) == strspn(arg, "0123456789abcdefABCDEF,")) {
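The "0x" enhancement could be as small as skipping one leading prefix before this test. A minimal sketch, with hypothetical helper names rather than hwloc's real code; it also rejects an empty argument, which the quoted test accepts because strlen("") == strspn("", ...) == 0.

```c
#include <string.h>

/* Hypothetical helper (not hwloc's code): returns a pointer past an optional
 * leading "0x"/"0X" so that "0x1" can pass the hex-digit test below. */
static const char *skip_hex_prefix(const char *arg)
{
    if (arg[0] == '0' && (arg[1] == 'x' || arg[1] == 'X'))
        return arg + 2;
    return arg;
}

/* Same idea as the quoted check: every character must be a hex digit or a
 * comma.  Without the prefix skip, "0x1" fails because 'x' is not in the set. */
static int looks_like_hex_mask(const char *arg)
{
    const char *digits = skip_hex_prefix(arg);

    if (*digits == '\0')
        return 0; /* reject "" and a bare "0x" */
    return strlen(digits) == strspn(digits, "0123456789abcdefABCDEF,");
}
```

Only a single leading prefix is stripped; a form like "0x1,0x2" would still be rejected.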

In my tests, it's falling through to the "err = -1" case, but just not printing 
out an error.  Even more fun -- note the lack of error shown, and the lack of 
"ls" output, except for when we specify -v:


[8:33] rtp-jsquyres-8711:~/svn/hwloc % ./utils/hwloc-bind 0x1 ls
[8:33] rtp-jsquyres-8711:~/svn/hwloc % ./utils/hwloc-bind -v 0x1 ls
assuming the command starts at 0x1
execvp: No such file or directory

I think that if execvp() fails, we should *always* print an error, not just when
-v was specified.  I'll fix.
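A minimal sketch of that fix, using a hypothetical exec_command() helper rather than hwloc-bind's actual code: execvp() only returns on failure, so the error is reported unconditionally and -v merely adds extra output. The 127/126 exit statuses follow the usual shell convention for "not found" versus "found but not runnable"; that choice is an assumption.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace this process with the user's command.  If
 * execvp() returns at all, it failed, so always say why. */
static int exec_command(char *const argv[], int verbose)
{
    int err;

    if (verbose)
        fprintf(stderr, "executing %s\n", argv[0]);
    execvp(argv[0], argv);
    err = errno; /* save it before fprintf() can clobber errno */
    fprintf(stderr, "execvp %s: %s\n", argv[0], strerror(err));
    return err == ENOENT ? 127 : 126;
}
```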

> > Linux is likely to be among the most popular targets for hwloc -- so can you 
> > explain in plain words the definitions of the following:

[snipped]

Thanks.

> > Additionally -- the word "father" is used in the docs.  Should we use the 
> > gender-neutral "parent" instead?
> 
> I am not sure. The object structure contains a father pointer. We use
> parent in the API, but it might refer to different things, like father,
> grandfather, ...

FWIW, the English word "parent" definitely refers to the immediate ancestor.  
It does *not* refer to grandparents or great-grandparents, etc.
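To make the distinction concrete, here is a sketch with a hypothetical object struct (not hwloc's actual definitions): the father/parent pointer names only the immediate ancestor, while reaching a grandparent or further up needs an explicit walk.

```c
#include <stddef.h>

/* Hypothetical object types and struct, for illustration only. */
enum obj_type { OBJ_SYSTEM, OBJ_NODE, OBJ_SOCKET, OBJ_CORE };

struct obj {
    enum obj_type type;
    struct obj *father; /* immediate ancestor only, i.e. the "parent" */
};

/* Follows father pointers until an object of the requested type is found;
 * the result may be the parent, the grandparent, and so on. */
static struct obj *ancestor_of_type(struct obj *o, enum obj_type type)
{
    for (o = o->father; o != NULL; o = o->father)
        if (o->type == type)
            return o;
    return NULL;
}
```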

> > What I meant by my question was -- aren't the 3 diagrams above equivalent 
> > to "core:6"? If so, what's the value of the foo.bar.baz notation?
> 
> If you have a 96-core machine like we do, the hierarchical notation
> (foo.bar.baz) is really nice. If I want to bind on
> node:2.socket:3.core:4, it's much easier than looking at the topology
> and finding that it's core:70.

Ah, ok.  Fair enough.

> Using physical or logical indexes doesn't
> change anything here. I agree that we don't do that often in real
> applications, but I actually use that quite a lot for my own debugging :)

Another good reason.  :-)

> I actually don't see why people would like to use physical numbers in
> such a hierarchical notation since physical socket/core numbers are
> often strange/illogical and nobody remembers them. However, I agree that
> the physical indexes are useful when *not* using a hierarchical
> notation, i.e. when I want to bind to the thread with OS index #46.

As a server vendor, using physical/OS indexes is actually quite useful to me 
(e.g., to ensure that the hardware and OS are playing nicely).

My point is that everyone has a different view here -- we should just support 
both.  IMHO, the common case is logical indexes -- so let's make those the 
default.  But there are definitely cases where physical indexes are useful as 
well.
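Supporting both numberings mostly comes down to one translation table per object type. A sketch with a made-up 8-core table (the values are illustrative, not taken from any real machine) showing how logical and OS indexes can diverge, and how each converts to the other.

```c
#include <stddef.h>

/* Made-up example: OS (physical) indexes of 8 cores, listed in logical order.
 * On a real machine these values come from the discovered topology. */
static const unsigned os_index_of[] = { 0, 4, 1, 5, 2, 6, 3, 7 };
#define NCORES (sizeof os_index_of / sizeof os_index_of[0])

/* Logical index -> OS index: a direct table lookup, -1 if out of range. */
static int logical_to_os(unsigned logical)
{
    return logical < NCORES ? (int)os_index_of[logical] : -1;
}

/* OS index -> logical index: a linear search of the same table. */
static int os_to_logical(unsigned os)
{
    for (size_t i = 0; i < NCORES; i++)
        if (os_index_of[i] == os)
            return (int)i;
    return -1;
}
```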

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [hwloc-devel] hwloc-bind syntax

2009-12-04 Thread Jeff Squyres
On Dec 4, 2009, at 5:32 AM, Ashley Pittman wrote:

> > It might be good to safely ignore 0x if it's present, but that's a small 
> > feature enhancement that can be done at any time (I filed a future ticket).
> 
> Maybe not relevant, but it bit me, so I'll say it here: using "%x" with
> sscanf on a string of "0x1" will match the whole thing and give a value
> of 1 on Linux, but on Solaris it'll match the "0" as a hex value of 0 and
> not match the "x1" at all, leading to further errors in subsequent
> matches as well.  The most annoying thing is that sscanf() thinks it has
> matched and its return code will be set accordingly.

Yuck!

Thankfully, we don't appear to be using sscanf() to convert the cpuset strings.
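For whoever picks up the "0x" ticket, one portable option is strtoul() with base 16, which the C standard allows to take an optional "0x"/"0X" prefix, combined with an end-pointer check. A sketch with a hypothetical parse_hex_word() helper; it deliberately rejects leading whitespace and signs, which strtoul() would otherwise accept.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses one complete hex word, with or without "0x".
 * Returns 0 on success, -1 if the string is not entirely a hex number. */
static int parse_hex_word(const char *s, unsigned long *out)
{
    char *end;
    unsigned long v;

    if (!isxdigit((unsigned char)s[0]))
        return -1; /* no leading spaces, signs, or empty strings */
    errno = 0;
    v = strtoul(s, &end, 16);
    if (*end != '\0' || errno == ERANGE)
        return -1; /* trailing junk or overflow */
    *out = v;
    return 0;
}
```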

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [hwloc-devel] hwloc-bind again

2009-12-04 Thread Jeff Squyres
On Dec 4, 2009, at 1:09 AM, Brice Goglin wrote:

> > shell$ hwloc-bind
> >
> > (i.e., invoking hwloc-bind with no arguments)
> >
> > returns an exit status of 0.  Shouldn't it return non-zero?
> 
> Yeah maybe

I'm going to interpret that as "Hell yes!  Please implement.  THANKS!!!"

;-)
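A minimal sketch of the requested behaviour, using a hypothetical check_args() helper rather than hwloc-bind's actual code: with no arguments at all, complain and fail, so a script running `hwloc-bind $actions_to_do` with an accidentally empty variable sees a non-zero exit status. The message wording is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns EXIT_FAILURE when there is nothing to do. */
static int check_args(int argc, char *argv[])
{
    const char *prog = (argc > 0 && argv[0] != NULL) ? argv[0] : "hwloc-bind";

    if (argc < 2) {
        fprintf(stderr, "%s: nothing to do (no arguments given)\n", prog);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

In main(), an early `if (check_args(argc, argv) != EXIT_SUCCESS) return EXIT_FAILURE;` would be enough.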

-- 
Jeff Squyres
jsquy...@cisco.com



Re: [hwloc-devel] Disabling X component

2009-12-04 Thread Ashley Pittman
On Fri, 2009-12-04 at 13:04 +0100, Samuel Thibault wrote:
> Ashley Pittman wrote on Fri 04 Dec 2009 11:06:12 +:
> > The Debian version of -.txt (lstopo 0.9.3rc1) leaves my terminal with
> > the colours inverted after I call it; I have to do a reset to get back
> > to black on grey background.
> 
> Uh, odd. Which terminal are you using?

gnome-terminal with $TERM set to xterm.  I've not done anything special
with this; it's just a Debian unstable install.

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk




Re: [hwloc-devel] Disabling X component

2009-12-04 Thread Samuel Thibault
Ashley Pittman wrote on Fri 04 Dec 2009 11:06:12 +:
> The Debian version of -.txt (lstopo 0.9.3rc1) leaves my terminal with
> the colours inverted after I call it; I have to do a reset to get back
> to black on grey background.

Uh, odd. Which terminal are you using?

Samuel


Re: [hwloc-devel] Disabling X component

2009-12-04 Thread Brice Goglin
Ashley Pittman wrote:
> I installed the Debian package of hwloc yesterday and discovered that
> the default action of lstopo is to display a window with a picture in it.
> I guess I don't have the right development packages installed for this
> to be enabled in my local build.
>
> In my tool I want to ensure the text version is displayed, padb popping
> up a number of windows isn't what people will expect or want.  Obviously
> I can unset DISPLAY before calling lstopo but a --no-x or --text-based
> option would be a nice thing to have as well.
>   

"lstopo -" tells lstopo to write its output to /dev/stdout.
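For a caller such as padb, a sketch of capturing that output from C (the helper name is hypothetical): run the command through popen() and, as the belt-and-braces measure Ashley mentions, unset DISPLAY first. Note that unsetenv() changes the environment of the whole calling process, not just the child; prefixing the command with `env -u DISPLAY` would be a more contained alternative.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: runs cmd (e.g. "lstopo -") with DISPLAY unset and
 * copies up to len - 1 bytes of its stdout into buf.  Returns the command's
 * wait status from pclose(), or -1 on setup failure. */
static int capture_output(const char *cmd, char *buf, size_t len)
{
    FILE *p;
    size_t n;

    if (len == 0)
        return -1;
    unsetenv("DISPLAY");           /* no X window even if X support is built in */
    p = popen(cmd, "r");
    if (p == NULL)
        return -1;
    n = fread(buf, 1, len - 1, p); /* output beyond len - 1 bytes is dropped */
    buf[n] = '\0';
    return pclose(p);
}
```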

Brice



[hwloc-devel] Disabling X component

2009-12-04 Thread Ashley Pittman

I installed the Debian package of hwloc yesterday and discovered that
the default action of lstopo is to display a window with a picture in it.
I guess I don't have the right development packages installed for this
to be enabled in my local build.

In my tool I want to ensure the text version is displayed; padb popping
up a number of windows isn't what people will expect or want.  Obviously
I can unset DISPLAY before calling lstopo but a --no-x or --text-based
option would be a nice thing to have as well.

Ashley, 

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk



Re: [hwloc-devel] hwloc-bind syntax

2009-12-04 Thread Brice Goglin
Jeff Squyres wrote:
> It might be good to safely ignore 0x if it's present, but that's a small 
> feature enhancement that can be done at any time (I filed a future ticket).
>   

It seems to work actually :)

>> We might want to drop the Linux "cpuset" word and use "cgroup" instead.
>> Both are supported by Linux, but the latter now contains the former and
>> more, so people are supposed to use cgroup now. hwloc supports both.
>> 
>
> Linux is likely to be among the most popular targets for hwloc -- so can you 
> explain in plain words the definitions of the following:
>
> - hwloc cpuset
>   

Opaque structure describing a set of logical processors. Each hwloc
object structure contains a cpuset field that describes which logical
processors are contained in the corresponding physical object. hwloc
cpusets are used by hwloc binding routines.
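As a rough picture of what such a set holds, here is a sketch of a fixed-size bitmask with one bit per logical processor. It is a hypothetical stand-in, not hwloc's opaque type, which is only manipulated through hwloc's own functions; the inclusion check mirrors the relation between an object's cpuset and that of its container.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fixed-size cpuset: one bit per logical processor. */
#define MAX_CPUS  128
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS    ((MAX_CPUS + WORD_BITS - 1) / WORD_BITS)

typedef struct { unsigned long w[NWORDS]; } sketch_cpuset;

/* Callers must keep cpu < MAX_CPUS; there is no bounds check here. */
static void cpuset_set(sketch_cpuset *s, unsigned cpu)
{
    s->w[cpu / WORD_BITS] |= 1UL << (cpu % WORD_BITS);
}

static bool cpuset_isset(const sketch_cpuset *s, unsigned cpu)
{
    return (s->w[cpu / WORD_BITS] >> (cpu % WORD_BITS)) & 1UL;
}

/* True if every processor in `inner` is also in `outer`, e.g. a core's
 * cpuset relative to the cpuset of the socket containing it. */
static bool cpuset_included(const sketch_cpuset *inner, const sketch_cpuset *outer)
{
    for (size_t i = 0; i < NWORDS; i++)
        if (inner->w[i] & ~outer->w[i])
            return false;
    return true;
}
```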

> - Linux cpuset
> - Linux cgroup
>   

See http://www.mjmwired.net/kernel/Documentation/cgroups.txt, and look
for cpusets in there:

Control Groups provide a mechanism for aggregating/partitioning sets of
tasks, and all their future children, into hierarchical groups with
specialized behaviour.
[...]
On their own, the only use for cgroups is for simple job tracking.
The intention is that other subsystems hook into the generic
cgroup support to provide new attributes for cgroups, such as
accounting/limiting the resources which processes in a cgroup can
access. For example, cpusets allows you to associate a set of CPUs and
a set of memory nodes with the tasks in each cgroup.
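On Linux the effect of a cpuset can be observed from inside a task: the Cpus_allowed_list line of /proc/self/status lists the CPUs the task may run on, and a restrictive cpuset narrows that list. A Linux-only sketch (the helper name is hypothetical); a line longer than its 256-byte buffer would have its value truncated.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the value of "Cpus_allowed_list:" from
 * /proc/self/status into out (e.g. "0-7").  Returns 0 on success, -1 if the
 * file or the line is missing. */
static int read_cpus_allowed_list(char *out, size_t len)
{
    static const char key[] = "Cpus_allowed_list:";
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, key, sizeof key - 1) == 0) {
            const char *v = line + sizeof key - 1;
            v += strspn(v, " \t"); /* skip the separator after the key */
            snprintf(out, len, "%.*s", (int)strcspn(v, "\n"), v);
            fclose(f);
            return 0;
        }
    }
    fclose(f);
    return -1;
}
```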


> Additionally -- the word "father" is used in the docs.  Should we use the 
> gender-neutral "parent" instead?
>   

I am not sure. The object structure contains a father pointer. We use
parent in the API, but it might refer to different things, like father,
grandfather, ...

>> You don't care about starting with system or something else. You can
>> ignore the system level as you could ignore the socket level between
>> nodes and cores.
>>
>> If you have 1 system with 2 nodes with 2 sockets each with 2 cores each,
>> you get:
>> node:1 core:2 is equivalent to system:0 node:1 socket:2 core:0 and
>> equivalent to system:0 core:6
>> 
>
> Did you mean:
>
>   node:1.core:2 == system:0.node:1.socket:2.core:0 == system:0.core:6
>
> ?
>   

Yes.
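A minimal sketch of how a hierarchical path collapses to a flat logical index, assuming a perfectly uniform tree of 2 nodes x 2 sockets x 2 cores and indexes taken relative to the parent object (both assumptions; a real tool would walk the actual topology). Under those assumptions node:1.core:2 and system:0.core:6 name the same core, and the same core reached through the socket level is node:1.socket:1.core:0.

```c
/* Uniform example machine: 2 nodes x 2 sockets x 2 cores (an assumption). */
enum { SOCKETS_PER_NODE = 2, CORES_PER_SOCKET = 2 };
enum { CORES_PER_NODE = SOCKETS_PER_NODE * CORES_PER_SOCKET };

/* node:N.socket:S.core:C -> system-wide core:K (mixed-radix number). */
static unsigned flat_from_node_socket_core(unsigned n, unsigned s, unsigned c)
{
    return (n * SOCKETS_PER_NODE + s) * CORES_PER_SOCKET + c;
}

/* node:N.core:C, with the socket level skipped -> system-wide core:K. */
static unsigned flat_from_node_core(unsigned n, unsigned c)
{
    return n * CORES_PER_NODE + c;
}
```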

> What I meant by my question was -- aren't the 3 diagrams above equivalent to 
> "core:6"? If so, what's the value of the foo.bar.baz notation?

If you have a 96-core machine like we do, the hierarchical notation
(foo.bar.baz) is really nice. If I want to bind on
node:2.socket:3.core:4, it's much easier than looking at the topology
and finding that it's core:70. Using physical or logical indexes doesn't
change anything here. I agree that we don't do that often in real
applications, but I actually use that quite a lot for my own debugging :)

I actually don't see why people would like to use physical numbers in
such a hierarchical notation since physical socket/core numbers are
often strange/illogical and nobody remembers them. However, I agree that
the physical indexes are useful when *not* using a hierarchical
notation, i.e. when I want to bind to the thread with OS index #46.

Brice



Re: [hwloc-devel] hwloc-bind syntax

2009-12-04 Thread Ashley Pittman
On Thu, 2009-12-03 at 20:32 -0500, Jeff Squyres wrote:
> > > Ah, ok.  To be clear, is it accurate to say that it is one of the 
> > > following forms:
> > >
> > > - a hex number (without leading "0x" -- would "0x" be ignored if it is 
> > > supplied?)
> > 
> > We never used 0x there.
> 
> Ok.
> 
> It might be good to safely ignore 0x if it's present, but that's a small 
> feature enhancement that can be done at any time (I filed a future ticket).

Maybe not relevant, but it bit me, so I'll say it here: using "%x" with
sscanf on a string of "0x1" will match the whole thing and give a value
of 1 on Linux, but on Solaris it'll match the "0" as a hex value of 0 and
not match the "x1" at all, leading to further errors in subsequent
matches as well.  The most annoying thing is that sscanf() thinks it has
matched and its return code will be set accordingly.
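If sscanf() must be used anyway, appending %n and comparing the consumed character count against the string length turns a silent partial match like the Solaris one described here into a detectable failure. A sketch with a hypothetical scan_hex_strict() helper; the test values avoid "0x1" itself, since its result is exactly the platform-dependent part.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: succeeds only if "%x" consumed the entire string.
 * %n stores the number of characters read so far and does not count
 * toward sscanf()'s return value. */
static int scan_hex_strict(const char *s, unsigned *out)
{
    int consumed = 0;

    if (sscanf(s, "%x%n", out, &consumed) != 1)
        return -1;
    return (size_t)consumed == strlen(s) ? 0 : -1;
}
```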

Ashley,

-- 

Ashley Pittman, Bath, UK.

Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk



Re: [hwloc-devel] hwloc-bind again

2009-12-04 Thread Brice Goglin
Jeff Squyres wrote:
> I notice that
>
> shell$ hwloc-bind
>
> (i.e., invoking hwloc-bind with no arguments)
>
> returns an exit status of 0.  Shouldn't it return non-zero?  I'd think it was 
> an error if you didn't give hwloc-bind anything to do.  For example, we 
> wouldn't want a script with something like this:
>
> hwloc-bind $actions_to_do
>
> to return 0 if $actions_to_do was mistakenly empty.
>
> Right?
>   


Yeah maybe