Re: [OMPI users] Determining what parameters a scheduler passes to OpenMPI

2014-06-08 Thread Ralph Castain
Sorry about the comment re cpus-per-proc - I momentarily confused this with
another user who is also using Torque. I confirmed that this works fine with
1.6.5, and would guess you are hitting some bug in 1.6.0. Can you update?
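
For reference, a quick way to check which release is actually installed
(assuming ompi_info is in your PATH) is something like:

  ompi_info | grep "Open MPI:"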


On Jun 6, 2014, at 12:20 PM, Ralph Castain  wrote:

> You might want to update to 1.6.5, if you can - I'll see what I can find
> 
> On Jun 6, 2014, at 12:07 PM, Sasso, John (GE Power & Water, Non-GE) 
>  wrote:
> 
>> Version 1.6 (i.e. prior to 1.6.1)
>> 
>> -Original Message-
>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
>> Sent: Friday, June 06, 2014 3:03 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Determining what parameters a scheduler passes to 
>> OpenMPI
>> 
>> It's possible that you are hitting a bug - not sure how much the 
>> cpus-per-proc option has been exercised in 1.6. Is this 1.6.5, or some other 
>> member of that series?
>> 
>> I don't have a Torque machine handy any more, but should be able to test 
>> this scenario on my boxes
>> 
>> 
>> On Jun 6, 2014, at 10:51 AM, Sasso, John (GE Power & Water, Non-GE) 
>>  wrote:
>> 
>>> Re: $PBS_NODEFILE, we use that to create the hostfile that is passed via 
>>> --hostfile (i.e. the two are the same).  
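
A minimal sketch of that step, assuming the hostfile is literally a copy of
$PBS_NODEFILE and using a placeholder executable:

  # inside the Torque job script
  cp $PBS_NODEFILE ./hostfile
  mpirun --hostfile ./hostfile ./a.out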
>>> 
>>> To further debug this, I passed "--display-allocation --display-map" to 
>>> orterun, which resulted in:
>>> 
>>> ==   ALLOCATED NODES   ==
>>> 
>>> Data for node: node0001  Num slots: 16  Max slots: 0
>>> Data for node: node0002  Num slots: 8   Max slots: 0
>>> 
>>> =
>>> 
>>>    JOB MAP   
>>> 
>>> Data for node: node0001  Num procs: 24
>>>  Process OMPI jobid: [24552,1] Process rank: 0
>>>  Process OMPI jobid: [24552,1] Process rank: 1
>>>  Process OMPI jobid: [24552,1] Process rank: 2
>>>  Process OMPI jobid: [24552,1] Process rank: 3
>>>  Process OMPI jobid: [24552,1] Process rank: 4
>>>  Process OMPI jobid: [24552,1] Process rank: 5
>>>  Process OMPI jobid: [24552,1] Process rank: 6
>>>  Process OMPI jobid: [24552,1] Process rank: 7
>>>  Process OMPI jobid: [24552,1] Process rank: 8
>>>  Process OMPI jobid: [24552,1] Process rank: 9
>>>  Process OMPI jobid: [24552,1] Process rank: 10
>>>  Process OMPI jobid: [24552,1] Process rank: 11
>>>  Process OMPI jobid: [24552,1] Process rank: 12
>>>  Process OMPI jobid: [24552,1] Process rank: 13
>>>  Process OMPI jobid: [24552,1] Process rank: 14
>>>  Process OMPI jobid: [24552,1] Process rank: 15
>>>  Process OMPI jobid: [24552,1] Process rank: 16
>>>  Process OMPI jobid: [24552,1] Process rank: 17
>>>  Process OMPI jobid: [24552,1] Process rank: 18
>>>  Process OMPI jobid: [24552,1] Process rank: 19
>>>  Process OMPI jobid: [24552,1] Process rank: 20
>>>  Process OMPI jobid: [24552,1] Process rank: 21
>>>  Process OMPI jobid: [24552,1] Process rank: 22
>>>  Process OMPI jobid: [24552,1] Process rank: 23
>>> 
>>> I have been going through the man page of mpirun as well as the OpenMPI 
>>> mailing list and website, and thus far have been unable to determine the 
>>> reason for the oversubscription of the head node (node0001), even though the 
>>> PBS scheduler is passing along the correct slot counts (16 and 8, respectively).
>>> 
>>> Am I running into a bug w/ OpenMPI 1.6?
>>> 
>>> --john
>>> 
>>> 
>>> 
>>> -Original Message-
>>> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph 
>>> Castain
>>> Sent: Friday, June 06, 2014 1:30 PM
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] Determining what parameters a scheduler 
>>> passes to OpenMPI
>>> 
>>> 
>>> On Jun 6, 2014, at 10:24 AM, Gus Correa  wrote:
>>> 
 On 06/06/2014 01:05 PM, Ralph Castain wrote:
> You can always add --display-allocation to the cmd line to see what 
> we thought we received.
> 
> If you configure OMPI with --enable-debug, you can set --mca 
> ras_base_verbose 10 to see the details
> 
> 
 
 Hi John
 
 On the Torque side, you can put a line "cat $PBS_NODEFILE" in the job 
 script.  This will list the nodes (each repeated according to the number 
 of cores requested).  I find this useful to keep as documentation,
 along with the job number, work directory, etc.
 "man qsub" will show you all the PBS_* environment variables 
 available to the job.
 For instance, you can echo them using a Torque 'prolog' script in case 
 the user didn't do it; that will appear in the Torque STDOUT file.
 
 From outside the job script, "qstat -n" (and variants, say, with -u
 username) will list the nodes allocated to each job, again repeated 
 according to the requested cores.
 
 "tracejob job_number" will show similar information.

Re: [OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-08 Thread tmishima


It's a good idea to provide the default setting for the modifier pe.

Okay, I can take a look and review it, but I'm a bit busy now, so please give
me a few days.

Regards,

Tetsuya

> Okay, I revised the command line option to be a little more user-friendly.
> You can now specify the equivalent of the old --cpus-per-proc as just
> "--map-by :pe=N", leaving the mapping policy set as the default. We will
> default to NUMA so the cpus will all be in the same NUMA region, if
> possible, thus providing better performance.
>
> Scheduled this for 1.8.2, asking Tetsuya to review.
>
> On Jun 6, 2014, at 6:25 PM, Ralph Castain  wrote:
>
> > Hmmm... Tetsuya is quite correct. Afraid I got distracted by the segfault
> > (still investigating that one). Our default policy for 2 processes is to
> > map-by core, and that would indeed fail when cpus-per-proc > 1. However,
> > that seems like a non-intuitive requirement, so let me see if I can make
> > this be a little more user-friendly.
> >
> >
> > On Jun 6, 2014, at 2:25 PM, tmish...@jcity.maeda.co.jp wrote:
> >
> >>
> >>
> >> Hi Dan,
> >>
> >> Please try:
> >> mpirun -np 2 --map-by socket:pe=8 ./hello
> >> or
> >> mpirun -np 2 --map-by slot:pe=8 ./hello
> >>
> >> You cannot bind 8 cpus to the object "core", which has
> >> only one cpu. This limitation started with the 1.8 series.
> >>
> >> The object "socket" has 8 cores in your case, so you
> >> can do it there. And the object "slot" is almost the same as
> >> "core", but it can exceed the limitation because it's a
> >> fictitious object which has no size.
> >>
> >> Regards,
> >> Tetsuya Mishima
> >>
> >>
> >>> No problem -
> >>>
> >>> These are model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz chips.
> >>> 2 per node, 8 cores each. No threading enabled.
> >>>
> >>> $ lstopo
> >>> Machine (64GB)
> >>> NUMANode L#0 (P#0 32GB)
> >>> Socket L#0 + L3 L#0 (20MB)
> >>> L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0
> >> (P#0)
> >>> L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1
> >> (P#1)
> >>> L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2
> >> (P#2)
> >>> L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3
> >> (P#3)
> >>> L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4
> >> (P#4)
> >>> L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5
> >> (P#5)
> >>> L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6
> >> (P#6)
> >>> L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7
> >> (P#7)
> >>> HostBridge L#0
> >>> PCIBridge
> >>> PCI 1000:0087
> >>> Block L#0 "sda"
> >>> PCIBridge
> >>> PCI 8086:2250
> >>> PCIBridge
> >>> PCI 8086:1521
> >>> Net L#1 "eth0"
> >>> PCI 8086:1521
> >>> Net L#2 "eth1"
> >>> PCIBridge
> >>> PCI 102b:0533
> >>> PCI 8086:1d02
> >>> NUMANode L#1 (P#1 32GB)
> >>> Socket L#1 + L3 L#1 (20MB)
> >>> L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8
> >> (P#8)
> >>> L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9
> >> (P#9)
> >>> L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
> >>> + PU L#10 (P#10)
> >>> L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
> >>> + PU L#11 (P#11)
> >>> L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
> >>> + PU L#12 (P#12)
> >>> L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
> >>> + PU L#13 (P#13)
> >>> L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
> >>> + PU L#14 (P#14)
> >>> L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
> >>> + PU L#15 (P#15)
> >>> HostBridge L#5
> >>> PCIBridge
> >>> PCI 15b3:1011
> >>> Net L#3 "ib0"
> >>> OpenFabrics L#4 "mlx5_0"
> >>> PCIBridge
> >>> PCI 8086:2250
> >>>
> >>> From the segfault below. I tried reproducing the crash on less than a
> >>> 4-node allocation but wasn't able to.
> >>>
> >>> ddietz@conte-a009:/scratch/conte/d/ddietz/hello$ mpirun -np 2
> >>> -machinefile ./nodes -mca plm_base_verbose 10 ./hello
> >>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> >>> registering plm components
> >>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> >>> found loaded component isolated
> >>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> >>> component isolated has no register or open function
> >>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> >>> found loaded component slurm
> >>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> >>> component slurm register function successful
> >>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> >>> found loaded component rsh
> >>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> >>> component rsh register function successful
> >>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> >>> found loaded component tm
> >>> [conte-a009.rcac.purdue.edu:55685] mca: base: 

Re: [OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-08 Thread Ralph Castain
I'm having no luck poking at this segfault issue. For some strange reason, we 
seem to think there are coprocessors on those remote nodes - e.g., a Phi card. 
Yet your lstopo output doesn't seem to show it.

Out of curiosity, can you try running this with "-mca plm rsh"? This will 
substitute the rsh/ssh launcher in place of Torque - assuming your system will 
allow it, this will let me see if the problem is somewhere in the Torque 
launcher or elsewhere in OMPI.
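
In other words, re-running the same failing command but forcing the rsh
launcher, e.g. something like:

  mpirun -np 2 -machinefile ./nodes -mca plm rsh ./hello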

Thanks
Ralph

On Jun 6, 2014, at 12:48 PM, Dan Dietz  wrote:

> No problem -
> 
> These are model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz chips.
> 2 per node, 8 cores each. No threading enabled.
> 
> $ lstopo
> Machine (64GB)
>  NUMANode L#0 (P#0 32GB)
>Socket L#0 + L3 L#0 (20MB)
>  L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 
> (P#0)
>  L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 
> (P#1)
>  L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 
> (P#2)
>  L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 
> (P#3)
>  L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 
> (P#4)
>  L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 
> (P#5)
>  L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 
> (P#6)
>  L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 
> (P#7)
>HostBridge L#0
>  PCIBridge
>PCI 1000:0087
>  Block L#0 "sda"
>  PCIBridge
>PCI 8086:2250
>  PCIBridge
>PCI 8086:1521
>  Net L#1 "eth0"
>PCI 8086:1521
>  Net L#2 "eth1"
>  PCIBridge
>PCI 102b:0533
>  PCI 8086:1d02
>  NUMANode L#1 (P#1 32GB)
>Socket L#1 + L3 L#1 (20MB)
>  L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 
> (P#8)
>  L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 
> (P#9)
>  L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
> + PU L#10 (P#10)
>  L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
> + PU L#11 (P#11)
>  L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
> + PU L#12 (P#12)
>  L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
> + PU L#13 (P#13)
>  L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
> + PU L#14 (P#14)
>  L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
> + PU L#15 (P#15)
>HostBridge L#5
>  PCIBridge
>PCI 15b3:1011
>  Net L#3 "ib0"
>  OpenFabrics L#4 "mlx5_0"
>  PCIBridge
>PCI 8086:2250
> 
> From the segfault below. I tried reproducing the crash on less than a
> 4-node allocation but wasn't able to.
> 
> ddietz@conte-a009:/scratch/conte/d/ddietz/hello$ mpirun -np 2
> -machinefile ./nodes -mca plm_base_verbose 10 ./hello
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> registering plm components
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> found loaded component isolated
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> component isolated has no register or open function
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> found loaded component slurm
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> component slurm register function successful
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> found loaded component rsh
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> component rsh register function successful
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> found loaded component tm
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
> component tm register function successful
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_open: opening
> plm components
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_open: found
> loaded component isolated
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_open:
> component isolated open function successful
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_open: found
> loaded component slurm
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_open:
> component slurm open function successful
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_open: found
> loaded component rsh
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_open:
> component rsh open function successful
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_open: found
> loaded component tm
> [conte-a009.rcac.purdue.edu:55685] mca: base: components_open:
> component tm open function successful
> [conte-a009.rcac.purdue.edu:55685] mca:base:select: Auto-selecting plm
> components
> [conte-a009.rcac.purdue.edu:55685] mca:base:select:(  plm) Querying
> component [isolated]
> 

Re: [OMPI users] Bind multiple cores to rank - OpenMPI 1.8.1

2014-06-08 Thread Ralph Castain
Okay, I revised the command line option to be a little more user-friendly. You 
can now specify the equivalent of the old --cpus-per-proc as just "--map-by 
:pe=N", leaving the mapping policy set as the default. We will default to NUMA 
so the cpus will all be in the same NUMA region, if possible, thus providing 
better performance.
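
For example, where one previously used the old option, e.g.
"mpirun -np 2 --cpus-per-proc 8 ./hello", the new form would be along the
lines of:

  mpirun -np 2 --map-by :pe=8 ./hello

or, with an explicit mapping policy, "--map-by socket:pe=8" as shown earlier
in this thread.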

Scheduled this for 1.8.2, asking Tetsuya to review.

On Jun 6, 2014, at 6:25 PM, Ralph Castain  wrote:

> Hmmm... Tetsuya is quite correct. Afraid I got distracted by the segfault 
> (still investigating that one). Our default policy for 2 processes is to 
> map-by core, and that would indeed fail when cpus-per-proc > 1. However, that 
> seems like a non-intuitive requirement, so let me see if I can make this be a 
> little more user-friendly.
> 
> 
> On Jun 6, 2014, at 2:25 PM, tmish...@jcity.maeda.co.jp wrote:
> 
>> 
>> 
>> Hi Dan,
>> 
>> Please try:
>> mpirun -np 2 --map-by socket:pe=8 ./hello
>> or
>> mpirun -np 2 --map-by slot:pe=8 ./hello
>> 
>> You cannot bind 8 cpus to the object "core", which has
>> only one cpu. This limitation started with the 1.8 series.
>> 
>> The object "socket" has 8 cores in your case, so you
>> can do it there. And the object "slot" is almost the same as
>> "core", but it can exceed the limitation because it's a
>> fictitious object which has no size.
>> 
>> Regards,
>> Tetsuya Mishima
>> 
>> 
>>> No problem -
>>> 
>>> These are model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz chips.
>>> 2 per node, 8 cores each. No threading enabled.
>>> 
>>> $ lstopo
>>> Machine (64GB)
>>> NUMANode L#0 (P#0 32GB)
>>> Socket L#0 + L3 L#0 (20MB)
>>> L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0
>> (P#0)
>>> L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1
>> (P#1)
>>> L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2
>> (P#2)
>>> L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3
>> (P#3)
>>> L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4
>> (P#4)
>>> L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5
>> (P#5)
>>> L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6
>> (P#6)
>>> L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7
>> (P#7)
>>> HostBridge L#0
>>> PCIBridge
>>> PCI 1000:0087
>>> Block L#0 "sda"
>>> PCIBridge
>>> PCI 8086:2250
>>> PCIBridge
>>> PCI 8086:1521
>>> Net L#1 "eth0"
>>> PCI 8086:1521
>>> Net L#2 "eth1"
>>> PCIBridge
>>> PCI 102b:0533
>>> PCI 8086:1d02
>>> NUMANode L#1 (P#1 32GB)
>>> Socket L#1 + L3 L#1 (20MB)
>>> L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8
>> (P#8)
>>> L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9
>> (P#9)
>>> L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10
>>> + PU L#10 (P#10)
>>> L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11
>>> + PU L#11 (P#11)
>>> L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12
>>> + PU L#12 (P#12)
>>> L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13
>>> + PU L#13 (P#13)
>>> L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14
>>> + PU L#14 (P#14)
>>> L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15
>>> + PU L#15 (P#15)
>>> HostBridge L#5
>>> PCIBridge
>>> PCI 15b3:1011
>>> Net L#3 "ib0"
>>> OpenFabrics L#4 "mlx5_0"
>>> PCIBridge
>>> PCI 8086:2250
>>> 
>>> From the segfault below. I tried reproducing the crash on less than a
>>> 4-node allocation but wasn't able to.
>>> 
>>> ddietz@conte-a009:/scratch/conte/d/ddietz/hello$ mpirun -np 2
>>> -machinefile ./nodes -mca plm_base_verbose 10 ./hello
>>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
>>> registering plm components
>>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
>>> found loaded component isolated
>>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
>>> component isolated has no register or open function
>>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
>>> found loaded component slurm
>>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
>>> component slurm register function successful
>>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
>>> found loaded component rsh
>>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
>>> component rsh register function successful
>>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
>>> found loaded component tm
>>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_register:
>>> component tm register function successful
>>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_open: opening
>>> plm components
>>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_open: found
>>> loaded component isolated
>>> [conte-a009.rcac.purdue.edu:55685] mca: base: components_open:
>>> component isolated open function successful
>>> 

Re: [hwloc-users] divide by zero error?

2014-06-08 Thread Brice Goglin
I added --disable-cpuid, will be in hwloc v1.10.
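
In practice that gives two options for avoiding the x86 cpuid backend; the
first needs hwloc 1.10 or later, the second works with existing installs
(flags beyond these are illustrative):

  # at build time (hwloc >= 1.10)
  ./configure --disable-cpuid && make && make install

  # at run time, as discussed below
  HWLOC_COMPONENTS=-x86 lstopo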
Brice



Le 06/05/2014 00:44, Friedley, Andrew a écrit :
> Actually, is there any way to make HWLOC_COMPONENTS=-x86 the default or 
> otherwise disable or compile without the x86 backend, so that I get that 
> behavior by default?
>
> Thanks,
>
> Andrew
>
>> -Original Message-
>> From: Brice Goglin [mailto:brice.gog...@inria.fr]
>> Sent: Monday, May 5, 2014 1:03 PM
>> To: Friedley, Andrew
>> Subject: Re: [hwloc-users] divide by zero error?
>>
>> Thanks.
>> The simulator returns buggy cpuid information. It may be possible to
>> work around this specific problem, but I am afraid there could be others.
>> I think you should just disable the hwloc x86 backend by setting
>> HWLOC_COMPONENTS=-x86 in the environment. Does this look like an
>> acceptable work-around?
>> Brice
>>
>>
>>
>> Le 05/05/2014 20:21, Friedley, Andrew a écrit :
>>> Back from vacation -- Is this what you're after?
>>>
>>> [root@viper0 bin]# ./lstopo
>>>
>>>
>>>  * Topology extraction from /proc/cpuinfo *
>>>
>>> processor 0
>>> found 1 cpu topologies, cpuset 0x0001 os socket 0 has cpuset
>>> 0x0001 os core 0 has cpuset 0x0001 thread 0 has cpuset
>>> 0x0001 cache depth 0 has cpuset 0x0001 cache depth 0 has
>>> cpuset 0x0001 cache depth 1 has cpuset 0x0001 cache depth 2
>>> has cpuset 0x0001 found DMIProductName 'Bochs'
>>> found DMIProductVersion ''
>>> found DMIProductSerial ''
>>> found DMIChassisVendor 'Bochs'
>>> found DMIChassisType '1'
>>> found DMIChassisVersion ''
>>> found DMIChassisSerial ''
>>> found DMIChassisAssetTag ''
>>> found DMIBIOSVendor 'Bochs'
>>> found DMIBIOSVersion 'Bochs'
>>> found DMIBIOSDate '01/01/2007'
>>> found DMISysVendor 'Bochs'
>>> Machine#0(local=2055580KB total=0KB DMIProductName=Bochs
>> DMIProductVersion= DMIProductSerial= DMIChassisVendor=Bochs
>> DMIChassisType=1 DMI) cpuset 0xf...f complete 0x0001 online 0xf...f
>> allowed 0xf...f nodeset 0x0 completeN 0x0 allowedN 0xf...f
>>>   Socket#0(CPUVendor=GenuineIntel CPUFamilyNumber=6
>> CPUModelNumber=26 CPUModel="Intel(R) Core(TM) i7 CPU  @
>> 2.00GHz") cpuset 0x0001
>>> L3Cache(size=8192KB linesize=64 ways=16) cpuset 0x0001
>>>   L2Cache(size=256KB linesize=64 ways=8) cpuset 0x0001
>>> L1dCache(size=32KB linesize=64 ways=8) cpuset 0x0001
>>>   L1iCache(size=32KB linesize=64 ways=4) cpuset 0x0001
>>> Core#0 cpuset 0x0001
>>>   PU#0 cpuset 0x0001
>>> Backend x86 forcing a reconnect of levels
>>> --- Socket level has number 1
>>>
>>> --- Cache level depth 3 has number 2
>>>
>>> --- Cache level depth 2 has number 3
>>>
>>> --- Cache level depth 1 has number 4
>>>
>>> --- Cache level depth 1 has number 5
>>>
>>> --- Core level has number 6
>>>
>>> --- PU level has number 7
>>>
>>> highest cpuid b, cpuid type 0
>>> highest extended cpuid 8008
>>> possible CPUs are 0x0001
>>> binding to CPU0
>>> APIC ID 0x00 max_log_proc 1
>>> phys 0 thread 0
>>> cache 0 type 1
>>> cache 1 type 2
>>> cache 2 type 3
>>> cache 3 type 3
>>> cache 4 type 0
>>> cache 0 type 1 L1 t2 c8 linesize 64 linepart 1 ways 8 sets 64, size
>>> 32KB thus 0 threads Floating point exception (core dumped)
>>>
 -Original Message-
 From: Brice Goglin [mailto:brice.gog...@inria.fr]
 Sent: Wednesday, April 30, 2014 2:30 AM
 To: Friedley, Andrew
 Subject: Re: [hwloc-users] divide by zero error?

 Thanks.
 The Linux backend works well so the bug is indeed in the x86 backend
>> only.
 Could you rebuild with --enable-debug and send the entire
 stdout+stderr output of lstopo ?

 Thanks
 Brice



 Le 29/04/2014 17:01, Friedley, Andrew a écrit :
> Attached, off list.
>
> Andrew
>
>> -Original Message-
>> From: hwloc-users [mailto:hwloc-users-boun...@open-mpi.org] On
 Behalf
>> Of Brice Goglin
>> Sent: Monday, April 28, 2014 10:37 PM
>> To: hwloc-us...@open-mpi.org
>> Subject: Re: [hwloc-users] divide by zero error?
>>
>> Please run "hwloc-gather-topology simics" and send the resulting
>> simics.tar.bz2 that it will create. However, I assume that the
>> simulator returns buggy x86 cpuid information, so we'll see if we
>> want/can easily work around the bug or just let the simics developers fix it.
>> Brice
>>
>>
>> Le 29/04/2014 01:15, Friedley, Andrew a écrit :
>>> Hi,
>>>
>>> I ran into a problem when running OMPI v1.8.1 -- a divide by zero crash
>>> deep in the hwloc code called by OMPI.  The system I'm running on is a
>>> simics x86_64 emulator with RHEL 6.3.  I can reproduce the error running
>>> lstopo from hwloc v1.9:
>>> [root@viper0 bin]# LD_LIBRARY_PATH=/root/hwloc/lib ./lstopo -v
>>> Floating point exception (core dumped)
>>>
>>>
>>> Hwloc v1.1rc6, already installed on the system, and a
>>> corresponding