Hello

Both 1.10.1 and 1.11.10 are very old. Could you try at least
1.11.13, or even 2.x, on these machines? Unfortunately, I can't remember
everything we changed in this code five years later.

We are not aware of any issue on Intel Haswell, but it's not impossible
that a hardware bug is causing unexpected CPUID outputs.

If it fails with a recent hwloc, please run hwloc-gather-cpuid
(available in hwloc 2.x) and send me the resulting cpuid/ directory so
that I can debug from here.

Please report the result on hwloc-us...@lists.open-mpi.org instead (this
list is for Open MPI users).

Thanks

Brice




On 29/04/2020 at 17:07, Rob Scott (roscott2) via users wrote:
>
> We are seeing a kernel trap in hwloc, reported by a few customers.
>
>  
>
> In one particular case, here are details.
>
>  
>
> hwloc-1.10.1
>
> Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
>
>  
>
> The offending code is in look_proc(), due either to CPUID function 0x1
> returning 4 logical processors or possibly to the hwloc_flsl() macro
> being wrong.
>
> static void look_proc(..)
> {
> ...
> infos->max_log_proc = 1 << hwloc_flsl(((ebx >> 16) & 0xff) - 1);  // max_log_proc == 4
> ...
> infos->max_nbthreads = infos->max_log_proc / infos->max_nbcores;  // max_nbthreads == 0
> infos->threadid = infos->logprocid % infos->max_nbthreads;        // divide by zero
> ...
> }
>
>  
>
> It appears hwloc-1.11.10 worked around the CPUID issue for AMD
> processors. We are using this version in our current products.
>
>  
>
> But is there any hwloc version where this is fixed for Intel
> processors as well?
>
>  
>
> Thanks,
>
> Robert Scott
>
>  
>
>  
>
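
To make the arithmetic in the quoted snippet concrete, here is a minimal
standalone sketch. It is not the hwloc source: sketch_flsl(),
sketch_max_log_proc() and sketch_max_nbthreads() are hypothetical names
used only for illustration, and clamping the result to 1 is just one
possible guard, not necessarily what any hwloc release does.

```c
/* Stand-in for hwloc_flsl(): 1-based index of the highest set bit,
 * or 0 when x == 0. Hypothetical helper, not the hwloc macro itself. */
static int sketch_flsl(unsigned long x)
{
    int pos = 0;
    while (x) {
        pos++;
        x >>= 1;
    }
    return pos;
}

/* Round the logical processor count in CPUID leaf 0x1, EBX[23:16], up to
 * a power of two, mirroring the expression in the quoted snippet.
 * Assumption: a reported count of 0 is treated as 1 so the shift stays
 * well defined. */
static unsigned sketch_max_log_proc(unsigned ebx)
{
    unsigned count = (ebx >> 16) & 0xff;
    if (count == 0)
        count = 1;
    return 1u << sketch_flsl(count - 1);
}

/* Hypothetical guard: integer division yields 0 whenever max_nbcores is
 * larger than max_log_proc, so clamp the thread count to at least 1
 * before it is used as a modulo divisor. */
static unsigned sketch_max_nbthreads(unsigned max_log_proc,
                                     unsigned max_nbcores)
{
    unsigned n = max_nbcores ? max_log_proc / max_nbcores : 0;
    return n ? n : 1;
}
```

With an EBX count of 4 and, say, 8 cores (an assumed value, since the
report does not give max_nbcores), the raw quotient 4 / 8 is 0, which is
what makes the later "logprocid % max_nbthreads" trap. The guard returns
1 instead, so the modulo stays defined.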
