Re: [hwloc-devel] hwloc on PPC64

2010-07-11 Thread Jirka Hladky
On Sunday, July 11, 2010 07:57:48 pm Brice Goglin wrote:
> On 11/07/2010 19:48, Jirka Hladky wrote:
> > Hi all,
> > 
> > I have run into two bugs on PPC64 on 2.6.32 kernel.
> > 
> > Version:
> > 
> > lt-lstopo 1.0.1
> > 
> > BUG #1: No Socket information in lstopo output:
> > 
> > ./lstopo
> > 
> > Machine (3654MB) + L2 #0 (4096KB)
> > 
> > L1 #0 (64KB) + Core #0
> > 
> > PU #0 (phys=0)
> > 
> > PU #1 (phys=1)
> > 
> > L1 #1 (64KB) + Core #1
> > 
> > PU #2 (phys=2)
> > 
> > PU #3 (phys=3)
> > 
> > Fixed in the latest version (tried hwloc-1.1a1r2301.tar.gz)
> 
> In 1.0.1, there's a patch that prevents us from showing invalid socket
> info on old kernels but it also prevents us from showing valid socket
> info on recent kernels. I reverted the commit in trunk (and in the
> upcoming 1.0.2).
Thanks for shedding some light on it!

> 
> > BUG #2
> > 
> > On some PPC64 boxes with kernel 2.6.32, I get a crash when running
> > 
> > $ lstopo a.txt
> > 
> > Segmentation fault (core dumped)
> > 
> > $ gdb /usr/local/bin/lstopo core.8771
> > 
> > Program terminated with signal 11, Segmentation fault.
> > 
> > #0 0x100060b4 in .merge ()
> > 
> > It appears only on some PPC64 boxes.
> > 
> > This issue is also gone in the latest version (tried
> > hwloc-1.1a1r2301.tar.gz)
> > 
> > I wonder if you are aware of these problems. Let me know if you need
> > more details.
> 
> If you do "lstopo a.xml" first, does "lstopo --xml a.xml a.txt" crash as
> above? If so, please send a.xml so that I can debug this.

$./lstopo --version
lt-lstopo 1.0.1

$./lstopo --xml /tmp/2010-Jul-10_22h14m_results/2.6.32-44.el6.ppc64_OS-indexing.xml a.txt
Segmentation fault (core dumped)

xml was generated with
lstopo --physical a.xml

Output of command: "lstopo --physical -"
Machine (4096MB)
  NUMANode p#0 (2240MB)
    L1 (64KB) + Core p#0
      PU p#0
      PU p#1
    L1 (64KB) + Core p#2
      PU p#2
      PU p#3
    L1 (64KB) + Core p#4
      PU p#4
      PU p#5
    L1 (64KB) + Core p#6
      PU p#6
      PU p#7
  NUMANode p#1 (1856MB)

Note missing socket.

I will attach:
- the XML causing the crash (2.6.32-44.el6.ppc64_OS-indexing.xml)
- the whole run directory. Notice that the png, pdf, ... files are created
  (no crash) but are empty. Other formats are OK (check .fig). Please also
  notice that hwloc-distrib is not working correctly; check
  CPU_AFFINITY/0008.log for example.
- runtest.sh, the script used to create the data.

Let me know if you need more data.

Thanks!
Jirka


2.6.32-44.el6.ppc64_OS-indexing.xml
Description: XML document


2010-Jul-10_22h14m_hwloc-results.tar.gz
Description: application/compressed-tar


runtest.sh
Description: application/shellscript


Re: [hwloc-devel] hwloc on PPC64

2010-07-11 Thread Brice Goglin
On 11/07/2010 19:48, Jirka Hladky wrote:
>
> Hi all,
>
> I have run into two bugs on PPC64 on 2.6.32 kernel.
>
> Version:
>
> lt-lstopo 1.0.1
>
> BUG #1: No Socket information in lstopo output:
>
> ./lstopo
>
> Machine (3654MB) + L2 #0 (4096KB)
>
> L1 #0 (64KB) + Core #0
>
> PU #0 (phys=0)
>
> PU #1 (phys=1)
>
> L1 #1 (64KB) + Core #1
>
> PU #2 (phys=2)
>
> PU #3 (phys=3)
>
> Fixed in the latest version (tried hwloc-1.1a1r2301.tar.gz)
> 
>

In 1.0.1, there's a patch that prevents us from showing invalid socket
info on old kernels but it also prevents us from showing valid socket
info on recent kernels. I reverted the commit in trunk (and in the
upcoming 1.0.2).

> BUG #2
>
> On some PPC64 boxes with kernel 2.6.32, I get a crash when running
>
> $ lstopo a.txt
>
> Segmentation fault (core dumped)
>
> $ gdb /usr/local/bin/lstopo core.8771
>
> Program terminated with signal 11, Segmentation fault.
>
> #0 0x100060b4 in .merge ()
>
> It appears only on some PPC64 boxes.
>
> This issue is also gone in the latest version (tried
> hwloc-1.1a1r2301.tar.gz)
> 
>
> I wonder if you are aware of these problems. Let me know if you need
> more details.
>
>

If you do "lstopo a.xml" first, does "lstopo --xml a.xml a.txt" crash as
above? If so, please send a.xml so that I can debug this.

thanks,
Brice
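
The two-step reproduction requested above (export the topology to XML, then render text from that XML) can be scripted. The following is only a sketch: it assumes an lstopo binary on the PATH that accepts the `--xml` option quoted in this thread, and the helper names `reproduce` and `describe` are hypothetical, not part of hwloc.

```python
import subprocess


def reproduce(lstopo="lstopo", xml_path="a.xml", txt_path="a.txt",
              run=subprocess.run):
    """Run the two commands from the thread and return their exit codes.

    Step 1 exports the native topology to XML ("lstopo a.xml").
    Step 2 reloads that XML and writes text ("lstopo --xml a.xml a.txt").
    The `run` parameter exists only so a stand-in can be injected for testing.
    """
    export = run([lstopo, xml_path])
    reload_ = run([lstopo, "--xml", xml_path, txt_path])
    return export.returncode, reload_.returncode


def describe(returncode):
    """Turn a subprocess exit code into a short human-readable string.

    On POSIX, subprocess reports a negative code -N when the child was
    killed by signal N (signal 11 is SIGSEGV).
    """
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return "killed by signal %d" % -returncode
    return "exit status %d" % returncode
```

Calling `reproduce()` on an affected machine and passing each code to `describe()` would show whether the crash is tied to the XML reload step, which is what makes the attached XML file useful for debugging on another machine.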



[hwloc-devel] hwloc on PPC64

2010-07-11 Thread Jirka Hladky
Hi all,

I have run into two bugs on PPC64 on 2.6.32 kernel.

Version:
lt-lstopo 1.0.1

BUG #1: No Socket information in lstopo output:

./lstopo
Machine (3654MB) + L2 #0 (4096KB)
  L1 #0 (64KB) + Core #0
    PU #0 (phys=0)
    PU #1 (phys=1)
  L1 #1 (64KB) + Core #1
    PU #2 (phys=2)
    PU #3 (phys=3)

Fixed in the latest version (tried hwloc-1.1a1r2301.tar.gz)
lt-lstopo 1.1a1
./lstopo
Machine (3654MB) + Socket #0 + L2 #0 (4096KB)
  L1 #0 (64KB) + Core #0
    PU #0 (phys=0)
    PU #1 (phys=1)
  L1 #1 (64KB) + Core #1
    PU #2 (phys=2)
    PU #3 (phys=3)
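
The only difference between the two outputs above is the Socket object on the first line. As a rough way to check many machines for this, the text output can be scanned for object type names. This is a minimal sketch that assumes the text layout shown in this message (objects on one line joined by " + "); the helper `object_types` is hypothetical and not part of hwloc.

```python
import re


def object_types(lstopo_text):
    """Return the set of object type names found in lstopo text output."""
    types = set()
    for line in lstopo_text.splitlines():
        # One line may hold several objects joined by " + ",
        # e.g. "Machine (3654MB) + Socket #0 + L2 #0 (4096KB)".
        for part in line.strip().split(" + "):
            match = re.match(r"[A-Za-z][A-Za-z0-9]*", part)
            if match:
                types.add(match.group(0))
    return types


# Outputs copied from this message (1.0.1 first, then 1.1a1).
OUT_1_0_1 = """Machine (3654MB) + L2 #0 (4096KB)
  L1 #0 (64KB) + Core #0
    PU #0 (phys=0)
    PU #1 (phys=1)
  L1 #1 (64KB) + Core #1
    PU #2 (phys=2)
    PU #3 (phys=3)
"""

OUT_1_1A1 = """Machine (3654MB) + Socket #0 + L2 #0 (4096KB)
  L1 #0 (64KB) + Core #0
    PU #0 (phys=0)
    PU #1 (phys=1)
  L1 #1 (64KB) + Core #1
    PU #2 (phys=2)
    PU #3 (phys=3)
"""

missing_in_1_0_1 = object_types(OUT_1_1A1) - object_types(OUT_1_0_1)
print(sorted(missing_in_1_0_1))
```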

I have attached /proc/cpuinfo ("bug_1-ppc64-cpuinfo")

BUG #2
On some PPC64 boxes with kernel 2.6.32, I get a crash when running
$ lstopo a.txt
Segmentation fault (core dumped)
$ gdb /usr/local/bin/lstopo core.8771
Program terminated with signal 11, Segmentation fault.
#0  0x100060b4 in .merge ()

It appears only on some PPC64 boxes.

This issue is also gone in the latest version (tried
hwloc-1.1a1r2301.tar.gz)

I wonder if you are aware of these problems. Let me know if you need more
details.

Thanks
Jirka

processor   : 0
cpu : POWER6 (raw), altivec supported
clock   : 3826.00MHz
revision: 3.1 (pvr 003e 0301)

processor   : 1
cpu : POWER6 (raw), altivec supported
clock   : 3826.00MHz
revision: 3.1 (pvr 003e 0301)

processor   : 2
cpu : POWER6 (raw), altivec supported
clock   : 3826.00MHz
revision: 3.1 (pvr 003e 0301)

processor   : 3
cpu : POWER6 (raw), altivec supported
clock   : 3826.00MHz
revision: 3.1 (pvr 003e 0301)

timebase: 51200
platform: pSeries
model   : IBM,7998-60X
machine : CHRP IBM,7998-60X

processor   : 0
cpu : POWER6 (architected), altivec supported
clock   : 4005.00MHz
revision: 3.1 (pvr 003e 0301)

processor   : 1
cpu : POWER6 (architected), altivec supported
clock   : 4005.00MHz
revision: 3.1 (pvr 003e 0301)

processor   : 2
cpu : POWER6 (architected), altivec supported
clock   : 4005.00MHz
revision: 3.1 (pvr 003e 0301)

processor   : 3
cpu : POWER6 (architected), altivec supported
clock   : 4005.00MHz
revision: 3.1 (pvr 003e 0301)

processor   : 4
cpu : POWER6 (architected), altivec supported
clock   : 4005.00MHz
revision: 3.1 (pvr 003e 0301)

processor   : 5
cpu : POWER6 (architected), altivec supported
clock   : 4005.00MHz
revision: 3.1 (pvr 003e 0301)

processor   : 6
cpu : POWER6 (architected), altivec supported
clock   : 4005.00MHz
revision: 3.1 (pvr 003e 0301)

processor   : 7
cpu : POWER6 (architected), altivec supported
clock   : 4005.00MHz
revision: 3.1 (pvr 003e 0301)

timebase: 51200
platform: pSeries
model   : IBM,7998-61X
machine : CHRP IBM,7998-61X