Bug#725013: [Support] [Debichem-devel] Bug#725013: gromacs-openmpi: grompp crashes with invalid opcode

2013-10-02 Thread Vassilis Virvilis

On 10/01/2013 09:39 PM, Nicholas Breen wrote:

Could you please check if the i5 machines where it works include avx in the
flags line of /proc/cpuinfo?



Yes, it does. This is my i5, which is indeed newer than Sandy Bridge.

bill@beyonder:~$ grep avx /proc/cpuinfo
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl 
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor 
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic 
popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm arat epb 
xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase 
smep erms
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl 
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor 
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic 
popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm arat epb 
xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase 
smep erms
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl 
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor 
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic 
popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm arat epb 
xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase 
smep erms
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl 
xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor 
ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic 
popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm arat epb 
xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase 
smep erms
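
Since `grep avx` prints the whole flags line once per core, a quieter yes/no check may be handy. This is only a minimal sketch, assuming a Linux-style cpuinfo file; `has_cpu_flag` is a hypothetical helper name, not something used in this thread.

```shell
# Succeeds if the named CPU flag appears as a whole word on the first
# "flags" line. $1 = flag name, $2 = cpuinfo file (default: /proc/cpuinfo).
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" | grep -qw -- "$1"
}

if has_cpu_flag avx; then
    echo "avx: yes"
else
    echo "avx: no"
fi
```

The `-w` match keeps `avx` from matching longer flag names such as `avx2`.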




I built 4.6.3-4 on a different machine than usual, and I think it accidentally
picked up a CPU optimization it should not have had.  AVX extensions were only
added on the Sandy Bridge and newer model Intel CPUs, and the Xeon you provided
the information for doesn't have it.  If that's the case, I will ask for the
package to be rebuilt on a different machine where that problem won't occur.



One question, though: when I built with dpkg-buildpackage it was still 
crashing. Shouldn't the build have picked the correct flags when I created the packages?


I just retested and I know why. I had the packaged shared libraries 
installed, so when I ran build/src/kernel/grompp it crashed 
because it was loading the system's libraries and not the freshly compiled ones.
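
One way to confirm which copy of a library a binary will load is to look at the `ldd` output for it. The following is a minimal sketch; `resolved_lib` is a hypothetical helper name, and it assumes the usual `name => path (address)` layout of `ldd` output.

```shell
# Reads ldd output on stdin and prints the resolved path of every
# library whose name starts with $1.
resolved_lib() {
    awk -v lib="$1" 'index($1, lib) == 1 && $2 == "=>" { print $3 }'
}

# Example use against the build-tree binary mentioned above:
#   ldd build/src/kernel/grompp | resolved_lib libgmx.so
# A path under /usr/lib would mean the installed package's library is
# being picked up instead of the one in the build tree.
```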


OK. I recompiled, installed my own debs, and it is working now.

Waiting for your update.

--
Vassilis Virvilis Ph.D.
Head of IT
Biovista Inc.




--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#725013: [Support] [Debichem-devel] Bug#725013: gromacs-openmpi: grompp crashes with invalid opcode

2013-10-01 Thread Vassilis Virvilis

On 09/30/2013 07:27 PM, Nicholas Breen wrote:

reassign 725013 gromacs
tags 725013 moreinfo
thanks


On Mon, Sep 30, 2013 at 04:39:48PM +0300, Vassilis Virvilis wrote:

Trying to run grompp and grompp_d

* What exactly did you do (or not do) that was effective (or
  ineffective)?

It crashes

dmesg output:
[ 1699.966132] traps: grompp_d[9667] trap invalid opcode ip:7fb9311ac95d 
sp:77700ee8 error:0 in libgmx_d.so.8[7fb9310d+4e9000]
[ 1728.255893] traps: grompp[9684] trap invalid opcode ip:7f6807c2c65d 
sp:7fff560ed648 error:0 in libgmx.so.8[7f6807b51000+51b000]


I can't reproduce this crash with my test data, and my system runs a similar
Intel CPU (i5-2x00 series).  Could you please attach a file that it crashes on
(or a pdb2gmx/genbox/etc. sequence that creates one) and the exact command line
that causes it to fail?




There is no need for any test data. It crashes just by running it, 
before it even prints the help. Let me reiterate, because I have done 
some steps to pinpoint the bug and, now that I am rereading my bug report, 
I can see I wasn't clear enough.


The story so far:

1) apt-get update; apt-get dist-upgrade (30/9/2013)

2) reboot (since we now have a new kernel)

3) Let's run stuff
bill@odin:~$ grompp_d
 :-)  G  R  O  M  A  C  S  (-:

Illegal instruction
bill@odin:~$ grompp
 :-)  G  R  O  M  A  C  S  (-:

Illegal instruction

Here is the dmesg

[ 1699.966132] traps: grompp_d[9667] trap invalid opcode ip:7fb9311ac95d 
sp:77700ee8 error:0 in libgmx_d.so.8[7fb9310d+4e9000]
[ 1728.255893] traps: grompp[9684] trap invalid opcode ip:7f6807c2c65d 
sp:7fff560ed648 error:0 in libgmx.so.8[7f6807b51000+51b000]
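
The "traps:" lines give the faulting instruction pointer (ip) and, in brackets, the start address and size of the mapping it fell into. Subtracting the start address from the ip gives an offset that can be handed to a disassembler to see which instruction was rejected. This is a minimal sketch using the complete `grompp` line (the `grompp_d` line is truncated in this archive); `fault_offset` is a hypothetical helper name, and the sketch assumes the mapping shown is the library's first (text) mapping.

```shell
# Offset of a faulting instruction inside a shared library.
# $1 = ip from the "traps:" line, $2 = start address from the
# [start+size] part; both are hex without a 0x prefix.
fault_offset() {
    printf '%x\n' "$(( 0x$1 - 0x$2 ))"
}

fault_offset 7f6807c2c65d 7f6807b51000   # prints db65d

# The offset can then be inspected with, for example:
#   objdump -d --start-address=0xdb65d --stop-address=0xdb670 /usr/lib/libgmx.so.8
```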


4) OK, let's try the debugger

bill@odin:~$ gdb grompp
GNU gdb (GDB) 7.6 (Debian 7.6-5)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/grompp...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/bin/grompp
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
 :-)  G  R  O  M  A  C  S  (-:


Program received signal SIGILL, Illegal instruction.
0x76efe65d in rando () from /usr/lib/libgmx.so.8
(gdb) bt
#0  0x76efe65d in rando () from /usr/lib/libgmx.so.8
#1  0x76f6a14f in bromacs () from /usr/lib/libgmx.so.8
#2  0x76f6ad0c in CopyRight () from /usr/lib/libgmx.so.8
#3  0xb3ab in cmain ()
#4  0x7657e995 in __libc_start_main (main=0x6f50 <main>, 
argc=1, ubp_av=0x7fffe1d8, init=<optimized out>, 
fini=<optimized out>, rtld_fini=<optimized out>, 
stack_end=0x7fffe1c8) at libc-start.c:260
#5  0x6f7e in _start ()
(gdb)


5) Does it happen if we build it ourselves? At least we could get line 
information in the backtrace.


  $ apt-get source gromacs-openmpi
  $ sudo apt-get build-dep gromacs-openmpi
  $ cd gromacs-4.6.3/
  $ cmake .
  $ make
  $ find -name grompp
  $ ./src/kernel/grompp   --- It works (prints the help). No crash.


6) OK. Let's build a Debian package

  $ apt-get source gromacs-openmpi
  $ sudo apt-get build-dep gromacs-openmpi
  $ cd gromacs-4.6.3/
  $ dpkg-buildpackage
  $ cd ..
  $ dpkg -i ../gromacs_4.6.3-4_amd64.deb
  $ grompp  --- It crashes the same way as the original package.

7) Now I am installing on an i5
  It works on my i5. Looks like the problem is only on i7. I have 
tested on the two machines of the cluster. These are Xeons, and both 
have the problem. Here is an excerpt from /proc/cpuinfo:


processor       : 23
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU   X5660  @ 2.80GHz
stepping        : 2
microcode       : 0x15
cpu MHz         : 1600.000
cache size      : 12288 KB
physical id     : 1
siblings        : 12
core id         : 10
cpu cores       : 6
apicid          : 53
initial apicid  : 53
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe 
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good 
nopl xtopology nonstop_tsc aperfmperf pni dtes64 monitor ds_cpl vmx smx 
est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt lahf_lm ida 
arat epb dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips        : 5600.18
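
To see at a glance which instruction-set flags one machine has that another lacks, the two flags lines can be compared as sets. This is a minimal sketch; `cpu_flags` and `missing_flags` are hypothetical helper names, and the two file arguments are assumed to be copies of /proc/cpuinfo saved from each machine.

```shell
# Prints the flags from the first "flags" line of a cpuinfo-style file,
# one per line, sorted.
cpu_flags() {
    grep -m1 '^flags' "$1" | cut -d: -f2 | tr -s ' \t' '\n' | grep -v '^$' | sort -u
}

# Prints flags present in file $1 but absent from file $2.
missing_flags() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    cpu_flags "$1" > "$tmp_a"
    cpu_flags "$2" > "$tmp_b"
    comm -23 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Example, with hypothetical file names:
#   missing_flags cpuinfo.i5 cpuinfo.xeon
```

A binary built with instructions corresponding to any flag in that output could fail with SIGILL on the second machine.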

Bug#725013: [Support] [Debichem-devel] Bug#725013: gromacs-openmpi: grompp crashes with invalid opcode

2013-10-01 Thread Nicholas Breen
Thank you, I think that information will lead to a solution.  One last
question:

On Tue, Oct 01, 2013 at 11:00:26AM +0300, Vassilis Virvilis wrote:
 7) Now I am installing on an i5
   It works on my i5. Looks like the problem is only on i7. I have
 tested on the two machines of the cluster. These are Xeons, and both
 have the problem. Here is an excerpt from /proc/cpuinfo:

Could you please check if the i5 machines where it works include avx in the
flags line of /proc/cpuinfo?

I built 4.6.3-4 on a different machine than usual, and I think it accidentally
picked up a CPU optimization it should not have had.  AVX extensions were only
added on the Sandy Bridge and newer model Intel CPUs, and the Xeon you provided
the information for doesn't have it.  If that's the case, I will ask for the
package to be rebuilt on a different machine where that problem won't occur.
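
For anyone wanting to check which CPU optimization a local build tree picked up, the value is recorded in CMakeCache.txt. This is a minimal sketch that rests on an assumption: that `GMX_CPU_ACCELERATION` is the relevant cache variable in GROMACS 4.6 (with values such as SSE2, SSE4.1 or AVX_256). `cmake_cache_value` is a hypothetical helper name.

```shell
# Prints the value of a variable from a CMakeCache.txt file.
# $1 = variable name, $2 = path to CMakeCache.txt
cmake_cache_value() {
    sed -n "s/^$1:[A-Z]*=//p" "$2"
}

# Example, run from the top of the build tree:
#   cmake_cache_value GMX_CPU_ACCELERATION CMakeCache.txt
# Under the same assumption, a build meant to run on older CPUs could be
# configured explicitly, e.g.:
#   cmake . -DGMX_CPU_ACCELERATION=SSE2
```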


-- 
Nicholas Breen
nbr...@debian.org

