Public bug reported:
I have a C++ program developed as an exercise in tuning to solve Sudoku
puzzles as fast as possible by reading them from a file and bit
twiddling. After upgrading to 9.10 from jaunty and recompiling, the
program that previously solved over 2000 puzzles per second dropped to
about 180 puzzles per second, and CPU increased from about 75% to 100%.
callgrind reveals that the degradation occurs in a function that returns
the number of bits set in an integer. Before the upgrade, this was most
effectively done with a table lookup, e.g:
unsint puzzle::noOfDigits( const cell_t n )
{
// unsint is unsigned int ; cell_t is unsigned short.
const unsigned int BitsSetTable[] = {
0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
5, 6, 6, 7, 6, 7, 7, 8, 6, 7, 7, 8, 7, 8, 8, 9,
1, 2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
5, 6, 6, 7, 6, 7, 7, 8, 6, 7, 7, 8, 7, 8, 8, 9,
2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
5, 6, 6, 7, 6, 7, 7, 8, 6, 7, 7, 8, 7, 8, 8, 9,
3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6, 6, 7,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
5, 6, 6, 7, 6, 7, 7, 8, 6, 7, 7, 8, 7, 8, 8, 9,
4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8,
5, 6, 6, 7, 6, 7, 7, 8, 6, 7, 7, 8, 7, 8, 8, 9,
5, 6, 6, 7, 6, 7, 7, 8, 6, 7, 7, 8, 7, 8, 8, 9,
6, 7, 7, 8, 7, 8, 8, 9, 7, 8, 8, 9, 8, 9, 9, 10
};
return BitsSetTable[n];
}
On Jaunty the table lookup version of my program ran significantly
faster than the simple algorithm below (2200 vs 1500 puzzles per
second).
Recompiled n 9.10 the algorithm version still runs at 1500 puzzles per
second, but the lookup version works hard to achieve 180.
unsint puzzle::noOfDigits( cell_t cell )
{
/*
** return the number of digits stored in a cell_t, ie the number of set
bits
** eg. 101101101 ("134679") returns 6
*/
unsint i, n = 0;
for ( i=0; i<10; i++ )
{
n += cell & 0x01;
cell >>= 1;
}
return n;
}
1) Description: Ubuntu 9.10
Release: 9.10
2) gcc:
Installed: 4:4.4.1-1ubuntu2
Candidate: 4:4.4.1-1ubuntu2
Version table:
*** 4:4.4.1-1ubuntu2 0
500 http://gb.archive.ubuntu.com karmic/main Packages
100 /var/lib/dpkg/status
3) Similar performance
4) Severe performance loss.
Fujitsu Siemens Amilo Pi1505 Laptop, 1Gb RAM, Centrino duo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 14
model name : Intel(R) Core(TM) Duo CPU T2250 @ 1.73GHz
stepping : 12
cpu MHz : 800.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon
bts pni monitor est tm2 xtpr pdcm
bogomips : 3466.39
clflush size : 64
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 14
model name : Intel(R) Core(TM) Duo CPU T2250 @ 1.73GHz
stepping : 12
cpu MHz : 800.000
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov
clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc arch_perfmon
bts pni monitor est tm2 xtpr pdcm
bogomips : 3466.83
clflush size : 64
power management:
Experience with upgrading and using 9.10 otherwise very good with only a
few minor application bugs that are already recognised. Apologies if
this problem is gcc 4.4.1 rather than Ubuntu!
Thanks,
Dave
ProblemType: Crash
Architecture: i386
CrashCounter: 1
Date: Sun Nov 1 16:32:56 2009
DistroRelease: Ubuntu 9.10
ExecutablePath: /usr/bin/rhythmbox
Package: rhythmbox 0.12.5-0ubuntu4
ProcCmdline: rhythmbox
ProcEnviron:
PATH=(custom, user)
LANG=en_GB.UTF-8
SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-14.48-generic
SegvAnalysis:
Segfault happened at: 0x5b2b397 <LIBMTP_Get_Filemetadata+23>: mov
0x8(%eax),%edx
PC (0x05b2b397) ok
source "0x8(%eax)" (0x00000008) not located in a known VMA region (needed
readable region)!
destination "%edx" ok
SegvReason: reading NULL VMA
Signal: 11
SourcePackage: rhythmbox
StacktraceTop:
LIBMTP_Get_Filemetadata () from /usr/lib/libmtp.so.8
?? ()
gst_element_change_state ()
?? () from /usr/lib/libgstreamer-0.10.so.0
gst_element_set_state ()
Title: rhythmbox crashed with SIGSEGV in LIBMTP_Get_Filemetadata()
Uname: Linux 2.6.31-14-generic i686
UserGroups: adm admin cdrom dialout lpadmin mandriva plugdev sambashare
** Affects: ubuntu
Importance: Undecided
Status: New
** Tags: apport-crash i386
--
gcc - example severe performance regression after upgradingf
https://bugs.launchpad.net/bugs/474825
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
--
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs