Re: [CURRENT]: weird memory/linker problem?
On Tue, 01 Jul 2014 17:23:14 +0200, Willem Jan Withagen w...@digiware.nl wrote:

[...]

Hello all.

I'd like to share an update. It doesn't resolve the underlying concern, but it may add a useful data point. The box in question now runs with only 4 GB of RAM and is operating as expected. With 8 GB, I see the weird bugs reported earlier, and they have indeed turned out to be bit flips. I cannot reproduce them at will; they occur spontaneously, but I can raise the frequency
Re: [CURRENT]: weird memory/linker problem?
On Mon, 23 Jun 2014 17:22:25 +0200, Dimitry Andric d...@freebsd.org wrote:

> On 23 Jun 2014, at 16:31, O. Hartmann ohart...@zedat.fu-berlin.de wrote:
>> On Sun, 22 Jun 2014 10:10:04 -0700, Adrian Chadd adr...@freebsd.org wrote:
>>> When they segfault, where do they segfault?
>>
>> ... GIMP, LaTeX work, nothing special, but a bit memory-consuming regarding GIMP) I tried updating the ports tree and, surprisingly, the tree was left in an unclean state when /usr/bin/svn segfaulted (on console: pid 18013 (svn), uid 0: exited on signal 11 (core dumped)). Using /usr/local/bin/svn, which is from the devel/subversion port, works well, while FreeBSD 11's bundled svn dies as described. It did not do so hours ago!
>
> I think what Adrian meant was: can you run svn (or another crashing program) in gdb, and post a backtrace? Or maybe run ktrace, and see where it dies? Alternatively, put a core dump and the executable (with debug info) in a tarball, and upload it somewhere, so somebody else can analyze it.
>
> -Dimitry

It's me again, with the same weird story. After a couple of days of silence, the mysterious entity in my computer is back. This time it is again a weird compiler failure message (while trying to buildworld):

[...]
c++ -O2 -pipe -O3 -O3 -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/tools/clang/include -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support -I. -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/../../lib/clang/include -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\x86_64-unknown-freebsd11.0\ -DLLVM_HOST_TRIPLE=\x86_64-unknown-freebsd11.0\ -DDEFAULT_SYSROOT=\\ -Qunused-arguments -I/usr/obj/usr/src/tmp/legacy/usr/include -std=c++11 -fno-exceptions -fno-rtti -Wno-c++11-extensions -c /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support/Host.cpp -o Host.o
--- GraphWriter.o ---
In file included from /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support/GraphWriter.cpp:14:
/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include/llvm/Support/GraphWriter.h:269:10: error: use of undeclared identifier 'DOD'; did you mean 'DOT'?
    O DOD::EscapeString(Label);
    ^~~
    DOT
/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include/llvm/Support/GraphWriter.h:35:11: note: 'DOT' declared here
    namespace DOT { // Private functions...
    ^
1 error generated.
*** [GraphWriter.o] Error code 1

Well, in the past I have seen many such messages, especially labels of routines not being found in shared objects/libraries, or funny misspellings like the one shown above. I cannot reproduce them after a reboot, but once the error has occurred it is sticky for as long as the system keeps running. So in order to compile the OS successfully, I reboot.

Does anyone have an idea what this could be? Since at the moment it affects only one machine (the other CoreDuo has been retired in the meantime), it feels a bit like a miscompilation on a certain type of CPU.

Thanks for your patience,
Oliver

signature.asc Description: PGP signature
RE: [CURRENT]: weird memory/linker problem?
DOT = DOD
444F54 = 444F44

That's a single-bit flip. Bad memory, perhaps?

Anton

-Original Message-
From: owner-freebsd-curr...@freebsd.org [mailto:owner-freebsd-curr...@freebsd.org] On Behalf Of O. Hartmann
Sent: Tuesday, July 01, 2014 8:08 AM
To: Dimitry Andric
Cc: Adrian Chadd; FreeBSD CURRENT
Subject: Re: [CURRENT]: weird memory/linker problem?

[...]

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
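The single-bit-flip observation about DOT and DOD can be checked mechanically. The following is a minimal sketch, not something posted in the thread: it XORs the ASCII bytes of the two identifiers and lists the bit positions that differ. The helper name differing_bits is made up for illustration.

```python
def differing_bits(a, b):
    """Return (byte_index, bit_index) for every bit that differs between a and b."""
    ea, eb = a.encode("ascii"), b.encode("ascii")
    if len(ea) != len(eb):
        raise ValueError("strings must have the same length")
    diffs = []
    for i, (x, y) in enumerate(zip(ea, eb)):
        delta = x ^ y  # set bits mark positions where the two bytes disagree
        for bit in range(8):
            if delta & (1 << bit):
                diffs.append((i, bit))
    return diffs


print("DOT".encode("ascii").hex().upper())  # 444F54
print("DOD".encode("ascii").hex().upper())  # 444F44
print(differing_bits("DOT", "DOD"))         # [(2, 4)]
```

Only bit 4 of the third byte differs ('T' is 0x54, 'D' is 0x44), which is consistent with a single flipped bit rather than a typo in the source tree.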
Re: [CURRENT]: weird memory/linker problem?
On 2014-07-01 16:48, Rang, Anton wrote:

> DOT = DOD
> 444F54 = 444F44
>
> That's a single-bit flip. Bad memory, perhaps?

Very likely, especially if the system does not have ECC.

It just happens on rare occasions that an alpha particle, a power cycle, or anything else disruptive damages a memory cell. And it may take a special pattern of accesses to actually expose the error.

In the past (the 1990s), 'make buildworld' used to be a rather good memory tester. But nowadays look at http://www.memtest.org/

This tool has found all of the bad memory in all the systems I have used and/or built for others...

Note that it might take a few runs and some more heat to actually trigger the faulty cell, but memtest86 will usually find it. Note also that on big systems with lots of memory it can take a long time to run just one full test set to completion.

--WjW
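The idea behind a pattern-based memory test (write a known pattern, read it back, report any byte that changed) can be sketched in a few lines. This is only an illustration and not how memtest86 is implemented: it runs in user space, so the operating system decides which physical pages back the buffer and CPU caches sit in between, and it cannot replace a stand-alone tester. The function names fill, verify and pattern_test are made up for this sketch.

```python
def fill(buf, pattern):
    """Write the same byte pattern into every position of buf."""
    buf[:] = bytes([pattern]) * len(buf)


def verify(buf, pattern):
    """Return (offset, expected, actual) for each byte that differs from pattern."""
    if buf.count(pattern) == len(buf):  # fast path: nothing changed
        return []
    return [(off, pattern, val) for off, val in enumerate(buf) if val != pattern]


def pattern_test(size_bytes, patterns=(0x00, 0xFF, 0x55, 0xAA), passes=1):
    """Run fill/verify for each pattern; return (pass, offset, expected, actual)."""
    buf = bytearray(size_bytes)
    errors = []
    for n in range(passes):
        for pattern in patterns:
            fill(buf, pattern)
            errors.extend((n, *e) for e in verify(buf, pattern))
    return errors


if __name__ == "__main__":
    # 16 MiB is an arbitrary example size, far less than a real tester covers.
    print("mismatches:", len(pattern_test(16 * 1024 * 1024, passes=2)))
```

The alternating patterns 0x55 and 0xAA exercise every bit in both states, and repeating the passes reflects the advice that a marginal cell may only fail after several runs.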
Re: [CURRENT]: weird memory/linker problem?
On Tue, 01 Jul 2014 17:23:14 +0200, Willem Jan Withagen w...@digiware.nl wrote:

[...]

I already tested with memtest86+ (I had to download the Linux image; the FreeBSD port is broken on CURRENT). It hasn't found anything strange so far. I will run another test.

I noticed that on that specific box the chipset temperature is 81 degrees Celsius. The chipset is an Eaglelake P45, which is where the memory controller resides on that old platform. dmidecode gives:

Manufacturer: ASUSTeK Computer INC.
Product Name: P5Q-WS
Version: Rev 1.xx
Re: [CURRENT]: weird memory/linker problem?
On 2014-07-01 17:33, O. Hartmann wrote:

[...]

Hi Oliver,

I've built several (5+) systems with these boards (from memory, they date from around 2009??). And if I recall correctly, one of them is still functional. The first one broke down within a couple of weeks, and the others did not survive over time either.

The auxiliary chips on that board do run hot, but I never realized they ran this hot. Is 81 C the CPU temperature from sysctl, or did you measure the heat sink on the motherboard? In the latter case it is probably just too hot. But even if it is the temperature of the chip itself, I've rarely seen temperatures go this high.

You may need to run memtest86 for more than 6-10 complete passes with all the tests.

If the memtests do not reveal anything broken, then you get into even more wizardry, like bad power etc. Especially since it only occurs occasionally, it is going to be a nightmare to find the root cause, other than by replacing hardware piece by piece, which won't be easy given the age of the board and parts.

You could go into the BIOS and try configuring RAM access at a slower speed to see if the problem goes away. If it does, it could be that you are running at the edge of the spec with regard to RAM timing.

But like I said, there are lots of funky details that can interact in strange and unexpected ways.

--WjW
Re: [CURRENT]: weird memory/linker problem?
On Tue, 01 Jul 2014 17:57:26 +0200, Willem Jan Withagen w...@digiware.nl wrote:

[...]

Hello Willem,

> The auxiliary chips on that board do run hot, but I never realized they ran this hot. Is 81 C the CPU temperature from sysctl, or did you measure the heat sink on the motherboard? In the latter case it is probably just too hot. But even if it is the temperature of the chip itself, I've rarely seen temperatures go this high.

The temperature is shown in the BIOS and by one of the health daemons found in ports (I forgot the name). As far as I know, there is no sysctl MIB showing the chipset temperature on that board.

> You may need to run memtest86 for more than 6-10 complete passes with all the tests.

The last time I ran memtest86+, it took about 1 1/2 days to finish.

> [...]

I will check the memory again in the coming days.

Regards,
Oliver
Re: [CURRENT]: weird memory/linker problem?
On Mon, 23 Jun 2014 17:22:25 +0200, Dimitry Andric d...@freebsd.org wrote:

> [...]
>
> I think what Adrian meant was: can you run svn (or another crashing program) in gdb, and post a backtrace? Or maybe run ktrace, and see where it dies? Alternatively, put a core dump and the executable (with debug info) in a tarball, and upload it somewhere, so somebody else can analyze it.
>
> -Dimitry

Here I am again. So far, a report of what I did.

Regarding the svn issue, I tried to recompile with

make -C usr.bin/svn clean depend obj all install

with -O0 -g -DDEBUG set in /etc/make.conf and /etc/src.conf (disabling all the -O flags I usually use). gdb complained about missing symbols. After the recompilation the base-system svn didn't crash anymore, and the strange story seems to continue.

Firefox also crashed yesterday, out of the blue, with a bus error (SIG 10). Rebooting solved the problem. So far I haven't recompiled the system or any client with DEBUG flags set. So, sorry, this issue is still open, and it is no less weird.

Next, today, I tried recompiling world. The build process fails on the box in question with my well-known friend, the "relocation truncated to fit: R_X86_64_PC32 against symbol" error. See below. I'm about to reboot the box and restart building world without having started any memory-consuming applications before the build.

Since the problems seem to be random, I ask myself whether this is somehow related to the ASLR stuff mentioned earlier on the list. I will also disable -O3 again for the next build to make sure that CLANG isn't miscompiling something.

As mentioned on the list before, I tried to find some CPU-burning and memory-eating applications/tests, but since math/mprime is i386 only and sysutils/cpuburn only covers ancient CPUs, I feel a bit lost in that task and am left with memtest86 (which earlier indicated no memory problems with the box).

And by the way, I face several serious issues with I/O performance on CURRENT these days: it takes a long time for portmaster to step through the ports that are about to be updated when the CLANG compiler is compiling world/kernel in the background. This phenomenon has grown worse since earlier this year (~ February).

Source at revision 267867.
FreeBSD 11.0-CURRENT #0 r267816: Tue Jun 24 14:02:22 CEST 2014 amd64.

[...]
c++ -O2 -pipe -O3 -O3 -I/usr/src/usr.bin/clang/tblgen/../../../contrib/llvm/include -I/usr/src/usr.bin/clang/tblgen/../../../contrib/llvm/tools/clang/include -I/usr/src/usr.bin/clang/tblgen/../../../contrib/llvm/utils/TableGen -I. -I/usr/src/usr.bin/clang/tblgen/../../../contrib/llvm/../../lib/clang/include -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\x86_64-unknown-freebsd11.0\ -DLLVM_HOST_TRIPLE=\x86_64-unknown-freebsd11.0\ -DDEFAULT_SYSROOT=\\ -Qunused-arguments -I/usr/obj/usr/src/tmp/legacy/usr/include -std=c++11 -fno-exceptions -fno-rtti -Wno-c++11-extensions -static -L/usr/obj/usr/src/tmp/legacy/usr/lib -o tblgen AsmMatcherEmitter.o AsmWriterEmitter.o AsmWriterInst.o CTagsEmitter.o CallingConvEmitter.o CodeEmitterGen.o CodeGenDAGPatterns.o CodeGenInstruction.o CodeGenMapTable.o CodeGenRegisters.o CodeGenSchedule.o CodeGenTarget.o DAGISelEmitter.o DAGISelMatcher.o DAGISelMatcherEmitter.o DAGISelMatcherGen.o DAGISelMatcherOpt.o DFAPacketizerEmitter.o DisassemblerEmitter.o FastISelEmitter.o FixedLenDecoderEmitter.o InstrInfoEmitter.o IntrinsicEmitter.o OptParserEmitter.o PseudoLoweringEmitter.o RegisterInfoEmitter.o SetTheory.o SubtargetEmitter.o TGValueTypes.o TableGen.o X86DisassemblerTables.o X86ModRMFilters.o X86RecognizableInstr.o /usr/obj/usr/src/tmp/usr/src/usr.bin/clang/tblgen/../../../lib/clang/libllvmtablegen/libllvmtablegen.a /usr/obj/usr/src/tmp/usr/src/usr.bin/clang/tblgen/../../../lib/clang/libllvmsupport/libllvmsupport.a -lncurses -legacy
/usr/lib/libc.a(jemalloc_jemalloc.o): In function `imemalign':
jemalloc_jemalloc.c:(.text+0x2605): relocation truncated to fit: R_X86_64_PC32 against symbol `__je_arena_malloc_large' defined in .text section in /usr/lib/libc.a(jemalloc_arena.o)
c++: error: linker command failed with exit code 1 (use -v to see invocation)
*** [tblgen] Error code 1
make[3]: stopped in
Re: [CURRENT]: weird memory/linker problem?
On Sun, 22 Jun 2014 10:10:04 -0700, Adrian Chadd adr...@freebsd.org wrote:

> When they segfault, where do they segfault?
>
> -a

I have not investigated this so far, since I was convinced from the start that it is triggered by a defective memory system. So I rebooted immediately, glad to have found a solution. I will check the next time it happens.

oh

> On 22 June 2014 07:56, O. Hartmann ohart...@zedat.fu-berlin.de wrote:
>> Hello. I face a strange problem on a set of CURRENT driven boxes.
>> [...]
Re: [CURRENT]: weird memory/linker problem?
On Sun, 22 Jun 2014 10:10:04 -0700, Adrian Chadd adr...@freebsd.org wrote:

> When they segfault, where do they segfault?
>
> -a

Now it gets more fun. After a buildworld and reboot, the box in question is at CURRENT:

FreeBSD 11.0-CURRENT #0 r267782: Mon Jun 23 13:12:56 CEST 2014 amd64

After the reboot, everything was all right. I then updated the ports tree (I do this regularly). I have configured /etc/make.conf so that the ports tree update is performed using /usr/bin/svn.

Now, after about three hours of regular work (KDevelop, some GIMP, LaTeX work; nothing special, though GIMP is a bit memory-consuming), I tried updating the ports tree again, and surprisingly the tree was left in an unclean condition while /usr/bin/svn segfaulted (on console: pid 18013 (svn), uid 0: exited on signal 11 (core dumped)). Using /usr/local/bin/svn, which is from the devel/subversion port, works well, while the svn that ships with FreeBSD 11 dies as described. It did not hours ago!

Well, this drives me nuts. Is it a bug in FreeBSD (maybe relocating libs, the memory map or something else), or is it in fact the agony of my computer system? As reported earlier, memory checks via memtest didn't show up any kind of faulty memory.

I'm out of ideas. Is there a way to stress test the CPU and memory system to check the RAM, the CPU itself and, as an additional possibility, the disk I/O controller (Intel ICH10)?

Thanks for your patience,
Oliver

> On 22 June 2014 07:56, O. Hartmann ohart...@zedat.fu-berlin.de wrote:
>> Hello. I face a strange problem on a set of CURRENT driven boxes.
>> [...]
Re: [CURRENT]: weird memory/linker problem?
On 23 Jun 2014, at 16:31, O. Hartmann ohart...@zedat.fu-berlin.de wrote:
> On Sun, 22 Jun 2014 10:10:04 -0700, Adrian Chadd adr...@freebsd.org wrote:
>> When they segfault, where do they segfault?
> ...
> GIMP, LaTeX work, nothing special, but a bit memory consuming regarding GIMP) I tried updating the ports tree and surprisingly the tree is left over in an unclean condition while /usr/bin/svn segfaults (on console: pid 18013 (svn), uid 0: exited on signal 11 (core dumped)). Using /usr/local/bin/svn, which is from the devel/subversion port, performs well, while FreeBSD 11's svn contribution dies as described. It did not hours ago!

I think what Adrian meant was: can you run svn (or another crashing program) in gdb, and post a backtrace? Or maybe run ktrace, and see where it dies?

Alternatively, put a core dump and the executable (with debug info) in a tarball, and upload it somewhere, so somebody else can analyze it.

-Dimitry
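The two approaches Dimitry suggests could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a tested recipe for the affected box: it assumes the core file was written as svn.core in the directory where svn crashed (the usual FreeBSD default name), and the helper names backtrace_from_core and trace_syscalls are made up for illustration. Only standard gdb (-batch, -x), ktrace (-f) and kdump (-f) options are used.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of FreeBSD.

# Print a backtrace from an existing core dump.
#   $1 = path to the crashing executable, $2 = path to its core file
backtrace_from_core() {
    prog=$1
    core=$2
    # gdb reads its commands from a file; here just "bt" (backtrace).
    cmds=$(mktemp /tmp/gdbcmds.XXXXXX) || return 1
    echo bt > "$cmds"
    gdb -batch -x "$cmds" "$prog" "$core"
    status=$?
    rm -f "$cmds"
    return "$status"
}

# Run a command under ktrace and show the last kernel trace records,
# i.e. what the process was doing just before it died.
#   $1 = trace output file, remaining arguments = command to run
trace_syscalls() {
    out=$1
    shift
    ktrace -f "$out" "$@"
    kdump -f "$out" | tail -n 40
}

# Example invocations (assumed paths):
#   backtrace_from_core /usr/bin/svn svn.core
#   trace_syscalls /tmp/svn.ktrace /usr/bin/svn update /usr/ports
```

A backtrace is only useful to others if the binary has debug info; without it, the ktrace output at least shows the last system calls before signal 11.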
Re: [CURRENT]: weird memory/linker problem?
On Mon, 2014-06-23 at 16:31 +0200, O. Hartmann wrote:
> I'm out of ideas. Is there a way to stress test the CPU and memory system to check whether RAM, the CPU itself and, as an additional possibility, the disk i/o controller (Intel ICH10)?
>
> Thanks for your patience,

A really good tool for stress-testing a system is ports/math/mprime. It will find memory and cpu errors that memtest86 and other tools completely overlook. Run one copy per cpu, something like this:

    for i in $(jot $(sysctl -n hw.ncpu) 0) ; do
        # stagger the start of each copy by two seconds
        sleep $((i * 2))
        # one torture-test instance per cpu, each with its own log, in the background
        mprime -t -a$i >/tmp/mprime$i.log &
    done

Many overclockers use this to ensure the system is stable with the OC settings. If your system can run a copy of mprime per cpu continuously for 24 hours the hardware is fine.

-- Ian

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: [CURRENT]: weird memory/linker problem?
On Mon, 23 Jun 2014 09:27:46 -0600, Ian Lepore i...@freebsd.org wrote:

> A really good tool for stress-testing a system is ports/math/mprime. It will find memory and cpu errors that memtest86 and other tools completely overlook.
> [...]

A great idea, but regrettably I receive this error while trying to install that neat port:

mprime-0.0.24.14 is only for i386, while you are running amd64.
*** Error code 1

Is there a 64-bit counterpart?

Oliver
[CURRENT]: weird memory/linker problem?
Hello.

I face a strange problem on a set of boxes running CURRENT. The systems in question all run the same version of CURRENT (more or less; they differ by a week or so). The affected boxes have 8 GB of RAM and are old-style Core2Duo systems.

The phenomenon: after startup, the operating system appears to work. But sometimes it is impossible to start certain applications, like Firefox: they segfault. More disturbing is the failure of the linker when building world. Sometimes I get strange messages like

relocation truncated to fit: R_X86_64_PC32 against symbol `__error' defined in .text

when compiling/linking. The funny thing is: rebooting the box and doing exactly the same thing very often leaves the system operable again. Starting applications works, compiling works!

At first I thought this could be an indication of a dying system, so I checked the memory for two days non-stop, without any indication of anything wrong. The boxes do not have ECC RAM - it's Intel.

I see this problem relatively often on two C2D-based boxes (one E8400 dual-core, another Q6600 quad-core; both systems have 8 GB RAM). This phenomenon also occurred two or three months ago on another machine with 32 GB RAM and a Core-i7 3930K, but it went away (it was the very same error as shown above). Another system, an i3-3220 with 16 GB RAM, never showed the problem, although that system builds world just as frequently as the C2D systems do.

Well, I feel a bit confused. At first glance the problem looks weird and points to some kind of memory problem, but testing the memory didn't show anything wrong. Today windowmaker stopped starting due to a malformed command in one of windowmaker's libraries. I rebooted the box and everything was all right. Then, also today, I tried compiling world and got a weird error message about a misspelled Int__xxx (I cannot remember the exact text); I rebooted and everything was all right again.

Those errors are frequent on the 8 GB, C2D-based systems and at the moment no longer occur on the more modern systems with more memory described above. This could be a coincidence, but it is strange anyway.

I do not exclude dying hardware, but I'd like to ask whether there is something strange going on with FreeBSD's memory management at the moment, and whether those problems could also be triggered by some nasty bug. I never see a crash (which would also indicate faulty hardware); I mostly notice this strange behaviour either after a fresh boot or after running some memory- and disk-I/O-intensive jobs, like updating the ports tree.

By the way, FreeBSD CURRENT suffers from a tremendous performance drop these days when compiling world while updating the ports tree and running portmaster. On one box, on which ports reside on a UFS partition, it takes more than 8 minutes to get through portmaster -da, which is quick when not compiling world. On another system, on which /usr/ports resides on ZFS (the i3-3220 box with 16 GB RAM!), it sometimes takes 30(!) minutes to perform a svn update while compiling world; it takes 6 - 15 minutes when the box is otherwise idle and the ports tree is updated for the first time (every subsequent update is much faster).

Well, I know these reports of mine are a bit weird since I have no exact log of the problems, but I think if there is an issue that is not with the hardware, I should report it.

Regards,
oh
Re: [CURRENT]: weird memory/linker problem?
When they segfault, where do they segfault?

-a

On 22 June 2014 07:56, O. Hartmann ohart...@zedat.fu-berlin.de wrote:
> Hello. I face a strange problem on a set of CURRENT driven boxes.
> [...]
Re: [CURRENT]: weird memory/linker problem?
On 2014-06-22 10:56, O. Hartmann wrote:
> Hello. I face a strange problem on a set of CURRENT driven boxes.
> [...]
> On another system on which /usr/ports is residing on ZFS (the box has 16GB RAM!), it takes sometimes 30(!) minutes to perform a svn update while compiling world (that is the i3-3220 with 16 GB RAM system), it takes 6 - 15 minutes when the box is relaxed and updating the ports tree the first time (every subsequent update is much faster).
> [...]

To get a better benchmark for 'svn update' on the ports tree, run 'zfs unmount pool/usr/ports' first, which flushes all ARC entries for that dataset, then 'zfs mount pool/usr/ports' and run the test again. This should give you more reproducible results.

-- Allan Jude
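Allan's suggestion could be wrapped into a small helper along these lines. This is only a sketch under assumptions: the dataset name pool/usr/ports is the one used in his message and will differ per system, /usr/ports is assumed as the mount point, and the function name cold_cache_svn_update is made up for illustration. It must run as root, and the unmount fails if anything still has the dataset busy (for example a shell whose working directory is inside /usr/ports).

```shell
#!/bin/sh
# Assumed names; override via the environment to match your pool layout.
DATASET="${DATASET:-pool/usr/ports}"
PORTSDIR="${PORTSDIR:-/usr/ports}"

# Hypothetical helper: time one 'svn update' starting from a cold cache.
cold_cache_svn_update() {
    # Remounting drops the cached (ARC) data for this dataset,
    # so every run starts from the same state.
    zfs unmount "$DATASET" || return 1
    zfs mount "$DATASET" || return 1

    start=$(date +%s)
    svn update "$PORTSDIR" >/dev/null || return 1
    end=$(date +%s)

    echo "svn update took $((end - start)) s"
}

# Example: run it once while idle and once during a buildworld, then compare.
#   cold_cache_svn_update
```

Timing with whole seconds from date is coarse, but the differences reported in this thread are in the range of minutes, so that resolution should be enough to compare runs.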