Re: [CURRENT]: weird memory/linker problem?

2014-07-17 Thread O. Hartmann
On Tue, 01 Jul 2014 17:23:14 +0200
Willem Jan Withagen w...@digiware.nl wrote:

 On 2014-07-01 16:48, Rang, Anton wrote:
  DOT = DOD
 
  444F54 = 444F44
 
  That's a single-bit flip.  Bad memory, perhaps?
 
 Very likely, especially if the system does not have ECC.
 It just happens on rare occasions that an alpha particle, power cycle, or
 anything else disruptive damages a memory cell. And it could be that
 it requires a special pattern of accesses to actually exhibit the error.
 
 In the past (the 1990s) 'make buildworld' used to be a rather good memory
 tester. But nowadays look at
   http://www.memtest.org/
 
 This tool has found all of the bad memory in all the systems I used
 and/or built for others...
 Note that it might take a few runs and some more heat to actually
 trigger the faulty cell, but memtest86 will usually find it.
 
 Note that on big systems with lots of memory it can take a long
 time to run just one full test set to completion.
 
 --WjW
 
 
 


Hello all.

Well, I'd like to share an update. It doesn't resolve the underlying problem, but it
might be a useful addition to the record.

The box in question now runs with only 4 GB and operates as expected. With 8 GB, I see
the weird bugs reported earlier, and they did indeed turn out to be bit flips. I cannot
reproduce them at will; they occur spontaneously, but I can raise the frequency

Re: [CURRENT]: weird memory/linker problem?

2014-07-01 Thread O. Hartmann
On Mon, 23 Jun 2014 17:22:25 +0200
Dimitry Andric d...@freebsd.org wrote:

 On 23 Jun 2014, at 16:31, O. Hartmann ohart...@zedat.fu-berlin.de wrote:
  On Sun, 22 Jun 2014 10:10:04 -0700
  Adrian Chadd adr...@freebsd.org wrote:
  When they segfault, where do they segfault?
 ...
  GIMP, LaTeX work, nothing special, but a bit memory consuming regarding
  GIMP) I tried updating the ports tree and surprisingly the tree is left
  in an unclean condition while /usr/bin/svn segfaults (on console: pid
  18013 (svn), uid 0: exited on signal 11 (core dumped)).
  
  Using /usr/local/bin/svn, which is from the devel/subversion port, works
  well, while the svn bundled with FreeBSD 11 dies as described. It did not
  do so hours ago!
 
 I think what Adrian meant was: can you run svn (or another crashing
 program) in gdb, and post a backtrace?  Or maybe run ktrace, and see
 where it dies?
 
 Alternatively, put a core dump and the executable (with debug info) in a
 tarball, and upload it somewhere, so somebody else can analyze it.
 
 -Dimitry
 

It's me again, with the same weird story.

After a couple of days of silence, the mysterious entity in my computer is back.
This time it is again a weird compiler error message (while trying to buildworld):

[...]
c++  -O2 -pipe -O3 -O3
-I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include
-I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/tools/clang/include
-I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support -I.
-I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/../../lib/clang/include
-DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS
-fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd11.0\"
-DLLVM_HOST_TRIPLE=\"x86_64-unknown-freebsd11.0\" -DDEFAULT_SYSROOT=\"\"
-Qunused-arguments -I/usr/obj/usr/src/tmp/legacy/usr/include -std=c++11
-fno-exceptions -fno-rtti -Wno-c++11-extensions
-c /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support/Host.cpp
-o Host.o
--- GraphWriter.o ---
In file included from
/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support/GraphWriter.cpp:14:
/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include/llvm/Support/GraphWriter.h:269:10:
error: use of undeclared identifier 'DOD'; did you mean 'DOT'?
    O << DOD::EscapeString(Label);
         ^~~
         DOT
/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include/llvm/Support/GraphWriter.h:35:11:
note: 'DOT' declared here
namespace DOT {  // Private functions...
          ^
1 error generated.
*** [GraphWriter.o] Error code 1


Well, in the past I saw many such messages, especially about symbols of routines
not being found in shared objects/libraries, or funny misspellings like the one
shown above.

I cannot reproduce them after a reboot, but once the error has occurred it persists
for as long as the system keeps running. So in order to compile the OS successfully,
I reboot.

Does anyone have an idea what this could be? Since at the moment it affects only one
machine (the other CoreDuo has been retired in the meantime), it feels a bit like a
miscompilation on a certain type of CPU.

Thanks for your patience,

Oliver




RE: [CURRENT]: weird memory/linker problem?

2014-07-01 Thread Rang, Anton
DOT = DOD

444F54 = 444F44

That's a single-bit flip.  Bad memory, perhaps?
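
The comparison above can be checked mechanically. A minimal sketch in Python, assuming plain ASCII identifiers; the helper name `differing_bits` is made up for illustration:

```python
def differing_bits(a: str, b: str) -> list:
    """Return (byte_index, bit_index) for every bit that differs between
    the ASCII encodings of two equal-length strings."""
    raw_a, raw_b = a.encode("ascii"), b.encode("ascii")
    if len(raw_a) != len(raw_b):
        raise ValueError("strings must have the same length")
    flips = []
    for i, (x, y) in enumerate(zip(raw_a, raw_b)):
        diff = x ^ y  # set bits mark the positions that differ
        for bit in range(8):
            if diff & (1 << bit):
                flips.append((i, bit))
    return flips


print("DOT".encode("ascii").hex().upper())  # 444F54
print("DOD".encode("ascii").hex().upper())  # 444F44
print(differing_bits("DOT", "DOD"))         # [(2, 4)]
```

'T' is 0x54 and 'D' is 0x44, so the two identifiers differ only in bit 4 of the third byte.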

Anton

___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [CURRENT]: weird memory/linker problem?

2014-07-01 Thread Willem Jan Withagen

On 2014-07-01 16:48, Rang, Anton wrote:

DOT = DOD

444F54 = 444F44

That's a single-bit flip.  Bad memory, perhaps?


Very likely, especially if the system does not have ECC.
It just happens on rare occasions that an alpha particle, power cycle, or
anything else disruptive damages a memory cell. And it could be that
it requires a special pattern of accesses to actually exhibit the error.

In the past (the 1990s) 'make buildworld' used to be a rather good memory
tester. But nowadays look at

http://www.memtest.org/

This tool has found all of the bad memory in all the systems I used
and/or built for others...
Note that it might take a few runs and some more heat to actually
trigger the faulty cell, but memtest86 will usually find it.
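
The basic idea behind such testers, writing known patterns and reading them back, can be sketched in userland. This is only an illustration in Python, assuming a small buffer and two hypothetical helper names (`find_mismatches`, `pattern_pass`); it cannot choose physical addresses or bypass CPU caches, so it is no substitute for memtest86+:

```python
def find_mismatches(buf: bytearray, pattern: int) -> list:
    """Return (offset, expected, actual) for every byte that no longer
    holds the pattern that was written to it."""
    return [(i, pattern, b) for i, b in enumerate(buf) if b != pattern]


def pattern_pass(size: int, patterns=(0x00, 0xFF, 0x55, 0xAA)) -> list:
    """Fill a buffer with each pattern in turn and verify it on read-back."""
    buf = bytearray(size)
    errors = []
    for p in patterns:
        buf[:] = bytes([p]) * size              # write phase
        errors.extend(find_mismatches(buf, p))  # verify phase
    return errors


if __name__ == "__main__":
    # 1 MiB is an arbitrary size chosen to keep the example quick.
    bad = pattern_pass(1024 * 1024)
    print("mismatches:", len(bad))
```

The alternating patterns 0x55 and 0xAA exercise every bit in both states, which is why a cell stuck at one value would show up in at least one pass.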


Note that on big systems with lots of memory it can take a long
time to run just one full test set to completion.


--WjW






Re: [CURRENT]: weird memory/linker problem?

2014-07-01 Thread O. Hartmann
On Tue, 01 Jul 2014 17:23:14 +0200
Willem Jan Withagen w...@digiware.nl wrote:

 On 2014-07-01 16:48, Rang, Anton wrote:
  DOT = DOD
 
  444F54 = 444F44
 
  That's a single-bit flip.  Bad memory, perhaps?
 
 Very likely, especially if the system does not have ECC.
 It just happens on rare occasions that an alpha particle, power cycle, or
 anything else disruptive damages a memory cell. And it could be that
 it requires a special pattern of accesses to actually exhibit the error.
 
 In the past (the 1990s) 'make buildworld' used to be a rather good memory
 tester. But nowadays look at
   http://www.memtest.org/
 
 This tool has found all of the bad memory in all the systems I used
 and/or built for others...
 Note that it might take a few runs and some more heat to actually
 trigger the faulty cell, but memtest86 will usually find it.
 
 Note that on big systems with lots of memory it can take a long
 time to run just one full test set to completion.
 
 --WjW

I already tested with memtest86+ (I had to download the Linux image; the port on
FreeBSD is broken on CURRENT). It hasn't found anything strange so far.

I will do another test.

I realised that on that specific box the chipset temperature is 81 degrees
Celsius. The chipset is an Eaglelake P45, which houses the memory controller on
that old platform. dmidecode gives:

Manufacturer: ASUSTeK Computer INC.
Product Name: P5Q-WS
Version: Rev 1.xx

 
 
 

Re: [CURRENT]: weird memory/linker problem?

2014-07-01 Thread Willem Jan Withagen

On 2014-07-01 17:33, O. Hartmann wrote:



I already tested with memtest86+ (I had to download the Linux image; the port on
FreeBSD is broken on CURRENT). It hasn't found anything strange so far.

I will do another test.

I realised that on that specific box the chipset temperature is 81 degrees
Celsius. The chipset is an Eaglelake P45, which houses the memory controller on
that old platform. dmidecode gives:

 Manufacturer: ASUSTeK Computer INC.
 Product Name: P5Q-WS
 Version: Rev 1.xx


Hi Oliver,

I've built several (5+) systems with these boards (from memory, they date
from around 2009?). And if I recall correctly, one of them is still functional.
The first one broke down within a couple of weeks, and the others did not
survive time either.

The auxiliary chips on that board do run hot, but I never realized they ran
this hot. Is 81C the CPU temp from sysctl, or did you measure the heat sink
on the motherboard? In the latter case it is probably just too hot.
But even if it is the temp on the chip itself, I've rarely seen temps
go up this high.

You may need to run memtest86 for more than 6-10 complete runs with
all the tests.

If the memtests do not reveal anything broken, then you get into even
more wizardry stuff, like bad power etc. Especially since it only
occurs on occasion, it is going to be a nightmare to find the root cause
of this, other than by replacing hardware piece by piece, which won't be
easy given the age of the board and parts.

You could go into the BIOS and try to configure RAM access at a slower
speed and see if the problem goes away. If it does, it could be that you are
running at the edge of the spec with regard to RAM timing.

But like I said, these are all funky details that can interact in
strange and unexpected ways.


--WjW



Re: [CURRENT]: weird memory/linker problem?

2014-07-01 Thread O. Hartmann
On Tue, 01 Jul 2014 17:57:26 +0200
Willem Jan Withagen w...@digiware.nl wrote:

 On 2014-07-01 17:33, O. Hartmann wrote:
 
  I already tested with memtest86+ (I had to download the Linux image; the port
  on FreeBSD is broken on CURRENT). It hasn't found anything strange so far.
 
  I will do another test.
 
  I realised that on that specific box the chipset temperature is 81 degrees
  Celsius. The chipset is an Eaglelake P45, which houses the memory controller
  on that old platform. dmidecode gives:
 
   Manufacturer: ASUSTeK Computer INC.
   Product Name: P5Q-WS
   Version: Rev 1.xx



Hello Willem,

 
 Hi Oliver,
 
 I've built several (5+) systems with these boards (from memory, they date
 from around 2009?). And if I recall correctly, one of them is still functional.
 The first one broke down within a couple of weeks, and the others did not
 survive time either.
 
 The auxiliary chips on that board do run hot, but I never realized they ran
 this hot. Is 81C the CPU temp from sysctl, or did you measure the heat sink
 on the motherboard? In the latter case it is probably just too hot.
 But even if it is the temp on the chip itself, I've rarely seen temps
 go up this high.

The temperature is shown in the BIOS and by one of those health daemons found in
ports (I forgot the name).
There is no sysctl MIB showing the chipset temperature on that board, as far as
I know.

 
 You may need to run memtest86 for more than 6-10 complete runs with
 all the tests.

Last time I ran memtest86+ it took ~ 1 1/2 days to finish.

 
 If the memtests do not reveal anything broken, then you get into even
 more wizardry stuff, like bad power etc. Especially since it only
 occurs on occasion, it is going to be a nightmare to find the root cause
 of this, other than by replacing hardware piece by piece, which won't be
 easy given the age of the board and parts.
 
 You could go into the BIOS and try to configure RAM access at a slower
 speed and see if the problem goes away. If it does, it could be that you are
 running at the edge of the spec with regard to RAM timing.
 
 But like I said, these are all funky details that can interact in
 strange and unexpected ways.
 
 --WjW

I will check the memory again in the next few days.

Regards,
Oliver





Re: [CURRENT]: weird memory/linker problem?

2014-06-25 Thread O. Hartmann
On Mon, 23 Jun 2014 17:22:25 +0200
Dimitry Andric d...@freebsd.org wrote:

 On 23 Jun 2014, at 16:31, O. Hartmann ohart...@zedat.fu-berlin.de wrote:
  On Sun, 22 Jun 2014 10:10:04 -0700
  Adrian Chadd adr...@freebsd.org wrote:
  When they segfault, where do they segfault?
 ...
  GIMP, LaTeX work, nothing special, but a bit memory consuming regarding
  GIMP) I tried updating the ports tree and surprisingly the tree is left
  in an unclean condition while /usr/bin/svn segfaults (on console: pid
  18013 (svn), uid 0: exited on signal 11 (core dumped)).
  
  Using /usr/local/bin/svn, which is from the devel/subversion port, works
  well, while the svn bundled with FreeBSD 11 dies as described. It did not
  do so hours ago!
 
 I think what Adrian meant was: can you run svn (or another crashing
 program) in gdb, and post a backtrace?  Or maybe run ktrace, and see
 where it dies?
 
 Alternatively, put a core dump and the executable (with debug info) in a
 tarball, and upload it somewhere, so somebody else can analyze it.
 
 -Dimitry
 

Here I am again.

So far, a report of what I did. Regarding the svn issue, I tried to
recompile (make -C usr.bin/svn clean depend obj all install) with -O0 -g
-DDEBUG set in /etc/make.conf and /etc/src.conf (disabling all the -O flags I
usually use). gdb complained about missing symbols. After the recompilation the
base-system svn didn't crash anymore, and the strange story seems to continue.

Firefox also crashed yesterday, out of the blue, with a bus error (signal 10).
Rebooting solved the problem. So far I haven't recompiled the system or any client
with DEBUG flags set. So, sorry, this issue is still open, and it is no less weird.


Next, today, I tried recompiling world. The build process fails on the box in
question with my well-known friend, the "relocation truncated to fit:
R_X86_64_PC32 against symbol" error. See below.

I'm about to reboot the box and restart building world without starting any
memory-consuming applications before the build.

Since the problems seem to occur randomly, I ask myself whether this is somehow
related to the ASLR stuff mentioned earlier on the list. I will also disable -O3
again for the next build to make sure that CLANG isn't miscompiling something.

As mentioned on the list before, I tried to find some CPU-burning and memory-eating
applications/tests, but since math/mprime is i386 only and sysutils/cpuburn only
covers ancient CPUs, I feel a bit lost in that task and am left with memtest86
(which earlier indicated no memory problems with the box).

And by the way, I face serious issues with the I/O performance on CURRENT these
days: it takes a long time until portmaster has stepped through the ports which are
about to be updated when the CLANG compiler is compiling world/kernel in the
background. This phenomenon has grown worse since earlier this year (~ February).

Source at revision 267867. FreeBSD 11.0-CURRENT #0 r267816: Tue Jun 24 14:02:22
CEST 2014 amd64.

[...]
c++ -O2 -pipe -O3 -O3 
-I/usr/src/usr.bin/clang/tblgen/../../../contrib/llvm/include
-I/usr/src/usr.bin/clang/tblgen/../../../contrib/llvm/tools/clang/include
-I/usr/src/usr.bin/clang/tblgen/../../../contrib/llvm/utils/TableGen -I.
-I/usr/src/usr.bin/clang/tblgen/../../../contrib/llvm/../../lib/clang/include
-DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS
-fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\x86_64-unknown-freebsd11.0\
-DLLVM_HOST_TRIPLE=\x86_64-unknown-freebsd11.0\ -DDEFAULT_SYSROOT=\\
-Qunused-arguments -I/usr/obj/usr/src/tmp/legacy/usr/include -std=c++11 
-fno-exceptions
-fno-rtti -Wno-c++11-extensions  -static -L/usr/obj/usr/src/tmp/legacy/usr/lib 
-o tblgen
AsmMatcherEmitter.o AsmWriterEmitter.o AsmWriterInst.o CTagsEmitter.o
CallingConvEmitter.o CodeEmitterGen.o CodeGenDAGPatterns.o CodeGenInstruction.o
CodeGenMapTable.o CodeGenRegisters.o CodeGenSchedule.o CodeGenTarget.o 
DAGISelEmitter.o
DAGISelMatcher.o DAGISelMatcherEmitter.o DAGISelMatcherGen.o DAGISelMatcherOpt.o
DFAPacketizerEmitter.o DisassemblerEmitter.o FastISelEmitter.o 
FixedLenDecoderEmitter.o
InstrInfoEmitter.o IntrinsicEmitter.o OptParserEmitter.o PseudoLoweringEmitter.o
RegisterInfoEmitter.o SetTheory.o SubtargetEmitter.o TGValueTypes.o TableGen.o
X86DisassemblerTables.o X86ModRMFilters.o
X86RecognizableInstr.o 
/usr/obj/usr/src/tmp/usr/src/usr.bin/clang/tblgen/../../../lib/clang/libllvmtablegen/libllvmtablegen.a
 
/usr/obj/usr/src/tmp/usr/src/usr.bin/clang/tblgen/../../../lib/clang/libllvmsupport/libllvmsupport.a
-lncurses -legacy
/usr/lib/libc.a(jemalloc_jemalloc.o): In function `imemalign':
jemalloc_jemalloc.c:(.text+0x2605): relocation truncated to fit: R_X86_64_PC32 against symbol `__je_arena_malloc_large' defined in .text section in /usr/lib/libc.a(jemalloc_arena.o)
c++: error: linker command failed with exit code 1 (use -v to see invocation)
*** [tblgen] Error code 1

make[3]: stopped in 
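
For readers unfamiliar with the message: an R_X86_64_PC32 relocation stores the value S + A - P (symbol address plus addend minus the address being patched) in a signed 32-bit field, and the linker reports "relocation truncated to fit" when that value does not fit. The sketch below only illustrates the arithmetic. All addresses in it are hypothetical and are not taken from the failing build, and the flipped bit is just one way such an overflow could arise; it does not show that this is what happened here.

```shell
#!/bin/sh
# fits_pc32 S A P: succeed if S + A - P fits in a signed 32-bit field,
# which is what an R_X86_64_PC32 relocation requires.
fits_pc32() {
    value=$(( $1 + $2 - $3 ))
    [ "$value" -ge -2147483648 ] && [ "$value" -le 2147483647 ]
}

# Hypothetical addresses, chosen for illustration only.
sym=$(( 0x401000 ))      # symbol address (S)
place=$(( 0x402605 ))    # address of the field being patched (P)
addend=-4                # addend (A), a common value for call instructions

fits_pc32 "$sym" "$addend" "$place" && echo "intact address: fits"

# The same symbol address with bit 33 flipped.
flipped=$(( sym ^ (1 << 33) ))
fits_pc32 "$flipped" "$addend" "$place" || echo "flipped address: truncated"
```

In a small static binary the symbol and the patched location are close together, so the displacement normally fits easily; that is why this error in a small static link is surprising.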

Re: [CURRENT]: weird memory/linker problem?

2014-06-23 Thread O. Hartmann
On Sun, 22 Jun 2014 10:10:04 -0700
Adrian Chadd adr...@freebsd.org wrote:

 When they segfault, where do they segfault?
 
 
 
 -a
 
 

I have not investigated this issue so far, since I was convinced from the start
that it was triggered by a defective memory system. So I rebooted immediately,
glad to have found a solution.

I will check the next time it happens.

oh






Re: [CURRENT]: weird memory/linker problem?

2014-06-23 Thread O. Hartmann
On Sun, 22 Jun 2014 10:10:04 -0700
Adrian Chadd adr...@freebsd.org wrote:

 When they segfault, where do they segfault?
 
 
 
 -a

Now it gets more fun.

After a buildworld and reboot, the box in question is at CURRENT:

FreeBSD 11.0-CURRENT #0 r267782: Mon Jun 23 13:12:56 CEST 2014 amd64

After the reboot, everything was all right. I then updated the ports tree (I do
this regularly). I have configured /etc/make.conf so that the ports tree update
is performed using /usr/bin/svn. Now, after about three hours of regular work
(KDevelop, some GIMP, LaTeX work, nothing special, but a bit memory-consuming
as far as GIMP is concerned), I tried updating the ports tree again and,
surprisingly, the tree was left in an unclean state when /usr/bin/svn
segfaulted (on console: pid 18013 (svn), uid 0: exited on signal 11 (core
dumped)).

Using /usr/local/bin/svn, which is from the devel/subversion port, works well,
while the svn shipped with FreeBSD 11 dies as described. It did not a few hours
ago!

Well, this drives me nuts. Is it a bug in FreeBSD (maybe in relocating
libraries, the memory map or something else), or is it in fact the agony of my
computer system? As reported in my original message, memory checks via memtest
didn't turn up any kind of faulty memory.

I'm out of ideas. Is there a way to stress test the CPU and memory system to
check the RAM, the CPU itself and, as an additional possibility, the disk I/O
controller (Intel ICH10)?

Thanks for your patience,

Oliver
 
 
 

Re: [CURRENT]: weird memory/linker problem?

2014-06-23 Thread Dimitry Andric
On 23 Jun 2014, at 16:31, O. Hartmann ohart...@zedat.fu-berlin.de wrote:
 On Sun, 22 Jun 2014 10:10:04 -0700
 Adrian Chadd adr...@freebsd.org wrote:
 When they segfault, where do they segfault?
...
 GIMP, LaTeX work, nothing special, but a bit memory consuming regrading GIMP) 
 I tried
 updating the ports tree and surprisingly the tree is left over in a unclean 
 condition
 while /usr/bin/svn segfault (on console: pid 18013 (svn), uid 0: exited on 
 signal 11
 (core dumped)).
 
 Using /usr/local/bin/svn, which is from the devel/subversion port, performs 
 well, while
 FreeBSD 11's svn contribution dies as described. It did not hours ago!

I think what Adrian meant was: can you run svn (or another crashing
program) in gdb, and post a backtrace?  Or maybe run ktrace, and see
where it dies?

Alternatively, put a core dump and the executable (with debug info) in a
tarball, and upload it somewhere, so somebody else can analyze it.
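
A minimal sketch of both suggestions, assuming the crashing program is /usr/bin/svn and that its core file is svn.core in the current directory. The helper names (run, backtrace_from_core, trace_syscalls), the scratch file /tmp/bt.gdb and the trace file name are made up for illustration. With DRY_RUN set, the commands are only printed, which is handy for checking them before running anything.

```shell
#!/bin/sh
# Sketch only: helper names, paths and file names are illustrative assumptions.

# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Print a backtrace from an existing core dump.
# $1 = executable (ideally built with debug info), $2 = core file.
backtrace_from_core() {
    cmds=/tmp/bt.gdb                 # hypothetical scratch file
    printf 'bt\n' > "$cmds"          # gdb command: show the call stack
    run gdb -batch -x "$cmds" "$1" "$2"
}

# Record the system calls of a crashing program, then show the last ones.
# $1 = trace file, remaining arguments = the command to run.
trace_syscalls() {
    trace=$1
    shift
    run ktrace -f "$trace" "$@"
    run kdump -f "$trace" | tail -n 40
}

# Example calls (uncomment on the affected machine):
# backtrace_from_core /usr/bin/svn svn.core
# trace_syscalls /tmp/svn.ktrace /usr/bin/svn update /usr/ports
```

The last lines of the kdump output usually show the system call or signal right before the process died, which is often enough to tell a plain bug from corrupted data.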

-Dimitry





Re: [CURRENT]: weird memory/linker problem?

2014-06-23 Thread Ian Lepore
On Mon, 2014-06-23 at 16:31 +0200, O. Hartmann wrote:
 
 I'm out of ideas. Is there a way to stress test the CPU and memory
 system to check
 whether RAM, the CPU itself and, as an additional possibility, the
 disk i/o controller
 (Intel ICH10)?
 
 Thanks for your patience,

A really good tool for stress-testing a system is ports/math/mprime.  It
will find memory and cpu errors that memtest86 and other tools
completely overlook.  Run one copy per cpu, something like this:

# Start one mprime torture test per CPU in the background, two seconds apart.
for i in $(jot $(sysctl -n hw.ncpu) 0) ; do
    sleep $((i * 2)) && mprime -t -a$i >/tmp/mprime$i.log &
done

Many overclockers use this to ensure the system is stable with the OC
settings.  If your system can run a copy of mprime per cpu continuously
for 24 hours the hardware is fine.

-- Ian


___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: [CURRENT]: weird memory/linker problem?

2014-06-23 Thread O. Hartmann
On Mon, 23 Jun 2014 09:27:46 -0600
Ian Lepore i...@freebsd.org wrote:

 On Mon, 2014-06-23 at 16:31 +0200, O. Hartmann wrote:
  
  I'm out of ideas. Is there a way to stress test the CPU and memory
  system to check
  whether RAM, the CPU itself and, as an additional possibility, the
  disk i/o controller
  (Intel ICH10)?
  
  Thanks for your patience,
 
 A really good tool for stress-testing a system is ports/math/mprime.  It
 will find memory and cpu errors that memtest86 and other tools
 completely overlook.  Run one copy per cpu, something like this:
 
 # Start one mprime torture test per CPU in the background, two seconds apart.
 for i in $(jot $(sysctl -n hw.ncpu) 0) ; do
     sleep $((i * 2)) && mprime -t -a$i >/tmp/mprime$i.log &
 done
 
 Many overclockers use this to ensure the system is stable with the OC
 settings.  If your system can run a copy of mprime per cpu continuously
 for 24 hours the hardware is fine.
 
 -- Ian

A great idea, but regrettably I get this error when trying to install that neat
port:

mprime-0.0.24.14 is only for i386, while you are running amd64.
*** Error code 1

Is there a 64-bit counterpart?

Oliver





[CURRENT]: weird memory/linker problem?

2014-06-22 Thread O. Hartmann

Hello.

I face a strange problem on a set of boxes running CURRENT. The systems in
question all run the same version of CURRENT (more or less, give or take a
week).

The boxes affected have 8 GB of RAM and are old-style Core2Duo systems.

The phenomenon:

After booting, the operating system appears to work. But sometimes it is
impossible to start certain applications, like Firefox: they segfault. More
disturbing is the failure of the linker when building world. Sometimes I get
strange messages like

relocation truncated to fit: R_X86_64_PC32 against symbol `__error' defined in .text

when compiling/linking. The funny thing is that rebooting the box and doing
exactly the same thing very often leaves the system usable again: starting
applications works, compiling works!

At first I thought this could be an indication of a dying system, so I tested
the memory for two days non-stop without any indication of anything wrong. The
boxes do not have ECC RAM (they are Intel systems).

I see this problem relatively often on two C2D-based boxes (one dual-core
E8400, the other a quad-core Q6600; both systems have 8 GB RAM). This
phenomenon also occurred two or three months ago on another machine with 32 GB
RAM and a Core i7-3930K, but it went away (it was the very same error as shown
above).

Another system, an i3-3220 with 16 GB RAM, never showed the problem, although
that system builds world just as frequently as the C2D systems do.

Well, I feel a bit confused. At first sight, the problem looks weird and points
to some kind of memory problem, but testing the memory didn't show anything
wrong.

Today windowmaker failed to start due to a malformed command in one of
windowmaker's libraries. I rebooted the box and everything was all right. Then,
also today, I tried compiling world and got a weird error message about a
misspelled Int__xxx (I cannot remember the exact text). I rebooted and
everything was all right again.

These errors are frequent on the 8 GB, C2D-based systems and at the moment no
longer present on the more modern systems with more memory described above.
This could be a coincidence, but it is strange anyway.

I do not rule out dying hardware, but I'd like to ask whether something strange
is going on with FreeBSD's memory management at the moment and whether these
problems could also be triggered by some nasty bug. I never see a crash (which
would also indicate faulty hardware); I mostly notice this strange behaviour
either after a fresh boot or after running some memory- or disk-I/O-intensive
jobs, like updating the ports tree.

By the way, FreeBSD CURRENT suffers from a tremendous performance drop these
days when compiling world while updating the ports tree and running portmaster.
On one box, on which ports reside on a UFS partition, it takes more than 8
minutes to get through portmaster -da, which is quick when not compiling world.
On another system, on which /usr/ports resides on ZFS (the box has 16 GB RAM!),
it sometimes takes 30(!) minutes to perform an svn update while compiling world
(that is the i3-3220 system with 16 GB RAM); it takes 6 - 15 minutes when the
box is otherwise idle and the ports tree is updated for the first time (every
subsequent update is much faster).

Well, I know these reports of mine are a bit weird since I have no exact log of
the problems, but I think that if the issue is not with the hardware, I should
report it.

Regards,

oh




Re: [CURRENT]: weird memory/linker problem?

2014-06-22 Thread Adrian Chadd
When they segfault, where do they segfault?



-a




Re: [CURRENT]: weird memory/linker problem?

2014-06-22 Thread Allan Jude
On 2014-06-22 10:56, O. Hartmann wrote:
 
 By the way, FreeBSD CURRENT suffers from a tremendous performance drop these
 days when compiling world while updating the ports tree and running
 portmaster. On one box, on which ports reside on a UFS partition, it takes
 more than 8 minutes to get through portmaster -da, which is quick when not
 compiling world. On another system, on which /usr/ports resides on ZFS (the
 box has 16 GB RAM!), it sometimes takes 30(!) minutes to perform an svn update
 while compiling world (that is the i3-3220 system with 16 GB RAM); it takes
 6 - 15 minutes when the box is otherwise idle and the ports tree is updated
 for the first time (every subsequent update is much faster).
 

To get a better benchmark for 'svn update' on the ports tree: if you
'zfs unmount pool/usr/ports', it will flush all ARC entries for that dataset.
Then 'zfs mount pool/usr/ports' and run the test again. This should give you
more reproducible results.
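
The remount-then-measure procedure could be wrapped up roughly as follows. This is only a sketch: the helper names (run, cold_svn_update) are made up, the dataset and directory are passed in by the caller, and the elapsed time has one-second resolution. With DRY_RUN set, the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative assumptions.

# With DRY_RUN set, print each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Remount a ZFS dataset so its cached data is dropped, then time an
# 'svn update' of the working copy that lives on it.
# $1 = dataset (for example pool/usr/ports), $2 = working copy directory.
cold_svn_update() {
    run zfs unmount "$1" || return 1   # fails if the dataset is busy
    run zfs mount "$1" || return 1
    start=$(date +%s)
    run svn update "$2"
    end=$(date +%s)
    echo "svn update took $((end - start)) seconds"
}

# Example call (as root, with no shell sitting inside /usr/ports):
# cold_svn_update pool/usr/ports /usr/ports
```

Running it several times, each with a fresh remount, gives cold-cache numbers that can be compared between an idle box and one that is building world.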

-- 
Allan Jude


