On Mon, 26 Jun 2006, Marc G. Fournier wrote:
I think this is a useful activity, especially if you've already run
extensive memory testing on the box. If you haven't yet done that, I
encourage you to take a break from buildworld's and make sure the memory
tests pass. I spent several months
On Tuesday 27 June 2006 22:15, Matthew D. Fuller wrote:
On Tue, Jun 27, 2006 at 01:16:11PM + I heard the voice of
Eprha Carvajal, and lo! it spake thus:
I see no ACPI capability in the processor features
ACPI is not a CPU feature.
There is an 'ACPI' feature bit, but I think it has
At 3:34 PM +0100 6/28/06, Robert Watson wrote:
On Mon, 26 Jun 2006, Marc G. Fournier wrote:
I wish I'd run the memory test earlier, but the lesson
is clear!
Is there something that I can run *from* FreeBSD, remotely,
to do this?
Not that I know of. In the past, the discussion has been
On Wed, Jun 28, 2006 at 10:29:04AM -0400 I heard the voice of
John Baldwin, and lo! it spake thus:
There is an 'ACPI' feature bit, but I think it has to do with
preserving the TSC rate while the CPU is throttled. It's not
required for core ACPI operation.
Ah, well, I stand corrected. I
On Tue, Jun 27, 2006 at 12:33:39AM +0200, M.Hirsch wrote..
Wilko Bulte schrieb:
You really have never seen a machine used for serious business apparently.
Depends on what you define serious business...
Yes, I am rather new to FreeBSD (2y+)
I am just trying to setup a /stable/ cluster
On Tue, 27 Jun 2006, M.Hirsch wrote:
Yes, the result may be correct.
If you're talking about a single-bit error, you aren't quite correct. It isn't
"may be correct", it's _definitely_ correct (in the mathematical sense; that is,
the correcting code proves that we have one and only one error in bit number
On Mon, 26 Jun 2006, Paul Allen wrote:
The very originating purpose of ECC was to keep the computer going in the
face of an alpha particle strike.
Alpha particles flip *single* bits.
ECC was never intended to detect crummy, failing hardware: that's a use
people have shoe-horned it into, but
On Jun 26, 2006, at 11:54 PM, M.Hirsch wrote:
Ok, sorry. Misunderstanding here.
My point was, along what has been posted here in this thread:
An ECC error should raise a kernel panic immediately, not only a
message in the log files.
Preferably not until the running transactions are
On Tue, 2006-Jun-27 00:01:08 +0300, Dmitry Pryanishnikov wrote:
On Mon, 26 Jun 2006, Robert Watson wrote:
I think this is a useful activity, especially if you've already run
extensive memory testing on the box. If you haven't yet done that, I
encourage you to take a break from buildworld's and
Argelo [EMAIL PROTECTED]
To: Marc G. Fournier [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], freebsd-stable@freeBSD.org
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
Date: Mon, 26 Jun 2006 18:17:05 +0200
[snip]
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
: can't alloc wake memory
Perhaps your BIOS is broken.
From: Jorn Argelo [EMAIL PROTECTED]
To: Marc G. Fournier [EMAIL PROTECTED]
CC: [EMAIL PROTECTED], freebsd-stable@freeBSD.org
Subject: Re: FreeBSD 6.x CVSUP today crashes with zero load ...
Date: Mon, 26 Jun 2006 18:17:05 +0200
[snip]
Yahoo
On Tue, Jun 27, 2006 at 01:16:11PM + I heard the voice of
Eprha Carvajal, and lo! it spake thus:
I see no ACPI capability in the processor features
ACPI is not a CPU feature.
--
Matthew Fuller (MF4839) | [EMAIL PROTECTED]
Systems/Network Administrator |
On Sun, 25 Jun 2006, Pete French wrote:
'k, I'm starting to get the impression that FreeBSD 6.x is evil ... at
least as far as Dual-PIII servers are concerned ... on a machine that,
I can't comment on your other problems - but I have a dual PIII server and
saw a 30% performance increase
On Mon, 26 Jun 2006, Robert Watson wrote:
On Sun, 25 Jun 2006, Pete French wrote:
'k, I'm starting to get the impression that FreeBSD 6.x is evil ... at
least as far as Dual-PIII servers are concerned ... on a machine that,
I can't comment on your other problems - but I have a dual PIII
On Mon, 26 Jun 2006, Marc G. Fournier wrote:
I'm also running 6.x on several dual-PIII without problems. An issue local
to Marc's setup is definitely indicated. Given the failure mode, I would
be worried about a potential hardware issue, although subtle hardware and
subtle system software
On Mon, 26 Jun 2006, Robert Watson wrote:
On Mon, 26 Jun 2006, Marc G. Fournier wrote:
I'm also running 6.x on several dual-PIII without problems. An issue
local to Marc's setup is definitely indicated. Given the failure mode, I
would be worried about a potential hardware issue, although
Robert Watson wrote:
On Sun, 25 Jun 2006, Pete French wrote:
'k, I'm starting to get the impression that FreeBSD 6.x is evil ...
at least as far as Dual-PIII servers are concerned ... on a machine
that,
I can't comment on your other problems - but I have a dual PIII server
and saw a 30%
[snip]
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the
On 6/26/06, Jorn Argelo [EMAIL PROTECTED] wrote:
[snip]
Yahoo . yscrappy Skype: hub.org ICQ . 7615664
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989,
Hello!
On Mon, 26 Jun 2006, Robert Watson wrote:
I think this is a useful activity, especially if you've already run extensive
memory testing on the box. If you haven't yet done that, I encourage you to
take a break from buildworld's and make sure the memory tests pass. I spent
several
ECC is a way to mask broken hardware. I rather have my hardware fail
directly when it does first, so I can replace it _immediately_
What's your hardware good for if it passes a test, but fails in
production?
ECC is totally overrated.
(sorry, couldn't resist...)
M.
On Mon, Jun 26, 2006 at 11:21:22PM +0200, M.Hirsch wrote..
ECC is a way to mask broken hardware. I rather have my hardware fail
directly when it does first, so I can replace it _immediately_
What's your hardware good for if it passes a test, but fails in
production?
ECC is totally
Nope,
I'd like my bank data to be stored on a system that does ECC, no question.
But please, on hard disk level (RAID; that is _permanent_), not in the
RAM of a single node.
If memory gets corrupted, please, raise a kernel panic... Even if
there's ECC in place.
Counter question:
Would you
On Mon, Jun 26, 2006 at 11:37:18PM +0200, M.Hirsch wrote..
Nope,
I'd like my bank data to be stored on a system that does ECC, no question.
But please, on hard disk level (RAID; that is _permanent_), not in the
RAM of a single node.
If memory gets corrupted, please, raise a kernel
snip
.. So the logs are there, all that's required is a utility to read them
and, optionally, alert the administrator to the event,
No, I think a panic _should_ occur, even if there was a correctable
error. Not when there's no other option left.
Maybe make it optional via a kernel option.
On Tue, Jun 27, 2006 at 12:11:03AM +0200, M.Hirsch wrote..
snip
.. So the logs are there, all that's required is a utility to read them
and, optionally, alert the administrator to the event,
No, I think a panic _should_ occur, even if there was a correctable
error. Not when there's
Hello!
On Mon, 26 Jun 2006, M.Hirsch wrote:
ECC is a way to mask broken hardware. I rather have my hardware fail directly
when it does first, so I can replace it _immediately_
You got it backwards. If your data has any value to you, then you don't want
to miss any single-error bit in it, do
Of course not. You only panic once you have no other options left.
Proper hardware with ECC gives you these options. I am not talking
consumer grade crap here of course.
I agree that no panic should occur if the error was correctable and it
should when it isn't.
However, *real* equipment
Ok, sorry. Misunderstanding here.
My point was, along what has been posted here in this thread:
An ECC error should raise a kernel panic immediately, not only a
message in the log files.
Any hardware showing ECC errors should be replaced asap..
Make them lazy admins do what they're getting paid
On Mon, Jun 26, 2006 at 11:54:53PM +0200, M.Hirsch wrote..
Ok, sorry. Misunderstanding here.
My point was, along what has been posted here in this thread:
An ECC error should raise a kernel panic immediately, not only a
message in the log files.
Any hardware showing ECC errors should be
Wilko Bulte schrieb:
You really have never seen a machine used for serious business apparently.
Depends on what you define serious business...
Yes, I am rather new to FreeBSD (2y+)
I am just trying to setup a /stable/ cluster of six machines right now.
For over a week straight.
4.11 works
Dmitry Pryanishnikov schrieb:
Hello!
On Mon, 26 Jun 2006, M.Hirsch wrote:
ECC is a way to mask broken hardware. I rather have my hardware fail
directly when it does first, so I can replace it _immediately_
You got it backwards. If your data has any value to you, then you
don't want
to
Ok...
Does the standard fs, UFS2, do extra sanity checks, then?
Sorry, replying to myself...
No, this does not matter.
If the OS thinks the data is ok, UFS will write OK data...
So, let me rephrase this:
How can I make sure there is no broken hardware in my cluster?
I am not looking for
On Tue, 27 Jun 2006, M.Hirsch wrote:
On Mon, 26 Jun 2006, M.Hirsch wrote:
ECC is a way to mask broken hardware. I rather have my hardware fail
directly when it does first, so I can replace it _immediately_
You got it backwards. If your data has any value to you, then you don't
Nope, I am
M.Hirsch wrote:
Ok...
Does the standard fs, UFS2, do extra sanity checks, then?
My advice would be don't feed the troll.
Steve
M.Hirsch wrote:
Any hardware showing ECC errors should be replaced asap..
No. ALL memory will sooner or later show a single-bit error.
Several years ago I was checking this during my work at Ericsson.
There was a discussion if ECC should be present in the GSM-base-stations
or not. I had a
Dmitry Pryanishnikov schrieb:
When you wrote "ECC is a way to mask broken hardware", you were plain
wrong.
If you're using hardware w/o ECC, it just can't tell whether an error is
present or absent. So ECC _is_ the way to detect (not mask) broken hardware.
Ok, thanks. I think I understand the
Wow, Steven,
you've been really helpful here...
M.
Steven Hartland schrieb:
My advice would be don't feed the troll.
Steve
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe,
So, unlike my supplier claims, ECC is not supposed to help against
hardware failures.
But it is the way to detect them, right?
Yes!!! Absolutely.
-pete.
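
For readers following the correctable-versus-uncorrectable distinction argued
over in this thread, here is a minimal Python sketch. It is not from any poster
and says nothing about what FreeBSD or any memory controller actually does; it
is a toy SECDED code (Hamming(7,4) plus one overall parity bit) on 4-bit
values, and the names encode() and decode() are made up for illustration. It
shows why a single flipped bit can be located and fixed exactly, while two
flipped bits can only be detected.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit SECDED word (list of 0/1).

    Index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions (parity at 1, 2, 4; data at 3, 5, 6, 7).
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # make the total number of 1 bits even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # XOR of the positions of all set bits: 0 for a valid word,
    # otherwise the position of a single flipped bit.
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: exactly one bit flipped (assuming at most two flips).
        # syndrome == 0 here means the overall parity bit itself flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but non-zero syndrome: two bits flipped, cannot repair.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble
```

In the terms used in the thread, the "corrected" case is the one some posters
want logged and others want to panic on, and "uncorrectable" is the case
everyone agrees is fatal.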
I am not looking for workarounds, like ECC. I want the box to break
immediately once any single component goes wrong...
Uh, that *is* what ECC does (or can do). Without ECC your broken hardware
continues to run unnoticed. With ECC you can either make it break
immediately, or log an error or
On Tue, 27 Jun 2006, M.Hirsch wrote:
If you're using hardware w/o ECC, it just can't tell whether an error is
present or absent. So ECC _is_ the way to detect (not mask) broken hardware.
Ok, thanks. I think I understand the meaning of ECC now.
So, unlike my supplier claims, ECC is not supposed to
So what do I need to do to make the box panic() on an ECC error?
Is there a kernel parameter, sysctl, or what else?
Thanks,
M.
Pete French schrieb:
I am not looking for workarounds, like ECC. I want the box to break
immediately once any single component goes wrong...
Uh, that *is* what
Yes, the result may be correct.
'Do not take ECC for equals additional security'
So I understand what's ECC good for, other than the usual marketing talk.
But, in FreeBSD, the function is a result of hardware-level correction.
Something that only kicks in in _real_ _serious_ situations.
I just
Wilko Bulte wrote:
Proper hardware will log the ECC errors, a proper OS tailored to that
hardware will log and notify the sysadmins.
So the question is.. is FreeBSD one of those operating systems? What
features/software is present if any, to report ECC problems?
From M.Hirsch [EMAIL PROTECTED], Tue, Jun 27, 2006 at 01:38:35AM +0200:
Sticks don't just break on a single bit. From my experience, a stick
that's got any problems at all, will cause even more trouble soon...
If a hardware problem isn't worth panicking, what else is?
(don't answer this one
On Tue, Jun 27, 2006 at 01:38:35AM +0200, M.Hirsch wrote:
I just would like you (not specifically you, Dmitry) to acknowledge that
broken RAM is worth a panic in standard situations- if I may call it
like that.
Well, ideally, if broken ram could be isolated with something
like IBM's chipkill
'k, I'm starting to get the impression that FreeBSD 6.x is evil ... at
least as far as Dual-PIII servers are concerned ... on a machine that,
I can't comment on your other problems - but I have a dual PIII server
and saw a 30% performance increase when moving to 6.x over 5.x ... and
it's been
'k, looks like I'm going to have to back this out ... just upgraded
another server to 6.x, CVSup latest -STABLE, built, installed, rebooted
... up fine ...
Running a single 'rsync' to copy files from another server over, it has
crashed twice in a row so far ...
I'm enabling dumpdev right
On Sat, 24 Jun 2006, Marc G. Fournier wrote:
'k, looks like I'm going to have to back this out ... just upgraded another
server to 6.x, CVSup latest -STABLE, built, installed, rebooted ... up fine
...
Running a single 'rsync' to copy files from another server over, it has
crashed twice in
On Sat, 24 Jun 2006, Marc G. Fournier wrote:
On Sat, 24 Jun 2006, Marc G. Fournier wrote:
'k, looks like I'm going to have to back this out ... just upgraded another
server to 6.x, CVSup latest -STABLE, built, installed, rebooted ... up fine
...
Running a single 'rsync' to copy files
Marc G. Fournier wrote:
On Sat, 24 Jun 2006, Marc G. Fournier wrote:
'k, looks like I'm going to have to back this out ... just upgraded
another server to 6.x, CVSup latest -STABLE, built, installed,
rebooted ... up fine ...
Running a single 'rsync' to copy files from another server over,
On Sat, 24 Jun 2006, Nate Lawson wrote:
Marc G. Fournier wrote:
On Sat, 24 Jun 2006, Marc G. Fournier wrote:
'k, looks like I'm going to have to back this out ... just upgraded
another server to 6.x, CVSup latest -STABLE, built, installed, rebooted
... up fine ...
Running a single