date:20050721

Re: IrDA question

2005-07-21 Thread Daniel O'Connor

On Thursday 21 July 2005 16:42, caleb wrote:
 The port is definitely sio1/cuaa1. I tried to run ircomm while irs was
 still running and got;

Why?
No offence but it's always worth trying something different when stuff isn't 
working :)

 cannot open pty

 I killed irs and used;

 ircomm -d /dev/cuaa1 -y /dev/ptypv -v 2 and I get the following output;

Yes, only one of them will be able to run at any one time.

 localhost# ircomm -d /dev/cuaa1 -y /dev/ptypv -v 2
 query completed
 query completed
 query completed
 query completed
 query completed
 query completed
 query completed
 query completed
 query completed
 query completed
 No peer station found

 The mobile phone had Ir switched on and I have tried running the command
 from various distances.

Hmm, any way you can test it besides in FreeBSD?

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
The nice thing about standards is that there
are so many of them to choose from.
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C



Re: Serious issue with serial console in 5.4

2005-07-21 Thread Eirik Øverby



On Jul 21, 2005, at 7:00 AM, Kris Kennaway wrote:


On Mon, Jul 18, 2005 at 11:58:54AM +0200, Eirik Øverby wrote:


Hi,

I reported this before, but I am very surprised that it is still the
case:

(This is from the last time it happened; this time the box rebooted
and cleared the serial console before I had time to cut/paste it.)


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 00
fault virtual address   = 0x1c
fault code  = supervisor write, page not present
instruction pointer = 0x8:0xc0620b5f
stack pointer   = 0x10:0xdadbd988
frame pointer   = 0x10:0xdadbd994
code segment= base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 51999 (getty)
trap number = 12
panic: page fault
cpuid = 1
boot() called on cpu#0
Uptime: 66d11h24m50s



The above panic will show up occasionally when logging out from a
serial console (i.e. ctrl-D, logout, exit, whatever). This is
EXTREMELY BAD, as it will crash an otherwise perfectly healthy box at
random - and renders the serial console useless.

Robert Watson confirmed this to be an issue on the 10th of April.

Anyone??



You might have to wait until 6.0-R since fixing it seems to require
infrastructure changes that cannot easily be backported to 5.x.


With all due respect - if this is (and I'm assuming it is, because it  
happens on all the servers I'm serial-controlling) an omnipresent  
problem on 5.x, I daresay it should warrant some more attention.  
Having unsafe serial terminal support that can bring down your system  
like that defies much of the point of having serial terminal support  
in the first place.


However, since I seem to be the only one who has noticed this,  
perhaps I'm the last person on earth to routinely use serial terminal  
switches instead of KVM switches to do my admin work?


/Eirik



Kris



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]

Re: Serious issue with serial console in 5.4

2005-07-21 Thread Hans Lambermont

Eirik Øverby wrote:

...
 However, since I seem to be the only one who has noticed this,
 perhaps I'm the last person on earth to routinely use serial terminal
 switches instead of KVM switches to do my admin work?

No, I recently installed 3 5.4-R production machines that do not have
video cards, so I'm using the serial console a lot. I didn't see any of
the horror you found. (yet, fingers crossed ;-)

-- Hans Lambermont

Re: Serious issue with serial console in 5.4

2005-07-21 Thread Marc Olzheim

On Thu, Jul 21, 2005 at 10:56:54AM +0200, Eirik Øverby wrote:
 Fatal trap 12: page fault while in kernel mode
 cpuid = 1; apic id = 00
 fault virtual address   = 0x1c
 fault code  = supervisor write, page not present
 instruction pointer = 0x8:0xc0620b5f
 stack pointer   = 0x10:0xdadbd988
 frame pointer   = 0x10:0xdadbd994
 code segment= base 0x0, limit 0xf, type 0x1b
   = DPL 0, pres 1, def32 1, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 51999 (getty)
 trap number = 12
 panic: page fault
 cpuid = 1
 boot() called on cpu#0
 Uptime: 66d11h24m50s
 
 
 The above panic will show up occasionally when logging out from a
 serial console (i.e. ctrl-D, logout, exit, whatever). This is
 EXTREMELY BAD, as it will crash an otherwise perfectly healthy box at
 random - and renders the serial console useless.
 
 Robert Watson confirmed this to be an issue on the 10th of April.
 
 Anyone??
 
 
 You might have to wait until 6.0-R since fixing it seems to require
 infrastructure changes that cannot easily be backported to 5.x.
 
 With all due respect - if this is (and I'm assuming it is, because it  
 happens on all the servers I'm serial-controlling) an omnipresent  
 problem on 5.x, I daresay it should warrant some more attention.  
 Having unsafe serial terminal support that can bring down your system  
 like that defies much of the point of having serial terminal support  
 in the first place.
 
 However, since I seem to be the only one who has noticed this,  
 perhaps I'm the last person on earth to routinely use serial terminal  
 switches instead of KVM switches to do my admin work?

Nope, I use them a lot as well, but only if there are problems. Why
would you log in on a serial console if there's ssh ;-)

So that would explain why I haven't seen the issue yet.
Do you have a debugger trace ? It seems very similar to my last
remaining issue
(http://www.stack.nl/~marcolz/FreeBSD/showstoppers.html), namely

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/83375

i.e. something going wrong in pty cloning and cleanup...

Marc



background fsck, softupdates inconsistent state on disk

2005-07-21 Thread Marc Olzheim

Hi.

Having enough opportunities to do crash recovery with kern/83375 open
and some of my services not yet moved back to FreeBSD 4, I noticed that
often it crashes just after (or perhaps during) mirroring of a directory
tree. The mirroring involves creating a directory with 80
subdirectories in it.

Now when the machine panics on a 'screen' again, background fsck fails
to properly check the filesystem and reports so in /var/log/messages.
What I see on that partition is the main directory that should have
contained the 80 subdirs, but now it has a link count of 0 and so
doesn't even contain a . or .. , let alone the 80 directories that
should have been there.

The only thing a manual fsck can do after that is unlink the
unreferenced inodes and clear up the mess...

Shouldn't this be impossible without power loss ? Or is it inherent to
SMP that the machine can crash on a process on CPU #0 while CPU #1 is
updating disk structures ?


Anyway, as soon as the migration of production services suffering from
kern/83375 back to 4.x is done I should have a 5.x test machine ready to
crash whenever people want, so I can get debug output out of it.

If anyone could tell me how to get it and what they need, I'd be happy
to provide it.

Marc



Re: Quality of FreeBSD

2005-07-21 Thread Marc Olzheim

On Wed, Jul 20, 2005 at 08:43:33PM -0700, Alexey Yakimovich wrote:
 My advice to FreeBSD release engineering team: 
 - do more testing;
 - have it tested with hardware what was published in Hardware Notes;
 - do not release it for production if it is not in production quality;
 - reread again what was written by yourself regarding 4.4 release
 quality.
 I wish to say more.
 
 This mail was written because I like FreeBSD and I want to continue
 using it. And wouldn't mind to wait longer for real production quality
 releases instead of start using something else. And please, I know, it's
 open source project.
 
 Best regards,
 Real FreeBSD fan

Thank you for expressing my exact same sentiments. I'm still a huge
FreeBSD fan and switching to anything else (well, perhaps DragonFly)
seems out of the question, but my faith is being tested a lot lately.
Having switched some of my company's production machines to 5.4, since
it was (in my eyes falsely) called a 'production release', FreeBSD's
reputation within the less technical parts of the company has taken a
large dent. Luckily they know as well that there's still no comparison
to FreeBSD 4.x; top of my ruptime looks like:

up 1124+12:15, 1 user,   load 2.14, 2.10, 2.02
up 1095+06:22,11 users,  load 2.01, 2.04, 2.02
up 1095+05:31, 5 users,  load 2.38, 2.31, 2.24
up 1095+05:06, 2 users,  load 1.07, 1.08, 1.01
up 1095+04:46, 0 users,  load 1.09, 1.08, 1.01
up 1087+21:04, 1 user,   load 1.01, 1.00, 1.00

but then again, I'd really like to use the new 5.x features in a stable
environment...

Marc
also a Real FreeBSD fan :-)



Re: Serious issue with serial console in 5.4

2005-07-21 Thread Robert Watson



On Thu, 21 Jul 2005, Eirik Øverby wrote:


The above panic will show up occasionally when logging out from a
serial console (i.e. ctrl-D, logout, exit, whatever). This is
EXTREMELY BAD, as it will crash an otherwise perfectly healthy box at
random - and renders the serial console useless.

Robert Watson confirmed this to be an issue on the 10th of April.


You might have to wait until 6.0-R since fixing it seems to require 
infrastructure changes that cannot easily be backported to 5.x.


With all due respect - if this is (and I'm assuming it is, because it 
happens on all the servers I'm serial-controlling) an omnipresent 
problem on 5.x, I daresay it should warrant some more attention. Having 
unsafe serial terminal support that can bring down your system like that 
defies much of the point of having serial terminal support in the first 
place.


However, since I seem to be the only one who has noticed this, perhaps 
I'm the last person on earth to routinely use serial terminal switches 
instead of KVM switches to do my admin work?


The concern about the 5.x backport is that it will break parts of the 
device driver ABI, and is a significant change that involves a lot of 
risk.


Regarding the general prevalence of the problem -- I've seen a small 
number of people reporting it's a big problem.  Since I know of a great 
many people running with serial consoles (other than a workstation, I 
never run FreeBSD boxes any other way), this leads me to believe it's 
something that shows up in fairly specific conditions -- perhaps relating 
to precise timing of a race condition.  This means that if we introduce a 
generally destabilizing change, it may impact more people than the problem 
as it exists (a nasty trade-off).


I've only seen the issue when logging out of a serial console session, and 
had previously hypothesized that it had to do with the simultaneous timing 
of a console message from syslog and the opening/closing of the console's 
tty due to logging out and getty restarting, resulting in a reference 
count improperly hitting zero.


I thought Doug White had come up with a work-around patch that prevented 
the reference count from being allowed to hit 0 for the console by 
artificially elevating it, which would prevent the panic, so either (a) 
the work around wasn't committed, or (b) it didn't work.
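
The hypothesis and the work-around described above can be pictured with a toy model. This is only a sketch under stated assumptions: it is not the FreeBSD tty code, and the names ConsoleTty, pinned and logout_race are hypothetical. It shows a device whose state is released on last close, a console write that arrives in the window before getty reopens the device, and how one artificially held reference keeps the count from reaching zero.

```python
class ConsoleTty:
    """Toy model: device state is torn down when the reference count hits zero."""

    def __init__(self, pinned=False):
        self.refs = 0
        self.torn_down = False
        if pinned:
            # Hypothetical work-around: one permanent reference held for the console.
            self.refs += 1

    def open(self):
        self.refs += 1
        self.torn_down = False

    def close(self):
        self.refs -= 1
        if self.refs == 0:
            # Last close: the per-device state is released.
            self.torn_down = True

    def write(self, msg):
        if self.torn_down:
            # Stands in for the kernel touching freed state (the page fault).
            raise RuntimeError("write to torn-down tty")
        return len(msg)


def logout_race(tty):
    """Hypothesised interleaving: logout closes the tty, then a console
    message is written before getty has reopened it."""
    tty.open()    # login session holds the tty
    tty.close()   # logout drops that reference
    try:
        tty.write("syslog: console message\n")
        return "ok"
    except RuntimeError:
        return "panic"


unpinned_result = logout_race(ConsoleTty())
pinned_result = logout_race(ConsoleTty(pinned=True))
```

In this model the unpinned device fails on the late write while the pinned one survives, which is the behaviour the work-around was meant to achieve. Whether the real panic follows this exact interleaving is still only a hypothesis.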


I can attempt to take another look at this problem in a week or so, but 
have a number of things I need to finish up for FreeBSD 6.0 before then 
that will be occupying my time.


Robert N M Watson

Re: READ_DMA, WRITE_DMA errors

2005-07-21 Thread Robert Watson



On Wed, 20 Jul 2005, Steve wrote:

I've found tons of emails, news messages, listserv messages, and even 
some bug reports of this seemingly common error.


So, I had been running 5.2 on a server, and, updated to 5.3. Got the 
READ_DMA and WRITE_DMA error and retries. So, figuring it might be a bad 
update, took a new drive. put it in, loaded 5.4 for grins, and, same 
issue, lots of these errors, eventually destroying the FS. Played around 
with various settings, no avail. So, took it back, got different box, 
everything new. Same problem, new install of 5.4


6.0 contains a significant re-write and update of the ATA driver, and 
corrects a number of known problems with timeouts and reliability.  This 
rewrite is available as patches against 5.x, but has not been committed 
because ATA is a very sensitive thing (lots of very diverse and very 
broken hardware), and has had insufficient testing.  If you have test 
hardware available that's not in production, it would be quite helpful if 
you could install 6.0-BETA2, once that comes out in the next week or so, 
and see if the specific ATA problems you're experiencing occur there. 
It's not impossible that the new ATA code will be merged to 5.x, but I 
think we cannot do that until it has seen a lot more exposure.  If you 
search back through the mailing archives, you should be able to find posts 
from Soren regarding the new ATA patches, if you want to give them a try 
on 5.x.


Robert N M Watson

Re: Quality of FreeBSD

2005-07-21 Thread Robert Watson



On Wed, 20 Jul 2005, Alexey Yakimovich wrote:


My advice to FreeBSD release engineering team:
- do more testing;
- have it tested with hardware what was published in Hardware Notes;
- do not release it for production if it is not in production quality;
- reread again what was written by yourself regarding 4.4 release
quality.
I wish to say more.

This mail was written because I like FreeBSD and I want to continue 
using it. And wouldn't mind to wait longer for real production quality 
releases instead of start using something else. And please, I know, it's 
open source project.


While I agree more testing always helps, and that there are some fairly 
concrete ways we can work to improve testing, there are also some practical 
realities to how software testing happens, especially for complex software 
products running on diverse hardware.  I have a question for you though:


  Have you tried, and do you plan to try, our 6.0 test releases before
  6.0-RELEASE goes out the door?  Specifically, on the hardware you know
  you're having problems with 5.4 on?

The way hardware gets tested is that people who have the hardware run the 
software on it under a variety of loads, and see if it works.  Since a 
volunteer project of a couple of hundred developers can't buy all known 
past and future hardware, we have to rely on hardware vendors, software 
resellers, and FreeBSD users to do some of the testing.  In order for that 
testing to affect a release, it must happen before the release goes out 
the door, rather than afterwards.  And it has to happen sufficiently in 
advance of the release that someone can do something about the results of 
failed testing.  If hardware isn't tested before the release, then 
inevitably people with that untested hardware are more likely to 
experience problems.  This means that the best way to help us support your 
hardware is to run our test releases with useful workloads, and then 
provide feedback if/when they don't work.  I realize you're providing 
feedback now on the 5.x branch, but what you may or may not know is that 
in the 6.x branch, we have a significant update to the ATA code that may 
get merged to 5.x, if it proves to be as much better as we hope.  This 
means that we need you to test the future code, not the current code, in 
order to fix the problems you are experiencing.


90% of useful FreeBSD testing happens when large FreeBSD consumers take 
releases of FreeBSD and deploy them in their testbeds and real-world 
environments, and find the bugs through the application of high levels of 
load and obscure hardware configurations.  This is why later FreeBSD 
releases along a -STABLE branch are typically much more stable than 
earlier ones -- the code has run on millions of machines for untold 
amounts of load, instead of the thousand or so with a very selected load 
it's likely to run on during development.  This is how all software 
vendors work, really -- be it Microsoft, or Apple, old-style UNIX vendors, 
or any of the Linux vendors.  Some set of users sits on the bleeding edge 
and shakes out the early problems, and then the rest of the user base 
suffers through the later versions to shake out more subtle problems that 
gradually get resolved.


The FreeBSD Project is working on moving towards a more formal testing 
regimen.  This change will help shake out software bugs relating to 
workload -- i.e., IP stack bugs, file system bugs, etc.  But the chances 
of it having a significant impact on broad hardware testing is very low.


So if you have non-production instances of your production hardware, and 
can reproduce the workloads of your production environment on that 
hardware, what we would love you to do is run 6-CURRENT on it and tell us 
if that works better.  If it does, then it's a question of back-porting 
the functionality (if possible) to 5.x.  If it doesn't, then we can fix 
the problem in the active development tree, then merge as makes sense. 
4.x became a great success after a quite shaky 3.x release branch, and 
after some bumps early in 4.x.  It got there because of a lot of testing 
and improvement resulting from production experience.  If you didn't have 
problems with 3.x and 4.x, it's because someone else got there first.


The reason I suggest waiting for BETA2 is that BETA2 will have cleaned up 
support for running 5.x applications.  Specifically, there are one or two 
system calls that have changed in 6.x, and require COMPAT_FREEBSD5 to be 
compiled into the kernel, which it wasn't in BETA1.  Likewise, a number of 
library version bumps and compatibility pieces will be in BETA2.  This 
will make it easier to test 5.x application workloads on a 6.x install.


We take the concerns you've expressed seriously, and you should know that 
every FreeBSD developer I've talked with in the last few years has been 
talking about how to improve 5.x stability.  The challenge has been to 
integrate the aggressive feature set improvement in 5.x with

Re: Quality of FreeBSD

2005-07-21 Thread Daniel O'Connor

On Thursday 21 July 2005 19:27, Marc Olzheim wrote:
 Thank you for expressing my exact same sentiments. I'm still a huge
 FreeBSD fan and switching to anything else (well, perhaps DragonFly)
 seems out of the question, but my faith is being tested a lot lately.
 Having switched some of my companies production machines to 5.4, since
 it was (in my eyes falsely) called a 'production release', FreeBSD's
 reputation within the less technical parts of the company has taken a
 large dent. Luckily they know as well that there's still no comparison
 to FreeBSD 4.x; top of my ruptime looks like:

I think the best way to rectify this is to test RC candidates on YOUR 
hardware. This finds the bugs you need fixed at a time when people are very 
receptive to fixing them.

It's not realistic for the release engineer to test on a lot of hardware as 
they are very busy doing other things.

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
The nice thing about standards is that there
are so many of them to choose from.
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C



Re: Serious issue with serial console in 5.4

2005-07-21 Thread Eirik Øverby



On Jul 21, 2005, at 12:16 PM, Robert Watson wrote:



On Thu, 21 Jul 2005, Eirik Øverby wrote:



The above panic will show up occasionally when logging out from a
serial console (i.e. ctrl-D, logout, exit, whatever). This is
EXTREMELY BAD, as it will crash an otherwise perfectly healthy box at
random - and renders the serial console useless.

Robert Watson confirmed this to be an issue on the 10th of April.

You might have to wait until 6.0-R since fixing it seems to  
require infrastructure changes that cannot easily be backported  
to 5.x.




With all due respect - if this is (and I'm assuming it is, because  
it happens on all the servers I'm serial-controlling) an  
omnipresent problem on 5.x, I daresay it should warrant some more  
attention. Having unsafe serial terminal support that can bring  
down your system like that defies much of the point of having  
serial terminal support in the first place.


However, since I seem to be the only one who has noticed this,  
perhaps I'm the last person on earth to routinely use serial  
terminal switches instead of KVM switches to do my admin work?




The concern about the 5.x backport is that it will break parts of  
the device driver ABI, and is a significant change that involves a  
lot of risk.


Regarding the general prevalence of the problem -- I've seen a  
small number of people reporting it's a big problem.  Since I know  
of a great many people running with serial consoles (other than a  
workstation, I never run FreeBSD boxes any other way), this leads  
me to believe it's something that shows up in fairly specific  
conditions -- perhaps relating to precise timing of a race  
condition.  This means that if we introduce a generally  
destabilizing change, it may impact more people than the problem as  
it exists (a nasty trade-off).


I've only seen the issue when logging out of a serial console  
session, and had previously hypothesized that it had to do with the  
simultaneous timing of a console message from syslog and the  
opening/closing of the console's tty due to logging out and getty  
restarting, resulting in a reference count improperly hitting zero.


I did indeed make some changes to my syslog configuration after  
getting the serials online. Your theory might not be entirely off.
Let me know if I should post my syslog.conf file or anything else  
here or elsewhere...


Thanks,
/Eirik


I thought Doug White had come up with a work-around patch that  
prevented the reference count from being allowed to hit 0 for the  
console by artificially elevating it, which would prevent the  
panic, so either (a) the work around wasn't committed, or (b) it  
didn't work.


I can attempt to take another look at this problem in a week or so,  
but have a number of things I need to finish up for FreeBSD 6.0  
before then that will be occupying my time.


Robert N M Watson



Re: Serious issue with serial console in 5.4

2005-07-21 Thread Robert Watson



On Thu, 21 Jul 2005, Eirik Øverby wrote:

I've only seen the issue when logging out of a serial console session, 
and had previously hypothesized that it had to do with the simultaneous 
timing of a console message from syslog and the opening/closing of the 
console's tty due to logging out and getty restarting, resulting in a 
reference count improperly hitting zero.


I did indeed make some changes to my syslog configuration after getting 
the serials online. Your theory might not be entirely off. Let me know 
if I should post my syslog.conf file or anything else here or 
elsewhere...


Since you appear to be able to reliably reproduce the problem (whereas I 
was able to reproduce it only after several hours of quite active serial 
console work), it would be quite interesting to answer the following 
question:


  If you cause syslogd not to send any output to /dev/console, does the
  problem go away?
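
One way to run that experiment is to comment out every syslog.conf rule whose action is /dev/console. The sketch below assumes the usual "selector, whitespace, action" layout of syslog.conf. The function name silence_console and the sample rules are illustrative only and are not taken from anyone's actual configuration.

```python
def silence_console(conf_text):
    """Comment out every syslog.conf rule whose action field is /dev/console."""
    out = []
    for line in conf_text.splitlines():
        fields = line.split()
        is_comment = line.lstrip().startswith("#")
        if fields and not is_comment and fields[-1] == "/dev/console":
            out.append("#" + line)   # disable the rule but keep it visible
        else:
            out.append(line)         # leave everything else untouched
    return "\n".join(out) + "\n"


# Illustrative input only; a real file will differ.
sample = (
    "*.err;kern.warning;auth.notice;mail.crit\t\t/dev/console\n"
    "*.notice;authpriv.none;kern.debug\t\t/var/log/messages\n"
)
edited = silence_console(sample)
```

After writing the edited text back, syslogd has to reread its configuration (it does so on SIGHUP) before the change takes effect. Kernel messages written directly to the console are not affected by this change.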

Thanks,

Robert N M Watson

Re: Quality of FreeBSD

2005-07-21 Thread Marc Olzheim

On Thu, Jul 21, 2005 at 08:29:47PM +0930, Daniel O'Connor wrote:
 I think the best way to rectify this is to test RC candidates on YOUR 
 hardware. This finds the bugs you need fixed at a time when people are very 
 receptive to fixing them.
 
 It's not realistic for the release engineer to test on a lot of hardware as 
 they are very busy doing other things.

Of course. That's why we test stuff first, before upgrading. However,
real world situations are always different from test setups and the kind
of race conditions that we're talking about that we were troubled by
didn't show up until we had it in production... But that's why I always
try to supply code to trigger the bug in my PRs after finding it, so
that it can be tested for a next release.

It's just that "this will probably not be fixed in 5.x" is not the thing
I like to hear. But as said, there's always 4.x, which is the most
stable OS I've seen in my open source UN*X life. Too bad that with 4.x I
get responses on libc_r's uthread like "libc_r is dead, please use KSE",
which don't help anyway. No need to burn your ships behind you just yet.
(Or whatever the expression is in English)

Marc



Re: Quality of FreeBSD

2005-07-21 Thread Marc Olzheim

Robert,

First, thank you for your clear reply.

 90% of useful FreeBSD testing happens when large FreeBSD consumers take 
 releases of FreeBSD and deploy them in their testbeds and real-world 
 environments, and find the bugs through the application of high levels of 
 load and obscure hardware configurations.  This is why later FreeBSD 
 releases along a -STABLE branch are typically much more stable than 
 earlier ones -- the code has run on millions of machines for untold 
 amounts of load, instead of the thousand or so with a very selected load 
 it's likely to run on during development.  This is how all software 
 vendors work, really -- be it Microsoft, or Apple, old-style UNIX vendors, 
 or any of the Linux vendors.  Some set of users sits on the bleeding edge 
 and shakes out the early problems, and then the rest of the user base 
 suffers through the later versions to shake out more subtle problems that 
 gradually get resolved.

Indeed. That's why my company started using FreeBSD 5.3 on
production servers when it came out. Since then numerous bugs were fixed,
some of which were reported by us. Now that we're X bug fixes later in time
and have started to get a good feeling about the number of open problems, it
is extremely annoying to hear the "This will (probably) not be fixed in
5.x" statements. That conflicts with 'gradually get resolved'. What do
you recommend larger consumers do? Keep using FreeBSD 4 and start
testing FreeBSD 6.x, dropping 5.x altogether?

I know FreeBSD 5 was a strange exception in the release scheduling and
that a lot has been learned from it for the future and I'm certainly not
unthankful for all the work that's done, but I'd like a clear answer on
what to do now in regard to taking FreeBSD 5 into 'real' production...

Marc



Re: Quality of FreeBSD

2005-07-21 Thread Nicklas B. Westerlund

Marc Olzheim wrote:


but I'd like a clear answer on
what to do now in regard to taking FreeBSD 5 into 'real' production...

  


I'd have to second this request.  We rely heavily on the stability and
performance of FreeBSD in our business.
We've only had the occasional stupid hang on our RELENG_5_4 systems, but
I've deployed both RELENG_6
and -CURRENT in our labs now to see what kind of results I can get out
of it.

Although I haven't seen any major problems on our servers, all using u320
scsi and smp - I don't feel as secure about my choice of upgrading to 5.x.
We still have some 4.x servers in production, and judging by how this is
evolving, I think I'll rather skip the 5-branch for those machines and
keep testing 6.x.
The last thing we need is servers with problems to disturb our sleep at
night.

Overall I think we're a few of the lucky ones, as a lot of people seem to
have huge problems which we haven't encountered, again that is because of
different architectures and such.

Nick.


RELENG_6 scroll wheel

2005-07-21 Thread Marian Hettwer

Hej All,

I upgraded to RELENG_6 to help testing.
Everything went smoothly so far, but my scroll wheel in X isn't working
anymore. I didn't change anything in the configuration when going from
FreeBSD RELENG_5 to RELENG_6 ...

some details:
[EMAIL PROTECTED] ~ $ uname -a
FreeBSD beastie.mobile.rz 6.0-BETA1 FreeBSD 6.0-BETA1 #0: Fri Jul 15
17:00:59 CEST 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
 i386
[EMAIL PROTECTED] ~ $ dmesg | grep ums
ums0: Logitech USB-PS/2 Optical Mouse, rev 2.00/11.10, addr 3, iclass 3/1
ums0: 3 buttons and Z dir.
ums0: Logitech USB-PS/2 Optical Mouse, rev 2.00/11.10, addr 2, iclass 3/1
ums0: 3 buttons and Z dir.
ums0: Logitech USB-PS/2 Optical Mouse, rev 2.00/11.10, addr 2, iclass 3/1
ums0: 3 buttons and Z dir.

from /etc/X11/xorg.conf
Section "InputDevice"
    Identifier  "Mouse0"
    Driver      "mouse"
    Option      "Protocol" "auto"
    Option      "Device" "/dev/sysmouse"
    Option      "ZAxisMapping" "4 5"
EndSection

[EMAIL PROTECTED] ~ $ ps ax | grep moused
 1060  ??  Ss 0:52,38 /usr/sbin/moused -z 4 -p /dev/ums0 -t auto -I
/var/run/moused.ums0.pid

I'm running xorg-6.8.2

I didn't recompile my ports, but I guess this shouldn't be the problem, hm ?

Any ideas anyone ?

best regards,
Marian

Re: Serious issue with serial console in 5.4

2005-07-21 Thread Eirik Øverby



On Jul 21, 2005, at 1:04 PM, Robert Watson wrote:



On Thu, 21 Jul 2005, Eirik Øverby wrote:


I've only seen the issue when logging out of a serial console  
session, and had previously hypothesized that it had to do with  
the simultaneous timing of a console message from syslog and the  
opening/closing of the console's tty due to logging out and getty  
restarting, resulting in a reference count improperly hitting zero.




I did indeed make some changes to my syslog configuration after  
getting the serials online. Your theory might not be entirely off.  
Let me know if I should post my syslog.conf file or anything else  
here or elsewhere...




Since you appear to be able to reliably reproduce the problem  
(whereas I was able to reproduce it only after several hours of  
quite active serial console work), it would be quite interesting to  
answer the following question:


  If you cause syslogd not to send any output to /dev/console, does the
  problem go away?


I'm afraid to say it doesn't.

/Eirik




Thanks,

Robert N M Watson



Re: Quality of FreeBSD

2005-07-21 Thread Robert Watson



On Thu, 21 Jul 2005, Marc Olzheim wrote:

Indeed. That's why my company started using FreeBSD 5.3 on
production servers when it came out. Since then numerous bugs were fixed,
some of which were reported by us. Now that we're X bug fixes later in time
and have started to get a good feeling about the number of open problems, it
is extremely annoying to hear the "This will (probably) not be fixed in
5.x" statements. That conflicts with 'gradually get resolved'. What do
you recommend larger consumers do? Keep using FreeBSD 4 and start
testing FreeBSD 6.x, dropping 5.x altogether?


I know FreeBSD 5 was a strange exception in the release scheduling and
that a lot has been learned from it for the future and I'm certainly not
ungrateful for all the work that's done, but I'd like a clear answer on
what to do now in regard to taking FreeBSD 5 into 'real' production...


Marc,

I should start out by saying I appreciate your clear and concise bug 
reports, and the list of your company's show-stopper 5.x bugs has made the 
rounds among FreeBSD developers.  I'm happy that at least one of the 
issues on the list was fixed by me. :-)  As you probably saw yesterday, 
I've started bugging Poul-Henning to look at the pty problem you're 
experiencing, and will get that on our 6.0 release show-stopper list.  I 
haven't yet had a chance to reproduce it locally, but it sounds like that 
should be straightforward.


FreeBSD 5 has been an exception -- normally, inasmuch as major
releases have a normal, the set of new features is a lot less aggressive,
and it has been our goal with 6.x to restore the expectation of a more
rapid release cycle with a less aggressive feature set.  This should reduce
the number of problems by virtue of reducing the level of change.  It 
should also make it easier for users to pick what version to run on, as 
the amount of adaptation they have to do to slide forward a version will 
be greatly reduced.  I.e., right now it's relatively easy to move back and 
forward between 5.x and 6.x.


With respect to 5.x vs 6.x upgrades: I've seen companies take two 
different strategies.  Most of them have been at least experimenting with 
deploying 5.x, and are very interested in its feature set.  Support for 
large file systems, 64-bit support on newer AMD and Intel hardware, 
improved PAM support, etc.  Some of my customers are specifically 
interested in the support for mandatory access control, but that's 
obviously a less common feature request :-).  The biggest determining 
factor for companies today comes from their own product schedule, since 
most big consumers of FreeBSD treat it as a component in a product they 
deliver for others.


For example, my understanding is that Yahoo is now deploying 6.0 betas 
across their server environment with great success, but was actually 
unable to seriously deploy 5.x because their goal was to support full 
32-bit compatibility on 64-bit amd/intel hardware, which has only recently 
reached the level of maturity they require.  In fact, you'll notice if you 
follow FreeBSD commit logs that much of that support has come from Yahoo!. 
Since 6.x is maturing in pretty good synch with their deployment timeline 
for 5.x, they are actually deploying 6.x.  Of course, Yahoo! has a team of 
in-house OS developers who adapt FreeBSD for their needs, and is quite 
capable of debugging a kernel or two if they run into problems.


The ATA driver issue is a sticky one for many users -- we hope to get the 
6.x ATA code back into 5.x in the next 5.x release.  However, hard-earned 
experience tells us that ATA driver code is notoriously difficult to get 
right across the broad range of available hardware.  Soren has been 
lobbying to get it merged to 5.x, but given the level of testing performed 
so far, we can't yet justify the merge.  My hope is that with 6.0 out the 
door and a lot of testing of that code, we can get it merged back to 5.x 
before 5.5.  Many other fixes have gone into 5.x, correcting many of the 
most significant issues.  If you compare 5.4 with 5.3, you'll find that in 
most cases, it's both faster and more stable.


The tty issue is a sticky one also.  The tty code in 6.x has been 
substantially rewritten to better support the SMPng environment.  Because 
the tty code plugs in to a number of device drivers, T1 adapter drivers, 
etc, changing the tty interfaces is a fairly big event, and will affect 
third party vendors like Cronyx.  This code has also not yet seen as wide 
deployment as I'd like, so it's also something that really isn't 
appropriate for an MFC immediately.  However, once it has seen significant 
6.0 deployment, it may well be.  A question then will be whether it's 
better to simply say you're better off making the jump to 6.x, which is 
minor than backporting, and it's something we can't really answer until 
we're comfortable that it's seen sufficient deployment.  My hope is that 
we can identify a workaround for 5.x that will avoid

Re: Serious issue with serial console in 5.4

2005-07-21 Thread Marc Olzheim

On Thu, Jul 21, 2005 at 02:19:23PM +0200, Eirik Øverby wrote:
   If you cause syslogd not to send any output to /dev/console, does the
   problem go away?
 
 I'm afraid to say it doesn't

Please, could you add:

options DDB #Enable the kernel debugger
options DDB_NUMSYM  #Print numerical value of symbols too
options KDB
options KDB_TRACE
options KDB_UNATTENDED

to your kernel config ?

Marc



Re: Quality of FreeBSD

2005-07-21 Thread Robert Watson


On Thu, 21 Jul 2005, Nicklas B. Westerlund wrote:

Although I haven't seen any major problems on our servers, all using U320
SCSI and SMP - I don't feel as secure about my choice of upgrading to
5.x. We still have some 4.x servers in production, and judging by how 
this is evolving, I think I'll rather skip the 5-branch for those 
machines and keep testing 6.x. The last thing we need is servers with 
problems to disturb our sleep at night.


Overall I think we're a few of the lucky ones, as a lot of people seem to
have huge problems which we haven't encountered; again, that is because of
different architectures and such.


Actually, I think you're part of the silent majority who find it works 
fine in their environment.  We use RELENG_5 at work on a number of 
machines, and I work with several companies and organizations who do, and 
have no problems at all.  The edge cases seem to be:


- High load environments, or high load testing.

- Hardware that isn't part of the regular testing that FreeBSD developers
  do as part of their work, likely because they don't have the hardware.

- Less commonly deployed features -- e.g., IPX, which experienced
  serious functional problems in RELENG_5 until a few months ago.
  Interestingly, resulting from a compiler change, not network stack
  changes...

Robert N M Watson

Re: Quality of FreeBSD

2005-07-21 Thread Michael Schuh

Hi,

At this point I must share my pain with you and the others.
I have run a few tests on one big issue of RELENG_5.
That was at a time when it was still early enough to change things, but
the people I spoke to told me that others have fast machines to test on
(in my eyes they should test on slower hardware to get the maximum
performance).

I told some people about the problems I found; these problems are
really important for other issues (application performance, etc.).

But no one really wanted to hear what I had to say. They told me
irrelevant things (and a lot of bullshit), and they do not think before
they speak.

So the result for me is to wait for RELENG_6, run one or two tests, and
if the tests do not go in the right direction, I will leave FreeBSD and
go back to Linux or possibly switch to DragonFly.

Now my question to you: is the performance of ATA disk access under a
UFS file system not important for other applications, such that it is
acceptable for it to be half of what RELENG_4 delivers?

In fact, under RELENG_4 I can write a 1 GB file twice as fast as under
RELENG_5! And I do not want to hear anything about serial performance, or
that this is not really like the real world, if I simulate that with:

/usr/bin/time dd if=/dev/zero of=/zerofile bs=1024 count=1024k;
This is really poor!
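The dd run above writes 1 GiB in 1 KiB blocks, so it measures per-write overhead as much as raw disk throughput. A minimal sketch of a repeatable version follows; `write_test` is a hypothetical helper name, not an existing tool, and the timing has only one-second resolution.

```shell
#!/bin/sh
# Sketch only: time a sequential write of zeros, then remove the file.
# write_test is a hypothetical helper, not part of FreeBSD.
# Usage: write_test FILE BLOCK_SIZE_BYTES BLOCK_COUNT
write_test() {
    wt_file=$1
    wt_bs=$2
    wt_count=$3
    wt_start=$(date +%s)
    # dd prints its own statistics on stderr; discard them here.
    dd if=/dev/zero of="$wt_file" bs="$wt_bs" count="$wt_count" 2>/dev/null || return 1
    sync    # ask the kernel to flush buffered data before stopping the clock
    wt_end=$(date +%s)
    echo "bs=$wt_bs count=$wt_count seconds=$((wt_end - wt_start))"
    rm -f "$wt_file"
}
```

Running `write_test /zerofile 1024 1048576` mirrors the command above, while `write_test /zerofile 65536 16384` writes the same 1 GiB in 64 KiB blocks. Comparing both on RELENG_4 and RELENG_5 may help show whether the gap comes from small-write overhead or from the disk path itself.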

I know we all gave our best, but many people are rather arrogant
and do not really think...

best regards

Michael

Quality of FreeBSD

2005-07-21 Thread Joao Barros

Hi all,

Robert,

I was hoping for you to mention user feedback.
I started this thread
http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052288.html
back with SNAP004. The problem is still present in BETA1.
I haven't seen any more advances in the thread, and I know this must
be a very localized issue, and that everyone is pretty busy with the
upcoming release but I wouldn't want this issue forgotten. Should I
submit a PR?
As this is a kernel issue, I'm pretty much stuck on 5, although I
would prefer to start using 6.

Yet, another loyal FreeBSD user :-)
--
Joao Barros

Re: Quality of FreeBSD

2005-07-21 Thread MikeM


On 7/21/2005 at 8:29 PM Daniel O'Connor wrote:

|On Thursday 21 July 2005 19:27, Marc Olzheim wrote:
| Thank you for expressing my exact same sentiments. I'm still a huge
| FreeBSD fan and switching to anything else (well, perhaps DragonFly)
| seems out of the question, but my faith is being tested a lot lately.
| Having switched some of my companies production machines to 5.4, since
| it was (in my eyes falsely) called a 'production release', FreeBSD's
| reputation within the less technical parts of the company has taken a
| large dent. Luckily they know as well that there's still no comparison
| to FreeBSD 4.x; top of my ruptime looks like:
|
|I think the best way to rectify this is to test RC candidates on YOUR
|hardware.. This finds the bugs you need fixed at a time when people are
|very receptive to fixing them.
|
|It's not realistic for the release engineer to test on a lot of hardware
|as they are very busy doing other things.

Your comment presupposes that most of the bugs are specific to one
piece of hardware, I doubt that is a valid assertion.  I would offer
that most of the bugs are not present in source code specific to a
certain piece of hardware, but are present in source code that is run
across much of the hardware that FreeBSD runs on.  As such, it is just
a matter of setting up the correct QA testing scripts to catch the
bugs.

Once a bug is reported, and that bug can be reproduced on the hardware
of the development team, then that bug should not reappear again,
because there should be a testing script written for it.


Additionally, every software bug is not only a defect in the software,
but it also represents a defect in the process that created the
software.  Bugs should be looked at to analyze why they occurred, and
what in the process might be changed to prevent the same or similar
bugs from recurring.




Re: Quality of FreeBSD

2005-07-21 Thread Robert Watson



On Thu, 21 Jul 2005, Joao Barros wrote:

I was hoping for you to mention user feedback. I started this thread
http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052288.html 
back with SNAP004. The problem is still present in BETA1. I haven't seen 
any more advances in the thread, and I know this must be a very 
localized issue, and that everyone is pretty busy with the upcoming 
release but I wouldn't want this issue forgotten. Should I submit a PR? 
As this is a kernel issue, I'm pretty much stuck on 5, although I would
prefer to start using 6.


I would suggest always filing a PR if you worry the problem is going to
get lost.  While PRs can also get lost, they tend to persist more than
old e-mails.


There are two likely causes of problems:

(1) amr driver problems
(2) General PCI/interrupt/ACPI/APIC problems

The last few functional changes to amr were by Paul Saab (ps@) and Scott
Long (scottl@), and I'd be tempted to try to chase that option first.  The 
first question to answer is whether you can get into the debugger using a 
console or serial break, as that will tell us what sort of hang you're 
seeing.


You can find detailed instructions for kernel debugging in the handbook. 
Try adding BREAK_TO_DEBUGGER, KDB, and DDB as a first step, and see if a
break gets you to the debugger or not.  If you can get into the debugger, 
submit the information to the PR, forward me the PR receipt, and I'll try 
assigning it to one of the above and see if we can get someone to take 
some interest in it.
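As a rough sketch of that first step, the helper below appends debugger options to a copy of a kernel configuration file. It assumes the options meant are KDB, DDB, and BREAK_TO_DEBUGGER (DDB as the debugger backend is our reading of the list above); `add_debug_options` and the MYKERNEL file name are hypothetical.

```shell
#!/bin/sh
# Sketch only: append kernel debugger options to a kernel config file.
# add_debug_options is a hypothetical helper; the option list is an
# assumption based on the advice in this thread.
# Usage: add_debug_options /path/to/KERNCONF
add_debug_options() {
    ado_conf=$1
    for ado_opt in KDB DDB BREAK_TO_DEBUGGER; do
        # Append the option only if the file does not already set it.
        if ! grep -Eq "^options[[:space:]]+${ado_opt}([[:space:]]|\$)" "$ado_conf"; then
            printf 'options\t\t%s\n' "$ado_opt" >> "$ado_conf"
        fi
    done
}
```

For example, after copying GENERIC to a custom file such as MYKERNEL, `add_debug_options MYKERNEL` adds whatever is missing; the kernel then has to be rebuilt and installed as described in the handbook before a console or serial break can be tried.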


If you can't get into the debugger, it's more likely an interrupt/etc 
problem.  We might try John Baldwin (jhb@) as a possible first contact.


Robert N M Watson

Re: SuperMicro X5DP8-G2MB/(2)XEON 2.4/1GB RAM 5.4-S Freeze

2005-07-21 Thread Marc Olzheim

On Tue, Apr 19, 2005 at 10:38:08AM +0200, Marc Olzheim wrote:
   The problem is with the periodic SMM interrupt and the bios.
   
   The attached program (ich-periodic-smm-disable.c) will fix the problem.
   For more information on what it does, see the Intel ICH3 datasheet.
   
   compile as 'gcc ich-periodic-smm-disable.c; ./a.out' and you will be
   good.
   Run this on each boot.
   
   I think you only need to clear PERIODIC_EN.
  
  Ok, I'll try it right away, thanks a lot!
 
 This clearly solves it. The machines are now up for longer than a week
 for the first time since I booted FreeBSD 5.x on them.

Does anyone know whether this workaround is still necessary for newer
5.x's and/or 6.x and current ?

Marc



Re: Quality of FreeBSD

2005-07-21 Thread Joao Barros

On 7/21/05, Robert Watson [EMAIL PROTECTED] wrote:

 On Thu, 21 Jul 2005, Joao Barros wrote:

  I was hoping for you to mention user feedback. I started this thread
  http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052288.html
  back with SNAP004. The problem is still present in BETA1. I haven't seen
  any more advances in the thread, and I know this must be a very
  localized issue, and that everyone is pretty busy with the upcoming
  release but I wouldn't want this issue forgotten. Should I submit a PR?
  As this is a kernel issue, I'm pretty much stuck on 5, although I would
  prefer to start using 6.

 I would suggest always filing a PR if you worry the problem is going to
 get lost.  While PRs can also get lost, they tend to persist more than
 old e-mails.

 There are two likely causes of problems:

 (1) amr driver problems
 (2) General PCI/interrupt/ACPI/APIC problems

I suspect the 2nd


 The last few functional changes to amr were by Paul Saab (ps@) and Scott
 Long (scottl@), and I'd be tempted to try to chase that option first.

Scott replied:

The kernel isn't hung, it's just forever waiting for an interrupt from
the amr card that it'll never get.  Again, this is almost certainly an
interrupt routing problem, so please contact John Baldwin
jhb at freebsd.org and provide him your details.

Scott



  The first question to answer is whether you can get into the debugger using a
 console or serial break, as that will tell us what sort of hang you're
 seeing.

 You can find detailed instructions for kernel debugging in the handbook.
 Try adding BREAK_TO_DEBUGGER, KDB, and DDB as a first step, and see if a
 break gets you to the debugger or not.  If you can get into the debugger,
 submit the information to the PR, forward me the PR receipt, and I'll try
 assigning it to one of the above and see if we can get someone to take
 some interest in it.

After reading this
http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052434.html
I broke into the debugger and posted this
http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052489.html
Is the information there sufficient to open a PR?


 If you can't get into the debugger, it's more likely an interrupt/etc
 problem.  We might try John Baldwin (jhb@) as a possible first contact.

John started debugging this with another person with similar problems
on 5 and the debugging never got to 6 (no feedback from the other
person): 
http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052727.html


 Robert N M Watson


why is [acpi_task1] killable?

2005-07-21 Thread Benjamin Lutz

Hello,

I've (accidentally, because of a broken pidfile) noticed yesterday that
root can kill [acpi_task1] (PID 8 on my system, FreeBSD 5.4-p5/i386).
Killing it resulted in an immediate and total lockup of the system.

I gather that processes with [ ] around their name are parts of the
kernel. Shouldn't they be protected from kills? Note this was a standard
kill, not a kill -9 or anything mean like that.

Cheers
Benjamin



Re: Quality of FreeBSD

2005-07-21 Thread Robert Watson


On Thu, 21 Jul 2005, MikeM wrote:

Your comment presupposes that most of the bugs are specific to one piece 
of hardware, I doubt that is a valid assertion.  I would offer that most 
of the bugs are not present in source code specific to a certain piece 
of hardware, but are present in source code that is run across much of 
the hardware that FreeBSD runs on.  As such, it is just a matter of 
setting up the correct QA testing scripts to catch the bugs.


Once a bug is reported, and that bug can be reproduced on the hardware 
of the development team, then that bug should not reappear again, 
because there should be a testing script written for it.


Additionally, every software bug is not only a defect in the software, 
but it also represents a defect in the process that created the 
software.  Bugs should be looked at to analyze why they occurred, and 
what in the process might be changed to prevent the same or similar bugs 
from recurring.


Some of us have actually spent quite a bit of time looking at the defect 
sets reported for 5.x.  Depending on the release they fall into a number 
of categories, but here are the major ones I've identified:


- ACPI-related hardware probe issues, especially in earlier 5.x releases
  when the ACPI code (especially Intel vendor code) started knowing how to
  work around common ACPI BIOS bugs.  The source of these problems was
  often that BIOS ACPI code contained work-arounds for Windows ACPI bugs.
  Newer 5.x releases have blacklists of known bad BIOSes, workarounds for
  bugs, etc, and this is a much less reported problem now.  These problems
  weren't present in 4.x because ACPI wasn't supported in 4.x; on the
  other hand, there's a broad range of modern server hardware that now
  requires ACPI to boot, so 4.x didn't run on that hardware, or supported
  it poorly.  After a very large effort, ACPI problems are massively
  reduced.

- ATA problems.  Many of these, while a symptom of bugs in the ATA code
  running without Giant, were very specific to timing, or divergent/poor
  ATA hardware.  As a result, they were difficult to reproduce in any
  environment but the original reporting environment.  The same hardware
  might perform fine in a FreeBSD developer's system.  Many of these
  problems have now been resolved, but some have not.  Often as not, the
  problems have to do with retrying requests to drives.  As I mentioned,
  we believe the ATA code in 6.x is much more resilient, but right now
  what it needs is testing, not merging to 5.x yet.  Fixes require just as
  much testing as any other change, since a fix for one issue may well
  trigger another issue, especially in the world of cheap PC hardware.

- Network stack stability under high load, especially on SMP.  Many of
  these bugs had to do with exercising timing and race conditions
  precisely right, and involved workloads not in the standard set of
  testing performed.  In many cases, those workloads have now been added
  to the regression test suite.  For example, there were a number of race
  conditions relating to the closing of sockets and network stack teardown
  in the protocols.  These tended to turn up on systems running tens of
  thousands of rapidly opening and closing TCP connections on SMP
  hardware.  Reproducing those conditions is difficult, and not something
  most FreeBSD developers have the resources to do, so have to wait for
  bug reports from people who do have those resources.

  However, over the past 12 months we've been working to put together a
  netperf test cluster, using hardware donated by a number of
  organizations, including the FreeBSD Foundation, FreeBSD Systems,
  IronPort Systems, as well as network connectivity and management donated
  by Sentex Communications.  This has allowed us to apply network tests in
  higher performance environments, and make high end SMP hardware
  available to a broader range of developers.

- Storage/file system related buffer starvation, deadlocks, etc, most a
  result of the development of snapshots and bgfsck support, changes in
  the I/O path, and so on.  A number of these have turned out to be driver
  bugs, but a fair number (especially in the 5.2 time frame) had to do
  with resource management in the UFS code.  Some still remain.

- Lock and resource leak crashes, especially with 5.2 and 5.3, when large
  parts of the system moved from running under Giant to running without
  it.  Our process has definitely improved here, through improved lock
  debugging tools, increased use of assertions, and the advent of things
  like Coverity's static analysis tools being run over the source tree.

- ACPI-like problems having to do with migrating interrupt and hardware
  configuration models.  These usually manifest as interrupt storms.  They
  are required changes to support modern server class SMP hardware, but
  often trigger bugs in a range of motherboard revisions from about 2-3
  years ago.  Sometimes, fixing these problems has


Re: Quality of FreeBSD

2005-07-21 Thread Mike Tancsa


At 09:23 AM 21/07/2005, Joao Barros wrote:


John started debugging this with another person with similar problems
on 5 and the debugging never got to 6 (no feedback from the other
person): 
http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052727.html



Yes, The other person is me :)  I should have some time today to try and test.

---Mike 



jails bring down network interface

2005-07-21 Thread Benjamin Lutz

Hello,

While tracking an issue with a jail I run, the interface to which the
jail aliases its IP suddenly became unresponsive.

My script starts the jail, then runs ifconfig alias. After starting and
stopping the jail about 20 times, the interface basically froze.
Ifconfig reported it as up and running, but it would no longer pass any
packets. Bringing it down then back up made it work again.

Since this is a production machine, I'm afraid I can't give more
specific details, I don't wish to run into the problem again.

I'm running FreeBSD 5.4-p5, the interface in question is a VIA VT6105
Rhine III using the vr(4) driver.

Cheers
Benjamin



Re: Quality of FreeBSD

2005-07-21 Thread Marc Olzheim

On Thu, Jul 21, 2005 at 01:20:49PM +0100, Robert Watson wrote:
 I know FreeBSD 5 was a strange exception in the release scheduling and
 that a lot has been learned from it for the future and I'm certainly not 
 unthankful for all the work that's done, but I'd like a clear answer on 
 what to do now in regard to taking FreeBSD 5 into 'real' production...

[snip]

 In terms of advice:
 
 If you have a product due out more than 3 months from now, I think 6.x 
 is the obvious way to go: you want to be ahead of the curve so that you 
 can have the foundation for your product in sync with the FreeBSD 
 production release cycle, and avoid jumping major releases early in the 
 product life cycle.  6.x has significant performance and stability 
 improvements -- performance especially in the area of file system 
 performance on SMP, preemption, network stack, and memory management, and 
 stability especially in the area of tty support.  By product, I mean a 
 range of things: the OS foundation of an embedded product such as a 
 firewall or storage appliance, or deployment of an internal product, such 
 as a virtual server product at an ISP.

[snip]

Robert, thanks again for your clear and straight answer. :-)

We fall in the Yahoo-like category of FreeBSD users (in more than one
way) and have been testing a bit with 6.x, just not as heavily as with
5.x.

Since I've already experienced the easy upgrade path before (the way
back to 5.x has been a bit more hairy btw.), it will be easy enough for
me to upgrade some servers to 6.x and start testing that, which is
exactly what I will do.

Because my current 5.x machines have to run with INVARIANTS to be in
production for more than a few seconds, the performance will no doubt be
better anyway. I'll leave the debug code enabled on most machines for now
anyhow to possibly provide more useful bug reports. :-)

Thanks again, your answer was of great value to me.

Marc



Re: Quality of FreeBSD

2005-07-21 Thread Joao Barros

On 7/21/05, Mike Tancsa [EMAIL PROTECTED] wrote:
 At 09:23 AM 21/07/2005, Joao Barros wrote:
 
 John started debugging this with another person with similar problems
 on 5 and the debugging never got to 6 (no feedback from the other
 person):
 http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052727.html
 
 
 Yes, The other person is me :)  I should have some time today to try and test.
 
  ---Mike

Sorry Mike for not seeing you ;-)
I believe you were on the right track with jhb so I'm looking forward
to your test results! Thanks

Multiple consumers of /dev/dsp

2005-07-21 Thread Josef Karthauser

In the past I'm sure that we supported the mixing of audio in the kernel
so that multiple applications could open /dev/dsp at the same time.  Was
this a function of the audio card driver, or of the audio subsystem?
Currently on my new machine I don't get any mixing, and applications
fail to open /dev/dsp if it's already open by something.

The current hardware is:

FreeBSD Audio Driver (newpcm)
Installed devices:
pcm0: Intel ICH4 (82801DB) at io 0xee00, 0xe000 irq 9 bufsz 16384 kld
snd_ich (1p/1r/0v channels duplex default)

Am I imagining that this used to be the case, or is it just not enabled by default?

Joe
-- 
Josef Karthauser ([EMAIL PROTECTED])   http://www.josef-k.net/
FreeBSD (cvs meister, admin and hacker) http://www.uk.FreeBSD.org/
Physics Particle Theory (student)   http://www.pact.cpes.sussex.ac.uk/
 An eclectic mix of fact and theory. =



Re: Multiple consumers of /dev/dsp

2005-07-21 Thread David Adam

Josef,


On Thu, 21 Jul 2005, Josef Karthauser wrote:

 In the past I'm sure that we supported the mixing of audio in the kernel
 so that multiple applications could open /dev/dsp at the same time.  Was
 this a function of the audio card driver, or of the audio subsystem?
 Currently on my new machine I don't get any mixing, and applications
 fail to open /dev/dsp if it's already open by something.

 The current hardware is:

 FreeBSD Audio Driver (newpcm)
 Installed devices:
 pcm0: Intel ICH4 (82801DB) at io 0xee00, 0xe000 irq 9 bufsz 16384 kld
 snd_ich (1p/1r/0v channels duplex default)

 Am I imagining that this used to be the case, or is it just not enabled by default?

It's not on by default, AFAIK, but setting a couple of sysctls will allow
you to have more than one program playing sound at once.

# sysctl hw.snd.pcm0.vchans=4
# sysctl hw.snd.maxautovchans=4

Check out http://www.freebsd.org/doc/handbook/sound-setup.html#AEN8582
(the section titled 'Utilizing Multiple Sound Sources').

Cheers,

David Adam
[EMAIL PROTECTED]


Re: jails bring down network interface

2005-07-21 Thread Robert Watson



On Thu, 21 Jul 2005, Benjamin Lutz wrote:

While tracking an issue with a jail I run, the interface to which the
jail aliases its IP suddenly became unresponsive.


My script starts the jail, then runs ifconfig alias. After starting and 
stopping the jail about 20 times, the interface basically froze. 
Ifconfig reported it as up and running, but it would no longer pass any 
packets. Bringing it down then back up made it work again.


Since this is a production machine, I'm afraid I can't give more 
specific details, I don't wish to run into the problem again.


I'm running FreeBSD 5.4-p5, the interface in question is a VIA VT6105 
Rhine III using the vr(4) driver.


Should this occur again, the starting point to investigate is to determine 
whether it's sending that's broken, receiving that's broken, or both. 
I would investigate them by:


- Using ping on the system to ping a remote host, see if the other system
  receives the ping packets using tcpdump.

- Use ping on another host to ping the local host, and see if tcpdump on
  the local host sees the ping packets.

As an FYI, ideally you'll do it using a pair of machines that already have 
each other in the ARP cache, or otherwise you'll need to look for ARP 
requests on the local area network instead of ICMP requests.  Beware 
switches and routers that mask traffic from third parties (hence 
suggesting using those two machines).


Also, it would be good to know if the if_vr interface receives interrupts 
or not when it's wedged -- you can check this using vmstat -i or systat 
-vmstat 1 and see what the interrupt count for the interface is.  I prefer 
systat to vmstat, FYI.


In addition, if you sit there and ping for a while, do you start getting
ENOBUFS back from the interface?


Finally, the dmesg probe output would be helpful.
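The interrupt check described above can be sketched as a small script. It assumes `vmstat -i` prints one line per interrupt source with the device names first, then the total, then the rate; `irq_total`, `irq_delta`, the interface name vr0, and the peer address are illustrative placeholders.

```shell
#!/bin/sh
# Sketch only: see whether an interface still takes interrupts while pinging.
# Assumes `vmstat -i` lines look like:
#   irq11: vr0 uhci0      45678     12
# i.e. device names first, then total count, then rate.

# irq_total DEVICE
# Reads `vmstat -i` style output on stdin, prints the total for DEVICE.
irq_total() {
    awk -v dev="$1" '{
        for (i = 1; i <= NF - 2; i++)
            if ($i == dev) { print $(NF - 1); exit }
    }'
}

# irq_delta DEVICE PEER
# Prints how many interrupts DEVICE took while pinging PEER five times.
irq_delta() {
    id_before=$(vmstat -i | irq_total "$1")
    ping -c 5 "$2" >/dev/null 2>&1
    id_after=$(vmstat -i | irq_total "$1")
    if [ -z "$id_before" ] || [ -z "$id_after" ]; then
        echo "no interrupt line found for $1" >&2
        return 1
    fi
    echo "$1 interrupts during ping: $((id_after - id_before))"
}
```

A possible use is `irq_delta vr0 192.0.2.1` on the affected host while `tcpdump -n -i vr0 icmp` runs in a second terminal and on the peer. A count that stays flat while the interface is wedged would suggest the card is no longer delivering interrupts; a rising count with no packets seen by tcpdump would point elsewhere.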

Thanks,

Robert N M Watson

Re: Quality of FreeBSD

2005-07-21 Thread MikeM

On 7/21/2005 at 2:29 PM Robert Watson wrote:

|Some of us have actually spent quite a bit of time looking at the defect
|sets reported for 5.x.  Depending on the release they fall into a number
|of categories, but here are the major ones I've identified:
| [snip]
|- Network stack stability under high load, especially on SMP.  Many of
|   these bugs had to do with exercising timing and race conditions
|   precisely right, and involved workloads not in the standard set of
|   testing performed.  In many cases, those workloads have now been added
|   to the regression test suite.  For example, there were a number of race
|   conditions relating to the closing of sockets and network stack teardown
|   in the protocols.  These tended to turn up on systems running tens of
|   thousands of rapidly opening and closing TCP connections on SMP
|   hardware.  Reproducing those conditions is difficult, and not something
|   most FreeBSD developers have the resources to do, so have to wait for
|   bug reports from people who do have those resources.
|  [snip]

Thank you for the clear answer.  For the record, I am very pleased with
the overall quality of FreeBSD, my comments were only meant in the
sense of everything has room for improvement, even something as
excellent as FreeBSD.

I snipped out one section of your reply because it illustrates a main
point of my message.  

While it is good to have the testing in place to catch race conditions,
has anyone done a post mortem to determine why and/or how the race
conditions got into the code in the first place?  *Someone* coded that
race condition.   Was it that two developers were using the same data
structure without one knowing about the other?  If so, then there's a
problem that needs to be fixed.   Chances are, though, that wasn't the
problem.  Only the developers would be able to look at the development
process and determine why the process allowed a race condition to occur
in the code.  But if they took the time to do this, then the knowledge
gained would be useful across a wide swath of FreeBSD development.

Thank you for your offer of allowing me to contribute to the FreeBSD
project, however I have professional obligations that prevent me from
making the necessary commitment to the project.  For the most part I
just lurk here, popping my head up on occasion.  In doing so, it is not
my intent to snipe at anyone or carp at anything.  As such, I'll let
this sub-thread die out at this point.






Re: TinyBSD Call For Testers

2005-07-21 Thread Marten Vijn

I tried to build a 6.0 version of TinyBSD; 6.0 currently works on my
laptop (CVS checked out today).

I did a build/install of world + kernel.

The image build does not exit early or report any errors...


I wrote the image to my CF card:

cat my.img > /dev/ad4

Then I booted.

The boot of the image stops with: can't load kernel

Is the TINYBSD kernel config perhaps not prepared for 6.0?

An attempt to build this kernel config separately fails at the
Atheros driver:

if_ath.o(.text+0x213a): In function `ath_node_alloc':
: undefined reference to `ath_rate_node_init'
if_ath.o(.text+0x2187): In function `ath_node_free':
: undefined reference to `ath_rate_node_cleanup'
if_ath.o(.text+0x21b6): In function `ath_node_free':
: undefined reference to `ath_rate_node_cleanup'
if_ath.o(.text+0x322a): In function `ath_start':
: undefined reference to `ath_rate_setupxtxdesc'
if_ath.o(.text+0x342c): In function `ath_start':
: undefined reference to `ath_rate_findrate'
if_ath.o(.text+0x3fa1): In function `ath_tx_processq':
: undefined reference to `ath_rate_tx_complete'
if_ath.o(.text+0x4764): In function `ath_detach':
: undefined reference to `ath_rate_detach'
if_ath.o(.text+0x4f35): In function `ath_newstate':
: undefined reference to `ath_rate_newstate'
if_ath.o(.text+0x4ffe): In function `ath_newstate':
: undefined reference to `ath_rate_newstate'
if_ath.o(.text+0x5352): In function `ath_newassoc':
: undefined reference to `ath_rate_newassoc'
if_ath.o(.text+0x6e3d): In function `ath_attach':
: undefined reference to `ath_rate_attach'
*** Error code 1

Stop in /usr/obj/usr/src/sys/TINYBSD.
*** Error code 1

Stop in /usr/src.
*** Error code 1

Stop in /usr/src.
medion#

Then I copied the GENERIC kernel config to /usr/local/share/tinybsd/TINYBSD

and repeated the build process...

This still leaves me with the boot message:

can't load kernel, so something else is going wrong?


I mounted the cf-card on my laptop:

medion# cp -v /boot/kernel/kernel /mnt/boot/kernel/

After this there is a bootable system. The next thing to find out is why the
kernel wasn't in the image.

Thoughts:
- something went wrong with copying the kernel
  (exiting on errors would be good)
- the Atheros drivers do not like being built into the kernel but are fine
  when loaded as modules (I tested loading these modules)

Apart from this, opening a getty on a COM port by default would save some
time on serial-only boxes.

In /etc/ttys I changed:

ttyd0   /usr/libexec/getty std.9600   dialup  off secure
to:
ttyd0   /usr/libexec/getty std.9600   ansi on  secure

With this I got a Soekris 4521 booted:

https://martenvijn.nl/tinybsd/net4521.txt




Marten



Re: READ_DMA, WRITE_DMA errors

2005-07-21 Thread Paul Mather

On Wed, 2005-07-20 at 23:54 -0500, Steve wrote:
 I've found tons of emails, news messages, listserv messages, and even 
 some bug reports of this seemingly common error.
 
 So, I had been running 5.2 on a server, and, updated to 5.3. Got the 
 READ_DMA and WRITE_DMA error and retries. So, figuring it might be a bad 
 update, took a new drive. put it in, loaded 5.4 for grins, and, same 
 issue, lots of these errors, eventually destroying the FS. Played around 
 with various settings, no avail. So, took it back, got different box, 
 everything new. Same problem, new install of 5.4
 
 So, took it back, got another with another MB (different model), but, 
 same maker (ASUS). Didn't have endless time to spend on production 
 machine. Sure enough, same problem. It's an ASUS A7V880. Controller is 
 SATA VT8237. Played around with tons of settings, eventually, after 
 reading various messages out there, discovered one that resolved the 
 problem. Had to set hw.ata.ata_dma=0. Of course, there is the obvious 
 downside to that! Speed!
 
 But it stinks to have decent hardware, yet, have to cripple the 
 machine. The place I got the equipment at runs ASUS only and has 
 thousands of them running under other OSes. Wished I had stayed with the 
 old FreeBSD version and old hardware now. I have not seen anyone that 
 has ever said the problem was being (or had been) solved though. I see 
 the bug reports, I take it no one has actually pinpointed the problem 
 though. BUT, I do hope it is understood that this is fairly widespread, 
 for me, the likelihood of 3 pcs, 2 different MB models, and, *complete* 
 new hardware for each of the 3 pcs kind of rules out hardware being 
 broken, might be badly designed, but, certainly not defective hardware.
 
 I do hope someone can eventually figure this out, seems to be extremely 
 common, and, definitely a problem for a stable release named 5.4.
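
For reference, the hw.ata.ata_dma workaround described above is a loader tunable. A sketch of how it is usually made persistent, assuming the stock /boot/loader.conf location (the comments are illustrative):

```
# /boot/loader.conf
# Disable DMA for ATA disks. The driver falls back to PIO, which is
# considerably slower, so this is a workaround rather than a fix.
hw.ata.ata_dma="0"
```

The setting takes effect at the next boot.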

I was one of the people who suffered from and reported this seemingly
common error.  On the systems that encountered problems, none had
particularly obscure or cutting-edge hardware (e.g., Intel PIIX4 ATA
controller on the motherboard).  One common thread in my case is that
all ran some kind of software RAID (gvinum or gmirror), though not all
of my software RAIDed machines exhibited the DMA problems, leading me to
think perhaps it was a hardware/load/disk combination problem.  Quite
obviously, not all PIIX4 controller users were having this happen, and
so the "it doesn't happen to me" factor might have contributed to the
general notion that this was probably operator error or something like
that, and so it was dismissed.

Anyway, as well as 5-STABLE, I also run a 6-CURRENT system that suffered
the problem.  Happily, after the ATA Mk.III merge, the situation
improved a LOT.  I occasionally still get the error reported, but it is
not fatal, unlike before (where the drive would be detached, breaking my
geom_mirror, necessitating a lengthy background rebuild).  So, I
consider the ATA Mk. III rewrite to have fixed the problem I had.  It
may be, then, that those upgrading to the upcoming 6.0-RELEASE (when it
appears) might also find their ATA DMA problems solved, too.

As for 5.x, I track -STABLE, and have noticed slight improvements
regarding the DMA TIMEOUT problem.  If you only run -RELEASE, you might
miss these ongoing improvements that crop up from time to time.

Cheers,

Paul.
-- 
e-mail: [EMAIL PROTECTED]

Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid.
--- Frank Vincent Zappa

Re: Quality of FreeBSD

2005-07-21 Thread Robert Watson


On Thu, 21 Jul 2005, MikeM wrote:

Thank you for the clear answer.  For the record, I am very pleased with 
the overall quality of FreeBSD, my comments were only meant in the sense 
of everything has room for improvement, even something as excellent as 
FreeBSD.


I think everyone agrees there's room for improvement -- many FreeBSD 
developers come to work on FreeBSD because they enjoy writing software 
and are dissatisfied with what they find in the commercial world. 
However, I've found most problems in the FreeBSD development process stem 
from a lack of resources to implement the best processes, rather than 
processes being wrong by design.  I.e., there being a strong interest in 
producing tested code, but inadequate resources to provide the thorough 
testing we'd like.  Or, the best of intentions (a company agrees to 
support development of a feature, starts work, and then goes out of 
business) preventing follow-through.  As has already been mentioned we're 
intentionally going for a much less aggressive 6.x feature set in order to 
refine some of the hard architectural work in 5.x, and to avoid 
over-committing resources.  One of the biggest problems with the SMP work 
in 5.x was the dot.com crash: companies that had committed resources to 
manage and develop on the project ceased to be available.


I snipped out one section of your reply because it illustrates a main 
point of my message.


While it is good to have the testing in place to catch race conditions, 
has anyone done a post mortem to determine why and/or how the race 
conditions got into the code in the first place?  *Someone* coded that 
race condition.  Was it that two developers were using the same data 
structure without one knowing about the other?  If so, then there's a 
problem that needs to be fixed.  Chances are, though, that wasn't the 
problem.  Only the developers would be able to look at the development 
process and determine why the process allowed a race condition to occur 
in the code.  But if they took the time to do this, then the knowledge 
gained would be useful across a wide swath of FreeBSD development.


There's some information, FYI, on the netperf cluster:

http://www.freebsd.org/projects/netperf/cluster.html

It needs a bit more updating for recent hardware additions, courtesy 
Sentex.


With respect to the network stack changes -- yes.  And in some cases, the 
areas of problems were actually marked with comments indicating they were 
known, but not easily resolvable (or not thought to be bugs that were 
exercised in practice).  In other cases, they were due to the 
misunderstanding of code in the stack, or the fact that data structures 
or code were not originally designed with parallelism in mind, and the 
communal discovery of unexpected or undocumented complexity.  A 
significant part of 5.x and 6.x work has been fixing existing 
architectural problems present for decades, but that suddenly became more 
relevant as the kernel supports SMP and threading better.


In several cases, they were bugs already present in FreeBSD 4.x, but only 
exercisable under extremely high memory load.  Something you'll find in 
later 5.x versions is a much greater use of locking assertions than in 
earlier versions.



Thank you for your offer of allowing me to contribute to the FreeBSD 
project, however I have professional obligations that prevent me from 
making the necessary commitment to the project.  For the most part I 
just lurk here, popping my head up on occasion.  In doing so, it is not 
my intent to snipe at anyone or carp at anything.  As such, I'll let 
this sub-thread die out at this point.


If only the realities of paid work didn't intervene so frequently -- 
sadly, I'm only too familiar with that problem :-).


Robert N M Watson

Re: TinyBSD Call For Testers

2005-07-21 Thread Patrick Tracanelli



Hello Marten,

Thanks for your input.

Yesterday sysutils/tinybsd was updated to fetch the new TinyBSD 0.2,
which has some improvements related to library dependencies, especially
PAM, which was not functional on TinyBSD under FreeBSD 6 (OPIE-related
problems) as it had been under RELENG_5. Also, new entries were added
to the kernel config (commented out by default) for the new Atheros
options (ath rate is probably what is causing your problem; uncomment it
in the new TinyBSD 0.2 to build your system under FreeBSD 6).
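
The undefined ath_rate_* references reported earlier are what the linker prints when the ath driver is compiled into the kernel without a rate control module. For illustration, the Atheros entries in a FreeBSD 6 kernel config look roughly like the fragment below. It follows the 6.x GENERIC layout; the exact entries in the TinyBSD 0.2 TINYBSD file are not shown in this thread, so treat the names as an assumption and check them against sys/conf/NOTES.

```
device          ath               # Atheros pci/cardbus NICs
device          ath_hal           # Atheros HAL (Hardware Access Layer)
device          ath_rate_sample   # SampleRate tx rate control for ath
```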


Also, your change to ttys will probably be interesting for other users 
too. It makes me think that it is probably time to maintain a separate, 
customized etc/ tree under the tinybsd development dirs, in a PicoBSD 
fashion. In fact it has already been added to the TODO list for TinyBSD. I 
believe that is a better way than changing anything under etc/ without the 
embedded system developer's explicit consent.


Please, if you get the same (or new) problems under FreeBSD 6 w/ TinyBSD 
0.2, send a note.


--
Patrick Tracanelli

FreeBSD Brasil LTDA.
(31) 3281-9633 / 3281-3547
sip://[EMAIL PROTECTED]
http://www.freebsdbrasil.com.br
Long live Hanin Elias, Kim Deal!

Re: Quality of FreeBSD

2005-07-21 Thread Martin


Robert Watson wrote:

- ATA problems.  Many of these, while a symptom of bugs in the ATA code
  running without Giant, were very specific to timing, or divergent/poor
  ATA hardware.  As a result, they were difficult to reproduce in any
  environment but the original reporting environment.  The same hardware
  might perform fine in a FreeBSD developer's system.  Many of these
  problems have now been resolved, but some have not.  Often as not, the
  problems have to do with retrying requests to drives.


My system is unstable with the latest -STABLE kernels, producing ATA DMA
errors. I also think that this does have a direct connection to buggy
ATA code. It seems it is something more general.


 As I mentioned,
  we believe the ATA code in 6.x is much more resilient, but right now
  what it needs is testing, not merging to 5.x yet.  Fixes require just as
  much testing as any other change, since a fix for one issue may well
  trigger another issue, especially in the world of cheap PC hardware.


This is true for me. RELENG_6 is great, but there are still annoying
bugs which prevent me from migrating the system completely. I'm using
FreeBSD mainly as desktop and I really need bktr(4) to work correctly.
Then there is some trouble with ath(4) making my notebook unusable.

To put it straight, there has been no FreeBSD branch which works well
for me for about 2 months. This is frustrating for me, but I try
to have patience, because you do a great job and btw, I cannot
imagine using my PCs without FreeBSD.

One more thing about cheap hardware: if you know that a piece of
hardware is potentially buggy (I mean real BUGS and not missing
support), please publish your opinion, because I will buy hardware
FOR FREEBSD, so I avoid major problems. How about test suites for
ACPI quality, e.g.? Would it be possible? There are people who spend
time to test FOR YOU, you don't need to buy all the hardware in
this world.

Martin

Re: Quality of FreeBSD

2005-07-21 Thread Mike Tancsa


At 09:23 AM 21/07/2005, Joao Barros wrote:

On 7/21/05, Robert Watson [EMAIL PROTECTED] wrote:

 On Thu, 21 Jul 2005, Joao Barros wrote:

  I was hoping for you to mention user's feedback. I started this thread
  http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052288.html

 There are two likely causes of problems:

 (1) amr driver problems
 (2) General PCI/interrupt/ACPI/APIC problems

I suspect the 2nd



John started debugging this with another person with similar problems
on 5 and the debugging never got to 6 (no feedback from the other
person): 
http://lists.freebsd.org/pipermail/freebsd-current/2005-July/052727.html


I finally got around to testing John's last suggestion, and the 
modification allows me to boot a RELENG_6 kernel!  So there is a 
workaround, at least on my DELL PE6350.  Take a look at the thread on 
current for a full dmesg.


---Mike



Re: READ_DMA, WRITE_DMA errors

2005-07-21 Thread Steve




Paul Mather wrote:


One common thread in my case is that
all ran some kind of software RAID (gvinum or gmirror), though not all
of my software RAIDed machines exhibited the DMA problems leading me to
think perhaps it was a hardware/load/disk combination problem.  
 


I do not use RAID at all, so, not common for me.


Anyway, as well as 5-STABLE, I also run a 6-CURRENT system that suffered
the problem.  Happily, after the ATA Mk.III merge, the situation
improved a LOT.  I occasionally still get the error reported, but it is
not fatal, unlike before (where the drive would be detached, breaking my
geom_mirror, necessitating a lengthy background rebuild).  
 

Well, that's good news, I just hope that is a widespread fix, there 
seems to be different issues, and, hopefully, the rewrite intentionally 
or unintentionally resolves them all! Sounds like in your case, it's 
almost 100%. An occasional error (we get watchdog timeouts on network) 
is not bad as long as it doesn't destroy the FS, obviously, we want 
zero, but, things happen. It's quite conceivable that 1 error per day IS 
a hardware issue. But, in our case, with 4 machines and the corruption, 
not the case!


Steve

Re: READ_DMA, WRITE_DMA errors

2005-07-21 Thread Steve




Robert Watson wrote:

6.0 contains a significant re-write and update of the ATA driver, and 
corrects a number of known problems with timeouts and reliability.  
This rewrite is available as patches against 5.x, but has not been 
committed because ATA is a very sensitive thing (lots of very diverse 
and very broken hardware), and has had insufficient testing.  If you 
have test hardware available that's not in production, it would be 
quite helpful if you could install 6.0-BETA2, once that comes out in 
the next week or so, and see if the specific ATA problems you're 
experiencing occur there. It's not impossible that the new ATA code 
will be merged to 5.x, but I think we cannot do that until it has seen 
a lot more exposure.  If you search back through the mailing archives, 
you should be able to find posts from Soren regarding the new ATA 
patches, if you want to give them a try on 5.x.


Yes, I will try and find those patches for 5, I do not have a free 
machine that exhibits the problem, but, I do have my disk cloned so a 
quick test of a patch should be simple and risk free over a weekend when 
I have time to mess around.


If anyone has that link handy, please post. (for the patch)

Steve

Debug output when wi0 pccard removed

2005-07-21 Thread Patrick Bowen


Hello, All;

I recently CVSUP'd from 5.4 to 6.0BETA1. I used the instructions in 
UPDATING to build/install world and used GENERIC unmodified to build the 
kernel. The whole procedure went without a single hitch. My question 
concerns the meaning of a debug message which appears when I remove my 
Wi-Fi card (which by the way works fine...I'm using it to send this mail).


Here's the output from dmesg when I insert the card (just for reference);

wi0: SMC SMC2532W-B EliteConnect Wireless Adapter at port 0x100-0x13f irq 11 function 0 config 1 on pccard1

wi0: using RF:PRISM2.5 MAC:ISL3873
wi0: Intersil Firmware: Primary (1.1.0), Station (1.4.9)
wi0: Ethernet address: 00:04:e2:80:34:be

When I remove the card I get the following;

taskqueue_drain with the following non-sleepable locks held:
exclusive sleep mutex wi0 (network driver) r = 0 (0xc2416afc) locked @ /usr/src/sys/dev/wi/if_wi.c:845
KDB: stack backtrace:
kdb_backtrace(1,c1af9250,c1af9000,c1989b80,d44bfc2c) at kdb_backtrace+0x29
witness_warn(5,0,c0854d21,c1af9000,c1af9000) at witness_warn+0x18e
taskqueue_drain(c1989b80,c1af9250,c1af9000,c1af9000,c1af9000) at taskqueue_drain+0x1a
if_detach(c1af9000,c1af9000) at if_detach+0x1a
ether_ifdetach(c1af9000,0,c2416000,d44bfc94,c05debfc) at ether_ifdetach+0x28
ieee80211_ifdetach(c2416004,c1af9000,c1af9000,0,c1c51880) at ieee80211_ifdetach+0x50
wi_detach(c1c51880) at wi_detach+0x64
device_detach(c1c51880) at device_detach+0x70
pccard_detach_card(c1aaa600) at pccard_detach_card+0x41
exca_removal(c1a6e804) at exca_removal+0x46
cbb_removal(c1a6e800) at cbb_removal+0x2c
cbb_event_thread(c1a6e800,d44bfd38,c1a6e800,c0579df0,0) at cbb_event_thread+0x9a
fork_exit(c0579df0,c1a6e800,d44bfd38) at fork_exit+0xa0
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xd44bfd6c, ebp = 0 ---
wi0: detached

I don't read debug messages yet, and am wondering whether this is a problem, 
whether it appears just because WITNESS and INVARIANTS are enabled, or 
whether it's normal but never seen in a non-debug kernel.


I get a similar message when I shutdown, having to do mostly with ACPI, 
but since that's been buggy on this machine (Dell Latitude C600), I 
almost expected that.


Thanks in advance--

Patrick Bowen

Re: Quality of FreeBSD

2005-07-21 Thread Karl Denninger

Agreed.

I have a PR open on the ATA issues, particularly with SATA drives, and
have had it open since before 5.4-RELEASE.

It remains open.

Careful selection of what's where can avoid major trouble, but this is
hardware that worked properly on 4.x for a LONG time - it's definitely NOT
defective.

This is a major sore spot, and is not a trivial issue by any means.  Disk
I/O is arguably THE major thing that must work right for any operating
system to be usable.

--
-- 
Karl Denninger ([EMAIL PROTECTED])  Internet Consultant, Kids Rights Activist
http://www.denninger.net        My home on the net - links to everything I do!
http://scubaforum.org           Your UNCENSORED place to talk about DIVING!
http://homecuda.com             Emerald Coast: Buy / sell homes, cars, boats!
http://genesis3.blogspot.com    Musings Of A Sentient Mind

On Thu, Jul 21, 2005 at 05:46:13PM +0200, Martin wrote:
 Robert Watson wrote:
 - ATA problems.  Many of these, while a symptom of bugs in the ATA code
   running without Giant, were very specific to timing, or divergent/poor
   ATA hardware.  As a result, they were difficult to reproduce in any
   environment but the original reporting environment.  The same hardware
   might perform fine in a FreeBSD developer's system.  Many of these
   problems have now been resolved, but some have not.  Often as not, the
   problems have to do with retrying requests to drives.
 
 My system is instable with latest -STABLE kernels, producing ATA DMA
 errors. I also think that this does have directly a connection to buggy
 ATA code. It seems it is something more general.
 
  As I mentioned,
   we believe the ATA code in 6.x is much more resilient, but right now
   what it needs is testing, not merging to 5.x yet.  Fixes require just as
   much testing as any other change, since a fix for one issue may well
   trigger another issue, especially in the world of cheap PC hardware.
 
 This is true for me. RELENG_6 is great, but there are still annoying
 bugs which prevent me from migrating the system completely. I'm using
 FreeBSD mainly as desktop and I really need bktr(4) to work correctly.
 Then there is some trouble with ath(4) making my notebook unusable.
 
 To put it straight, there is no FreeBSD branch which works well
 for me since about 2 months. This is frustrating for me, but I try
 to have patience, because you do a great job and btw, I cannot
 imagine to use my PCs without FreeBSD.
 
 One more thing about cheap hardware: if you know that a piece of
 hardware is potentially buggy (I mean real BUGS and not missing
 support), please publish your opinion, because I will buy hardware
 FOR FREEBSD, so I avoid major problems. How about test suites for
 ACPI quality, e.g.? Would it be possible? There are people who spend
 time to test FOR YOU, you don't need to buy all the hardware in
 this world.
 
 Martin



RE: Quality of FreeBSD

2005-07-21 Thread Alexey Yakimovich

First of all, thank you all very much for your replies.
I just want to add some comments based on previous mails.
- I completely agree with MikeM: any kind of complex software can be
tested with properly prepared test cases, especially if they are going to be
reused in the next release;
- if those problems happened in the 5 branch, they will probably happen again
in 6 or 7, so why do I have to switch to 6 right now? Is it because 5 will
never be fixed? Does the word "production" mean something to the FreeBSD
project now?
- I remember that some time ago you could stay on current all the time without
worrying that your box would crash and not reboot automatically;
- cheap hardware has always been used with FreeBSD, as far as I remember, or
has something changed recently, especially in the US, so that people buy only
expensive hardware? Probably it is no longer important to support cheap
hardware because more important FreeBSD clients like Yahoo or Apple use
real hardware, not the stupid kind like ATA, and they have these aggressive
project schedules. Believe me, I know what an aggressive project schedule
means, with a long, long list of new features. It is important only for
companies like Yahoo, and I know why: it's easier to sell a useless
product with lots of new features than a stable product with few. For a
regular guy it is better to have a stable system running all the time and
doing real work (development or providing some service) than to reboot the
box because of some fancy new feature. It's getting close to Windows right now.
- IBM, Yahoo, Intel, Apple ...: those guys are smart, having millions of
unpaid open source developers working for them. The problem is that some day
those projects will have their own aggressive project schedules, and will then
disappear or change to .com. So make sure you are still doing what you
like to do and that you are having fun doing it.

Thanks,
Alexey

 -Original Message-
 From: Robert Watson [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, July 21, 2005 5:21 AM
 To: Marc Olzheim
 Cc: Alexey Yakimovich; freebsd-stable@FreeBSD.org
 Subject: Re: Quality of FreeBSD
 
 
 On Thu, 21 Jul 2005, Marc Olzheim wrote:
 
  Indeed. That's why my company started taking FreeBSD 5.3 in use for
  production servers when it was out. Since then numerous bugs were fixed,
  some of which reported by us. Now that we're X bug fixes later in time
  and started to get a good feeling about the number of open problems, it
  is extremely annoying to hear the "This will (probably) not be fixed in
  5.x" statements. That conflicts with 'gradually get resolved'. What do
  you recommend larger consumers to do? Keep using FreeBSD 4 and start
  testing FreeBSD 6.x, dropping 5.x all together?
 
  I know FreeBSD 5 was a strange exception in the release scheduling and
  that a lot has been learned from it for the future and I'm certainly not
  unthankful for all the work that's done, but I'd like a clear answer on
  what to do now in regard to taking FreeBSD 5 into 'real' production...
 
 Marc,
 
 I should start out by saying I appreciate your clear and concise bug
 reports, and the list of your company's show-stopper 5.x bugs has made the
 rounds among FreeBSD developers.  I'm happy that at least one of the
 issues on the list was fixed by me. :-)  As you probably saw yesterday,
 I've started bugging Poul-Henning to look at the pty problem you're
 experiencing, and will get that on our 6.0 release show-stopper list.  I
 haven't yet had a chance to reproduce it locally, but it sounds like that
 should be straightforward.
 
 FreeBSD 5 has been an exception -- normally, in as much as major
 releases have a "normal", the set of new features is a lot less aggressive,
 and it has been our goal with 6.x to restore the expectation of a more
 rapid release cycle with a less aggressive feature set.  This should reduce
 the number of problems by virtue of reducing the level of change.  It
 should also make it easier for users to pick what version to run on, as
 the amount of adaptation they have to do to slide forward a version will
 be greatly reduced.  I.e., right now it's relatively easy to move back and
 forward between 5.x and 6.x.
 
 With respect to 5.x vs 6.x upgrades: I've seen companies take two
 different strategies.  Most of them have been at least experimenting with
 deploying 5.x, and are very interested in its feature set.  Support for
 large file systems, 64-bit support on newer AMD and Intel hardware,
 improved PAM support, etc.  Some of my customers are specifically
 interested in the support for mandatory access control, but that's
 obviously a less common feature request :-).  The biggest determining
 factor for companies today comes from their own product schedule, since
 most big consumers of FreeBSD treat it as a component in a product they
 deliver for others.
 
 For example, my understanding is that Yahoo is now deploying 6.0 betas
 across their server

Re: Serious issue with serial console in 5.4

2005-07-21 Thread Kris Kennaway

On Thu, Jul 21, 2005 at 10:56:54AM +0200, Eirik Øverby wrote:

 You might have to wait until 6.0-R since fixing it seems to require
 infrastructure changes that cannot easily be backported to 5.x.
 
 With all due respect - if this is (and I'm assuming it is, because it  
 happens on all the servers I'm serial-controlling) an omnipresent  
 problem on 5.x, I daresay it should warrant some more attention.  
 Having unsafe serial terminal support that can bring down your system  
 like that defies much of the point of having serial terminal support  
 in the first place.

It *has* received attention, and the conclusion was as above.  6.0 has
some significant TTY changes relative to 5.x, which probably cannot be
backported without disruption.

 However, since I seem to be the only one who has noticed this,  
 perhaps I'm the last person on earth to routinely use serial terminal  
 switches instead of KVM switches to do my admin work?

No, others have reported it too.

Kris



Strange panic

2005-07-21 Thread Alexander S. Usov

Hi!

I have got a pair of very strange panics today.
I didn't see the exact panic message for the first one, as X was running
at the time the machine halted; all I can say about it is that it was unable
to reboot and there is no core dump.

For the second one I saw a message (approximate -- I typed it in from paper)
panic: sbflush_locked: cc 0 || mb 0xc1bfa600 || mbcnt 0, and it looks like
as soon as it tried to dump core it got a second panic and
went to reboot. I am unsure whether the machine was able to reboot
automatically, as I pressed a key to write down the panic message. As in the
previous case there is no core, so I can't get any backtrace from it.

It all happened on 5.4-RELEASE-p3.

-- 
Best regards,
  Alexander.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]

Re: Quality of FreeBSD

2005-07-21 Thread Matthias Schuendehuette


Hi Robert,

Am 21.07.2005 um 13:00 schrieb Robert Watson:


 Have you tried, and do you plan to try, our 6.0 test releases before
 6.0-RELEASE goes out the door?  Specifically, on the hardware you know
 you're having problems with 5.4 on?

Yes, I did - see the thread "mpt + gvinum on 6.0-BETA".

But I'm a bit disappointed that so far there has not been *one* reply to
my report.

It's new hardware, which doesn't even boot with 5.3/5.4-RELEASE (but
does with 5.2.1 :-), and probably a fairly popular server
(FUJITSU-SIEMENS RX300 S2)...

What did I do wrong here? Should I post to -current instead?

-- 
Ciao/BSD - Matthias

Matthias Schuendehuette  msch [at] snafu.de, Berlin (Germany)
PGP-Key at pgp.mit.edu and wwwkeys.de.pgp.net ID: 0xDDFB0A5F


Re: Quality of FreeBSD

2005-07-21 Thread pcasidy

Hi all!

I have read this thread with a lot of interest and I have to
congratulate each of you for bringing calm, clever and interesting
answers.

I too felt that the quality of 5.x is not what I was used to, but there
are nice and promising new features.

Having read most of the emails, it looks to me as though there is a
key element: the hardware. There are too many combinations on the
market for building an i386-like platform.

Isn't it time to build a suggested hardware list or a hardware
blacklist?
I do not know how to do that, because there may be a high risk of being sued
by a company making bad hardware, even under the right of free speech.
Perhaps it can be done by making a list of hardware companies that give
FreeBSD good support (meaning not good hardware, but good feedback
on how to solve problems).
I know of the hardware vendor and supported hardware lists, but I am not
sure whether they are up to date, and I didn't manage to make good use of
them: how has FreeBSD really been tested on that hardware?

My main problem, and that of others seeing the question from time to
time, is to know which is a good (not necessarily the best) hardware to
run FreeBSD on.
When I buy a new motherboard, which chipset should I choose or avoid, and
which controllers?

Twenty years ago, when you bought a computer (not a PC), the system
delivered with it used to work well or had known problems with
workarounds. Okay, they were simpler, but in case of problems it was
easy to try to reproduce and investigate the problem.

I am not saying we should choose one defined platform. I don't know if
it is feasible, but having a list of hardware recommendations for which
we are sure to get good support would be an added value.

As it is too hard to support every combination of hardware, why not focus
on a few? Maybe the ones developers have easy access to?
If someone uses another combination, no problem: he will have the same
support as today.

Thanks for reading my attempt to move forward.

Phil.


RE: Quality of FreeBSD

2005-07-21 Thread Robert Watson



On Thu, 21 Jul 2005, Alexey Yakimovich wrote:


 First of all thank you very much all for your replies.
 I just want to add some comments based on previous mails.
 
 - I completely agree with MikeM - any kind of complex software could be
 tested with properly prepared test cases, especially if they are going to
 be reused in the next release;


The trick is balancing the investment of time in different areas, and 
motivating people to do the things that aren't enjoyable, don't receive 
much appreciation, etc. Testing is both difficult and time-consuming.  It 
works best when people are willing to dedicate all or most of their time 
to the task, since it requires the building of frameworks, the regular 
application of those tests, etc.  People who step forward to work 
consistently on testing and bug reporting, like Peter Holm, do the project 
an invaluable service.  And people like Marc Olzheim who take the time to 
evaluate the system thoroughly, work through the bug report and fix cycle, 
and have the patience to deal with situations where there aren't enough 
hours in the day to fix a problem make it all worthwhile.  It's easy to 
say that more testing should be done, but testing requires as much 
expertise in the internals of a piece of software as writing it, and far 
more time.


 - if those problems happened to the 5 branch, they would probably happen
 again for 6 or 7, so why do I have to switch to 6 right now? Is it because
 5 will never be fixed? Does the word "production" mean something to the
 FreeBSD project now?


As has been discussed extensively in this thread and other threads, the 
FreeBSD development model typically addresses change at the tree HEAD, 
where the changes are tested and evaluated, and then they are back-ported. 
Some changes are low-risk, and are backported quickly (minor locking 
fixes, error handling, etc).  Others are higher risk, and are backported 
only when they are felt to have received sufficient testing (driver 
re-writes, structural changes).  Other changes are considered too large to 
ever be backported, as you might as well move the users forward as it 
will be less work and come to much the same thing (major architectural 
changes, such as SMPng, new hardware platforms, new kernel subsystems). 
I can't promise that every fix in HEAD (7.x) or the upcoming 6-STABLE 
branch will make it to 5-STABLE, because many of the changes there won't 
be appropriate for a backport, or would take so much work to backport that 
the time is better spent on other tasks.  However, the hope is to bring as 
many changes as is sensible back.


As we've already discussed, there are several important improvements 
germinating in 6.x, and many of them will be things that can and will be 
backported.  If you look at the network stack differences between 5.x and 
6.x, you'll find very few, because I and others have worked to aggressively 
merge fixes, usually on a time lag of between one week and one month.  I 
know this is also true in other areas of the system.  If you're aware of 
changes that fix something in 6.x or 7.x that haven't been backported, and 
it's been over a month, please contact the developer to ask about a 
backport.


 - I remember that some time ago you could stay on current all the time
 without worrying that your box would crash and not auto-reboot;


Certainly.  I also remember long periods of time where you didn't want to 
be running current unless you were a VM kernel hacker, such as leading up 
to the 3.x release cycle, or just after the introduction of background 
fsck in 5.x.  The 6.x/7.x HEAD branches have been quite on the stable side 
compared to the 3.x and 5.x development cycle, and my hope is they will 
remain that way.


 - cheap hardware was always in use with FreeBSD, as far as I remember, or
 has something changed recently, especially in the US, with people buying
 only expensive hardware? Probably it is no longer important to support
 cheap hardware because more important FreeBSD clients like Yahoo or Apple
 use real hardware, not the stupid kind like ATA, and they have these
 aggressive project schedules. Believe me, I know what an aggressive
 project schedule means, with a long, long list of new features. It is
 important only for companies like Yahoo, and I know why: it's easier to
 sell a useless product with lots of new features than a stable product
 with few. For a regular guy it is better to have a stable system
 running all the time and doing real work (development or providing some
 service) than rebooting the box because of some fancy new feature. It's
 getting close to Windows right now.


All software development involves the balancing of risks and benefits. 
That's one of the reasons why the FreeBSD Project offers several 
development branches, which allow users to balance new features and long 
running stale source code.  Notice that we'll be supporting the 4.x 
branch for several years to come.  Of course, if you run 4.x, you won't be 
getting many new features, but it's a

Re: Serious issue with serial console in 5.4

2005-07-21 Thread Vivek Khera



On Jul 21, 2005, at 4:56 AM, Eirik Øverby wrote:

 However, since I seem to be the only one who has noticed this,
 perhaps I'm the last person on earth to routinely use serial
 terminal switches instead of KVM switches to do my admin work?

no, there are plenty of us out here... i have two 16 port cyclades
boxes I use for this purpose.  i've never run into this problem, but
then I only have 3 boxes running FreeBSD 5.x and I almost never log
into the console: only for OS upgrades or the extremely rare panic on
one of the dual proc Opteron systems.


Vivek Khera, Ph.D.
+1-301-869-4449 x806



Machine Replication

2005-07-21 Thread Eli K. Breen


All,

Does anyone have a good handle on how to replicate (read: image) a 
freebsd machine from one machine to an ostensibly similar machine?


So far I've used countless variations and combinations of the following:

dd            (Slow, not useful if the hardware isn't identical?)
tar           (Doesn't replicate MBR)
rsync         (No MBR support)
Norton Ghost  (Doesn't support UFS/UFS2?)
G4U           (little experience with this)

Now whether my details are a bit off, that's fine, I don't want this to
be diluted into a discussion of minute frivolous details (as these things
are wont to do), but what I _am_ looking for is a tried, tested and true
method of FreeBSD machine replication, specifically for the 5.3+ releases.


Many thanks,

-E-

Re: Quality of FreeBSD

2005-07-21 Thread Karl Denninger

Ok, Robert, but then here's the question

How come the ATA code which was very stable in 4.x was screwed with in a
production release, breaking it, with no path backwards to the working
code?

This is a perfectly valid thing to do in -HEAD, where it's "heh, you know
this might go BOOM on you!"  I've been told that before when reporting
problems with -HEAD, and while I might not have liked hearing it, it's a
valid point of view.

But the same thing in a production release is an entirely different
matter, especially when it impacts MAINSTREAM hardware (the SII chipset 
is EXTREMELY common among SATA implementations, being on basically ALL
PCI plug-in boards, with Hitachi and Maxtor being hardly uncommon disks!)

I originally thought perhaps this was a Maxtor problem, given my past
history with them playing a bit fast and loose with the rules.  However,
when I replicated the problem on my Hitachi Deskstar drives that theory 
went out the window.

I understand your dissertation below, and agree with it.  However, this is
a case where code was tampered with in ways that broke things for a LOT of
people, myself included, on a PRODUCTION release, and was let loose with
inadequate testing.

It is NOT a situation where obscure, little-used hardware becomes
obsolete and thus ignored - eventually falling into ruin.  This is a
situation where current, in-service hardware on literally millions of
machines becomes suddenly unstable to unusable entirely with FreeBSD.

I understand and expect that if I run -HEAD, I'm asking for it.  I used to
do this on a fairly regular basis ANYWAY, since there were features I
NEEDED in certain environments, and while I did bitch from time to time,
and worked to find solutions when I could, in general this was an ok
path for me, with my own personal resources dedicated to testing and
evaluation on the specific hardware which I needed to use.

This is different.  The ATA problems are neither rare nor difficult to
reproduce.  Indeed, on the PR I opened, I can take any of the SATA drives
I have (from two different manufacturers - Hitachi and Maxtor), put them
on ANY adapter using the most common (SII) chipset (Adaptec's and Bustek's
both tested) and get the same results - DMA errors when under any
significant load.  

It is trivially easy to reproduce the problem.

I came up with a patch to prevent the disconnects on a mirrored drive 
(but not the errors themselves) which then led to requests that I test 
a bunch of related patches - a request I begrudgingly complied with.  

Why begrudging?  Because the patch contemplated didn't address the problem
- it papered over it.  Now the errors still come, but they don't detach
the disk.  They DO severely impact performance though, and for
non-mirrored configurations the results might be data loss instead of a
complaint.  Since data corruption in these circumstances is very difficult
to detect until it has become catastrophic, I'm not about to attempt to 
provoke it on a production machine (which is likely the only way I could
identify WITH CERTAINTY that corruption has taken place.)

So what's going on here Robert?  The PR I filed is still open; it was filed
on 2/17!  Last activity is from April 4th.  I first noted the issue on 1/31
and, finding no sign of any real resolution in the codebase going forward, I
filed the PR on 2/17 after exhausting my own internal testing and remedy
process.

It is now the middle of July, the ticket is still open, and there is no
path out of this box that I can see.

I understand that there is concern that while ATA-GenX might fix this, it
might also break other things, and thus there is reluctance to MFC it back
into 5.x.  

That's a valid concern, but IMHO it misses the larger point.  

The question unaddressed is why the STABLE code in 4.x was abandoned before 
it was known that the replacement was AT LEAST as good as that which it
replaced!

This isn't a gnat - it was submitted as serious, and I meant that 
when I submitted it.  The only reason I didn't consider it critical and
high priority is that it doesn't hit EVERY configuration - but if it
hits yours, your system is severely impacted.

As things stand right now I'm not even sure WHAT codeset I can CVSUP and
test to have a decent shot at getting a FULLY working ATA/gmirror 
implementation.

-- 
Karl Denninger ([EMAIL PROTECTED]) Internet Consultant & Kids Rights Activist
http://www.denninger.net      My home on the net - links to everything I do!
http://scubaforum.org         Your UNCENSORED place to talk about DIVING!
http://homecuda.com           Emerald Coast: Buy / sell homes, cars, boats!
http://genesis3.blogspot.com  Musings Of A Sentient Mind

On Thu, Jul 21, 2005 at 08:00:40PM +0100, Robert Watson wrote:
 
 On Thu, 21 Jul 2005, Alexey Yakimovich wrote:
 
 First of all thank you very much all for your replies.
 I just want to add some comments based on previous mails.
 
 - I completely agree with MikeM - any kind of complex software could be

Re: Machine Replication

2005-07-21 Thread Karl Denninger

On Thu, Jul 21, 2005 at 12:20:34PM -0700, Eli K. Breen wrote:
 All,
 
 Does anyone have a good handle on how to replicate (read: image) a 
 freebsd machine from one machine to an ostensibly similar machine?
 
 So far I've used countless variations and combinations of the following:
 
 dd(Slow, not usefull if the hardware isn't identical?)
 tar   (Doesn't replicate MBR)
 rsync (No MBR support)
 Norton Ghost  (Doesn't support UFS/UFS2?)
 G4U   (little experience with this)
 
 Now whether my details are a bit off, that's fine, I don't want this to 
 be diluted in to discussion of minute frivolous details (as these things 
 are wont to do), but what I _am_ looking for is a tried, tested and true 
 method of FreeBSD machine replication, specifically for the 5.3+ releases.
 
 Many thanks,
 
 -E-

Define "similar."

If the disk is compatible (target disk equal or larger in size than the
source), you can use gmirror to image a machine, quiesce the machine,
force-detach the hardware (even hot-unplug it if supported) and boot the
resulting disk (if you set up the gmirror system properly in the
first place).

Not the fastest method, but it works and copies EVERYTHING.

There are other options but you need to be more specific as to what you
mean by "similar."

-- 
Karl Denninger ([EMAIL PROTECTED]) Internet Consultant & Kids Rights Activist
http://www.denninger.net      My home on the net - links to everything I do!
http://scubaforum.org         Your UNCENSORED place to talk about DIVING!
http://homecuda.com           Emerald Coast: Buy / sell homes, cars, boats!
http://genesis3.blogspot.com  Musings Of A Sentient Mind



Re: Machine Replication

2005-07-21 Thread Gary Mulder

On Thu, 21 Jul 2005, Eli K. Breen wrote:

 All,
 
 Does anyone have a good handle on how to replicate (read: image) a 
 freebsd machine from one machine to an ostensibly similar machine?
 
 So far I've used countless variations and combinations of the following:
 
 dd(Slow, not usefull if the hardware isn't identical?)
 tar   (Doesn't replicate MBR)
 rsync (No MBR support)
 Norton Ghost  (Doesn't support UFS/UFS2?)
 G4U   (little experience with this)
 

Try dump and restore. They seem to be fast and reliable (although not
under Linux from all accounts).

I usually use tar and disklabel -B /dev/XXX out of habit, but have
found that tar doesn't honour the permissions on /tmp and /var/tmp. The
sticky bit is set on these two dirs, but the permissions are not set to
777. This has me wondering what other (dir) perms are not correctly set.
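
A minimal sketch of a tar pipe that keeps directory modes intact. It
assumes the lost permissions come from extracting without -p, which is
only a guess, since the exact tar command line isn't given here. The src
and dst directories are hypothetical stand-ins for the source and target
file systems.

```shell
#!/bin/sh
# Sketch: copy a tree through a tar pipe and keep its permission bits.
# Assumption: -p on the extracting tar is what keeps a mode such as 1777.
# "src" and "dst" are hypothetical stand-ins for the real mount points.
work=$(mktemp -d "${TMPDIR:-/tmp}/tarpipe.XXXXXX")
mkdir -p "$work/src/tmp" "$work/dst"
chmod 1777 "$work/src/tmp"      # same mode as /tmp: rwxrwxrwt

# Archive on the left, extract with -p (preserve permissions) on the right.
(cd "$work/src" && tar -cf - .) | (cd "$work/dst" && tar -xpf -)

# The first ten characters of "ls -ld" are the type and permission string.
copied_mode=$(ls -ld "$work/dst/tmp" | cut -c1-10)
echo "mode of copied tmp directory: $copied_mode"

rm -rf "$work"
```

If the printed mode still lacks the group/other write bits, -p is not the
culprit and the umask or the tar implementation would be the next thing
to check.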

Gary



Re: Machine Replication

2005-07-21 Thread asym


At 15:20 7/21/2005, Eli K. Breen wrote:

All,

Does anyone have a good handle on how to replicate (read: image) a freebsd 
machine from one machine to an ostensibly similar machine?


So far I've used countless variations and combinations of the following:

dd(Slow, not usefull if the hardware isn't identical?)
tar   (Doesn't replicate MBR)
rsync   (No MBR support)
Norton Ghost  (Doesn't support UFS/UFS2?)
G4U   (little experience with this)


I've found a combination of dd + tar works great, as documented.

Stick the new drive in the box to be duplicated, use dd on the first 
(forget how many) sectors to copy the mbr and partition tables over, then 
use a tar pipe to copy from one drive to the other, preserving all perms 
and so forth.
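
A minimal sketch of the dd half of this, assuming the classic PC layout
where the MBR and partition table occupy the first 512-byte sector; the
sector count is not stated above, so count=1 is an assumption. Plain image
files stand in for the two disks so the commands can be tried safely; the
file names are hypothetical.

```shell
#!/bin/sh
# Sketch: copy only sector 0 (MBR + partition table) from one "disk"
# to another. Image files stand in for the real device nodes.
work=$(mktemp -d "${TMPDIR:-/tmp}/mbrcopy.XXXXXX")
src="$work/source.img"   # hypothetical stand-in for the disk being duplicated
dst="$work/target.img"   # hypothetical stand-in for the new disk

# Build two 1 MiB images: random bytes for the source, zeros for the target.
dd if=/dev/urandom of="$src" bs=512 count=2048 2>/dev/null
dd if=/dev/zero    of="$dst" bs=512 count=2048 2>/dev/null

# Copy the first sector only. conv=notrunc stops dd from truncating the
# target image; it is harmless when the target is a device node.
dd if="$src" of="$dst" bs=512 count=1 conv=notrunc 2>/dev/null

# Verify by extracting sector 0 from both images and comparing them.
dd if="$src" of="$work/src.mbr" bs=512 count=1 2>/dev/null
dd if="$dst" of="$work/dst.mbr" bs=512 count=1 2>/dev/null
if cmp -s "$work/src.mbr" "$work/dst.mbr"; then
    mbr_status=match
else
    mbr_status=differ
fi
dst_size=$(wc -c < "$dst" | tr -d ' ')
echo "first sector: $mbr_status, target size: $dst_size bytes"

rm -rf "$work"
```

On real hardware, if= and of= would name the two drives instead of image
files, and the tar pipe would only come after the new drive has file
systems to extract into.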


Barring that, commercial single-disk duplicators aren't THAT 
expensive.  Hell you could just use a cheap raid card to raid-1 mirror the 
drive, then yank it out and toss it in another box, which I've done on 
occasion when pressed.




Re: Machine Replication

2005-07-21 Thread Mike Tancsa


At 03:20 PM 21/07/2005, Eli K. Breen wrote:

All,

Does anyone have a good handle on how to replicate (read: image) a freebsd 
machine from one machine to an ostensibly similar machine?


So far I've used countless variations and combinations of the following:

dd  (Slow, not usefull if the hardware isn't identical?)
tar (Doesn't replicate MBR)
rsync   (No MBR support)
Norton Ghost(Doesn't support UFS/UFS2?)
G4U (little experience with this)



g4u is a REALLY nice front end to dd basically, but works very well and is 
reasonably fast.


If you want fast,
dump | restore
as it will only copy data and ignore empty blocks.  You then just need to 
install the MBR, which is easy to do via sysinstall if you are not 
comfortable with disklabel.


e.g.

cd /;dump -C 20 -0f - / | (cd /mnt/root-disk; restore -rf - )
cd /;dump -C 20 -0f - /usr | (cd /mnt/usr-disk; restore -rf - )

and so on.
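
The two pipelines above follow one pattern, so a small helper can print
them for any list of file systems. This is a sketch only: the function name
print_clone_cmds and the source:target argument format are made up for
illustration, and it only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch: print one dump | restore pipeline per file system.
# Each argument is "source-mountpoint:target-directory" (hypothetical format).
print_clone_cmds() {
    for pair in "$@"; do
        src=${pair%%:*}    # text before the first colon
        dst=${pair#*:}     # text after the first colon
        # dump: -0 is a full (level 0) dump, -f - writes to stdout,
        # -C 20 asks for a 20 MB cache. restore -r rebuilds the file
        # system in the current directory. "&&" (instead of ";") keeps
        # restore from running if the cd fails.
        printf 'dump -C 20 -0f - %s | (cd %s && restore -rf -)\n' "$src" "$dst"
    done
}

clone_cmds=$(print_clone_cmds /:/mnt/root-disk /usr:/mnt/usr-disk)
echo "$clone_cmds"
```

Reading the printed commands before piping them to sh is cheap insurance
against restoring into the wrong directory.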

---Mike 



Re: Quality of FreeBSD

2005-07-21 Thread Mike Tancsa


At 03:26 PM 21/07/2005, Karl Denninger wrote:

 Ok, Robert, but then here's the question
 
 How come the ATA code which was very stable in 4.x was screwed with in a
 production release, breaking it, with no path backwards to the working
 code?

I understand your frustration, but others would argue that, if the changes
had not been made, people would say (and have) "How come modern and common
hardware like X does not work with FreeBSD?  The driver is old and
unmaintained and does not support feature Y."  I don't see Soren's work as
"screwing with production drivers" so much as him re-writing them to take
advantage of modern hardware designs.  Unfortunately along the way some
things might break.  They have for me, but that sometimes happens in open
source (and commercial code too for that matter).


---Mike 



Re: Machine Replication

2005-07-21 Thread Dan Mack


On Thu, 21 Jul 2005, Eli K. Breen wrote:


All,

Does anyone have a good handle on how to replicate (read: image) a freebsd 
machine from one machine to an ostensibly similar machine?


So far I've used countless variations and combinations of the following:

dd  (Slow, not usefull if the hardware isn't identical?)
tar (Doesn't replicate MBR)
rsync   (No MBR support)
Norton Ghost(Doesn't support UFS/UFS2?)
G4U (little experience with this)


snip

Is there a jumpstart (solaris), kickstart (redhat linux), roboinst (irix),
or ignite (hpux) like auto-installer for BSD?

If there was, then I wouldn't image the disk at all, I'd instead set up 
custom network images that I could blast to any system just by pxebooting 
it.  I'm not sure if it is possible with FreeBSD though, anyone?


Dan

Re: Quality of FreeBSD

2005-07-21 Thread Matthias Buelow

[EMAIL PROTECTED] writes:

 My main problem, and that of others seeing the question from time to
 time, is to know which is a good (not necessarily the best) hardware to
 run FreeBSD on?
 When I buy a new motherboard, which chipset to choose/avoid, which
 controllers?

Maybe some website like the ones that exist for notebooks (with
Linux/FreeBSD support) would be in order. I'm thinking about something
like http://www.linux-laptop.net/, only for FreeBSD and all kinds of
machines, not just notebooks. (Or, if some collaboration would be ok,
for *BSD in general, with people posting experience from NetBSD,
OpenBSD, Dragonfly, even Darwin as well. That way one could also compare
support for hardware and see what problems the individual systems have.)

Make it a Wiki, or something similar, where people can freely post
experiences they have with their hardware. That could be whole machines
(Dell model xxx desktop, IBM yyy laptop, HP zzz server) as well as
components (Asus blah motherboard, 3Com wlan card model foobar, etc.).
Make the thing searchable, and perhaps allow one to post comments on
entries (easy with a Wiki). That way people can quickly search & review
hardware, as well as test workarounds suggested by the posters, without
having to google for obscure mailing list entries or problem reports.

mkb.

Re: Quality of FreeBSD

2005-07-21 Thread Paul Mather

On Thu, 2005-07-21 at 14:26 -0500, Karl Denninger wrote:

 Ok, Robert, but then here's the question
 
 How come the ATA code which was very stable in 4.x was screwed with in a
 production release, breaking it, with no path backwards to the working
 code?

Not to mention that this happened during the 5.x release cycle.  It's
one thing to have a regression creep in when moving from one major
release to another (e.g., "oh, that's the fallout from introducing Big
Feature XYZ" or "a big architectural revamp may have broken some
things"), but it's another thing entirely to have it happen between
minor releases, which are supposed to be evolution, not revolution.

(Although the whole Early Adopter status for early 5.x releases might
mean all that is muddied when it comes to the 5.x series.)

My main disappointment with the ATA DMA TIMEOUT bug is not that it crept
in (these things happen), but that it did not seem to be taken seriously
when it had done so.  (Though, as Robert said, if the developers can't
reproduce the problem, it's hard for them to work on and fix it.)

Cheers,

Paul.
-- 
e-mail: [EMAIL PROTECTED]

Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid.
--- Frank Vincent Zappa

Re: Machine Replication

2005-07-21 Thread Eli K. Breen

Just as a point of note, I'm not trying to roll out squeaky-clean new 
machines. Let's say I've got ten-fifteen sets of clusters, I need to be 
able to just rip a copy and blast it to another machine.


Thanks for all the responses so far.

-E-

Dan Mack wrote:

On Thu, 21 Jul 2005, Eli K. Breen wrote:


All,

Does anyone have a good handle on how to replicate (read: image) a 
freebsd machine from one machine to an ostensibly similar machine?


So far I've used countless variations and combinations of the following:

dd(Slow, not usefull if the hardware isn't identical?)
tar(Doesn't replicate MBR)
rsync(No MBR support)
Norton Ghost (Doesn't support UFS/UFS2?)
G4U(little experience with this)



snip

Is there a jumpstart (solaris), kickstart (redhat linux), roboinst (irix),
or ignite (hpux) like auto-installer for BSD?

If there was, then I wouldn't image the disk at all, I'd instead setup 
up custom network images that I could blast to any system just by 
pxebooting it.  I'm not sure if it is possible with FreeBSD though, anyone?


Dan


Re: Quality of FreeBSD

2005-07-21 Thread Karl Denninger

On Thu, Jul 21, 2005 at 03:51:13PM -0400, Mike Tancsa wrote:
 At 03:26 PM 21/07/2005, Karl Denninger wrote:
 Ok, Robert, but then here's the question
 
 How come the ATA code which was very stable in 4.x was screwed with in a
 production release, breaking it, with no path backwards to the working
 code?
 
 I understand your frustration, but others would argue if the changes were 
 not made that would say (and have) How come modern and common hardware 
 like  do not work with FreeBSD.  The driver is old and unmaintained 
 and does not support feature Y.  I dont see Soren's work as 
 screwing with production drivers as opposed to him re-writing them to 
 take advantage of modern hardware designs.  Unfortunately along the way 
 some things might break.  They have for me, but that sometimes happens in 
 open source (and commercial code too for that matter).
 
 ---Mike 

ATA-NG (Soren's new code) is not (from what I understand) in the 5.x 
codebase.  One bone of contention is that apparently it IS in -HEAD, but 
there are no plans to MFC it to 5.x. 

My understanding is that the 5.x code is a half-baked version of ATA-NG,
and IMHO it had no business going into a PRODUCTION release in the state
that it was pushed over.

The decision path on including half a loaf in this case is not something I 
was privy to - but I've certainly been privy to the results!  I fought
with unsolicited detachments of drives claimed to be defective (when
they were and are not) and several crashes when the only remaining good
device on the mirror was also declared bad - some of which came with
filesystem data corruption - for over a month before I came up with a
configuration that gives me both RAID 1 data protection and REASONABLE
stability (meaning I have uptimes which are not controlled by unsolicited 
crashes!)

I am however VERY leery of following -STABLE, since there are reports here
on the list that more recent versions than what I'm running may have
regressed once again.  

I DEFINITELY do not want to go through what I did back in the first part 
of the year again.

Given that we were all strongly encouraged to upgrade to 5.x for production
machines a few months ago it was a truly ugly surprise to find that current 
production hardware which ran just fine on 4.x was hosed to the point of 
unusability with 5.x as a consequence of serious (some would say CRITICAL)
driver issues.

Whether the full ATA-NG code actually fixes the problem is (to me anyway)
unknown - but I am not about to devote a bunch of testing time to it when
it's in a codebase that I can't run AND it has been stated that there is 
no intent to MFC it.

Now if there was a commitment to MFC the code I would be happy to engage 
in testing against -HEAD, and see if I can provoke the same sort of 
misbehavior I get on 5.x.

Without that commitment, however, testing it is fruitless for me, since 
I have no path out of the box I'm in other than sit on hands and wait an
indeterminate amount of time, and this testing involves a significant
time commitment - I not only have to replicate the 5.x production machines 
I've got in the field that have had trouble (not too hard), I also have to 
generate a synthetic load sufficient to know if the problem is truly 
resolved or not (that will take some effort.)

I've come up with a workaround that is functional for my production
systems, but that workaround came only with a huge time investment and 
IMHO this is a stability deficit that simply should not have happened.  

In the time I've run FreeBSD (going back a LONG ways, including using it
as the OS of choice behind a major regional ISP in the mid-late 90s) this 
is the worst instance of regression in terms of stability across purported 
RELEASE versions I've seen - for it to be pooh-poohed and outstanding
PRs effectively ignored for six months is IMHO quite a black eye event.

-- 
Karl Denninger ([EMAIL PROTECTED]) Internet Consultant & Kids Rights Activist
http://www.denninger.net      My home on the net - links to everything I do!
http://scubaforum.org         Your UNCENSORED place to talk about DIVING!
http://homecuda.com           Emerald Coast: Buy / sell homes, cars, boats!
http://genesis3.blogspot.com  Musings Of A Sentient Mind



Re: Machine Replication

2005-07-21 Thread Karl Denninger

I had a shell script that would replicate a machine when I ran my ISP; you
put the loader and partition table, plus a minimal system on the new machine,
then ran the script and pointed it at the source.

UUUPPP!  In about 20 minutes it was done.

Not hard to do at all with a simple shell script.

Used this all the time to push new OS versions out to the cluster (a
couple of dozen machines) when I was done testing them as well as
adding new machines to the existing cluster as demand warranted.

On Thu, Jul 21, 2005 at 03:04:01PM -0500, Dan Mack wrote:
 On Thu, 21 Jul 2005, Eli K. Breen wrote:
 
 All,
 
 Does anyone have a good handle on how to replicate (read: image) a 
 freebsd machine from one machine to an ostensibly similar machine?
 
 So far I've used countless variations and combinations of the following:
 
 dd   (Slow, not useful if the hardware isn't identical?)
 tar  (Doesn't replicate MBR)
 rsync(No MBR support)
 Norton Ghost (Doesn't support UFS/UFS2?)
 G4U  (little experience with this)
 
 snip
 
 Is there a jumpstart (solaris), kickstart (redhat linux), roboinst (irix),
 or ignite (hpux) like auto-installer for BSD?
 
 If there was, then I wouldn't image the disk at all, I'd instead set up
 custom network images that I could blast to any system just by pxebooting 
 it.  I'm not sure if it is possible with FreeBSD though, anyone?
 
 Dan

Re: Quality of FreeBSD

2005-07-21 Thread Karl Denninger

On Thu, Jul 21, 2005 at 04:12:47PM -0400, Paul Mather wrote:
 On Thu, 2005-07-21 at 14:26 -0500, Karl Denninger wrote:
 
  Ok, Robert, but then here's the question
  
  How come the ATA code which was very stable in 4.x was screwed with in a
  production release, breaking it, with no path backwards to the working
  code?
 
 Not to mention that this happened during the 5.x release cycle.  It's
 one thing to have a regression creep in when moving from one major
 release to another (e.g., oh, that's the fallout from introducing Big
 Feature XYZ or a big architectural revamp may have broken some
 things), but it's another thing entirely to have it happen between
 minor releases, which are supposed to be evolution, not revolution.
 
 (Although the whole Early Adopter status for early 5.x releases might
 mean all that is muddied when it comes to the 5.x series.)
 
 My main disappointment with the ATA DMA TIMEOUT bug is not that it crept
 in (these things happen), but that it did not seem to be taken seriously
 when it had done so.  (Though, as Robert said, if the developers can't
 reproduce the problem, it's hard for them to work on and fix it.)
 
 Cheers,
 
 Paul.
 -- 
 e-mail: [EMAIL PROTECTED]

My main disappointment is that it STILL isn't being taken seriously, six
months down the road.

My PR, for instance, is still open - as well it should be, as the 
DMA_TIMEOUT bug still exists.  Fixing the retry code so that the 
transaction is actually retried up to three times (instead of causing 
the disk to be declared broken on the first instance) IS NOT A FIX.
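
To make the distinction concrete, here is a minimal Python sketch of the kind of retry behaviour described above: a request is retried up to three times before the device is given up on. It is not the ATA driver code (which is C, in the kernel); the names `DmaTimeout`, `submit_with_retry` and `make_flaky_request` are hypothetical and exist only for illustration.

```python
class DmaTimeout(Exception):
    """Hypothetical stand-in for a DMA timeout reported by a disk request."""


def submit_with_retry(request, max_retries=3):
    """Run request(); on DmaTimeout retry up to max_retries times.

    Returns (result, timeouts_seen). Raises DmaTimeout only when the
    initial attempt and every retry have timed out.
    """
    timeouts = 0
    while True:
        try:
            return request(), timeouts
        except DmaTimeout:
            timeouts += 1
            if timeouts > max_retries:
                raise  # only now would the disk be declared broken


def make_flaky_request(failures):
    """Build a request that times out `failures` times, then succeeds."""
    state = {"left": failures}

    def request():
        if state["left"] > 0:
            state["left"] -= 1
            raise DmaTimeout("simulated timeout")
        return "data"

    return request
```

A caller that gets back `("data", 2)` received its data, but two timeouts still happened along the way. The retry loop absorbs them; it does nothing about whatever caused them.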

The problem is very easy to reproduce; I have put forward the exact 
configuration necessary to do so in the original PR.  I have since 
discovered (and others have reported) that it is not particularly 
sensitive to the exact hardware involved - basically any SII chipset
PCI SATA adapter (which is like all of the basic ones, including the
Adaptec and Bustek) with a pair of SATA disks appears to be all that is
required.


Re: Quality of FreeBSD

2005-07-21 Thread Garance A Drosihn


At 8:50 AM -0400 7/21/05, MikeM wrote:

On 7/21/2005 at 8:29 PM Daniel O'Connor wrote:
|
| I think the best way to rectify this is to test RC candidates
| on YOUR hardware.. This finds the bugs you need fixed at a
| time when people are very receptive to fixing them.
|
| It's not realistic for the release engineer to test on a lot
| of hardware as they are very busy doing other things.

Your comment presupposes that most of the bugs are specific to
one piece of hardware, I doubt that is a valid assertion.  I
would offer that most of the bugs are not present in source code
specific to a certain piece of hardware, ...


Some problems are not tied to one specific piece of hardware, but
to the combination of different hardware.  I also went through a
lot of pain with ATA problems for awhile there, and I was fed up
enough that I tried to buy my way out of the problem.  I ended up
with three different SATA controllers, and two different SATA
hard disks.

The thing was, the problems I saw depended on the *combination*
of a hard disk and SATA controller.  My real-SATA hard drive
would fail (in some ways) when connected to one SATA controller,
but not to the other.  And my fake-SATA drive would *work* on
the controller which the real-sata drive failed on, but fail
on the controller the real-sata drive worked on!

There is no question that this was infuriating for me, so I can
sympathize with your frustration.  But I helped Søren get some
hardware he needed for testing, and things gradually improved.
But the problems weren't specific to the hard drive I was using,
or the SATA controller I was using.  They depended on the
combination of pieces that were in my PC.


Once a bug is reported, and that bug can be reproduced on the
hardware of the development team, then that bug should not
reappear again,


In my case, the development team needed to *buy* hardware to
reproduce some of the problems I was seeing.  But their hardware
still isn't *exactly* the same as mine.  So, they made some fixes
which solved problems on their hardware and (happily) on mine.
But it is certainly possible for some future change to work
perfectly fine on their hardware, and *not* work on mine.  There
is still no substitute for testing on your hardware, with some
sort of real-world loads.  The project, as such, simply can not
test all combinations of hardware, on all kinds of real-world
loads.  Even if we had a huge collection of PC's to test on,
we're not necessarily going to throw the same kinds of loads
on those machines as you deal with.

I should note that *all* of my SATA-based hardware is stuff that
was not supported at all under 4.x.  So it's awkward for me to
complain too loudly, because I *do* want SATA, and the only way
for FreeBSD to support these new controllers was to make changes
to some previously-working code.

--
Garance Alistair Drosehn=   [EMAIL PROTECTED]
Senior Systems Programmer   or  [EMAIL PROTECTED]
Rensselaer Polytechnic Instituteor  [EMAIL PROTECTED]

RE: Machine Replication

2005-07-21 Thread Andresen,Jason R.

From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Eli K. Breen
Sent: Thursday, July 21, 2005 3:21 PM
To: freebsd-stable@freebsd.org
Subject: Machine Replication

All,

Does anyone have a good handle on how to replicate (read: image) a 
freebsd machine from one machine to an ostensibly similar machine?

So far I've used countless variations and combinations of the 
following:

dd (Slow, not useful if the hardware isn't identical?)
tar(Doesn't replicate MBR)
rsync  (No MBR support)
Norton Ghost   (Doesn't support UFS/UFS2?)
G4U(little experience with this)

If you need stuff replicated fast and you don't mind a bit of setup,
there is emulab http://www.emulab.net/.  I can push out new images to
machines in less than 10 minutes including the time it takes to reboot
twice (once into the imager and once back to the OS).  

You may need to use UFS1 for your filesystems though, I don't know if
the imager can handle UFS2 yet.  We use UFS1 here just to be safe.  

Re: Machine Replication

2005-07-21 Thread Francois Tigeot

On Thu, Jul 21, 2005 at 12:20:34PM -0700, Eli K. Breen wrote:
 
 Does anyone have a good handle on how to replicate (read: image) a 
 freebsd machine from one machine to an ostensibly similar machine?

[...]

 Now whether my details are a bit off, that's fine, I don't want this to 
 be diluted in to discussion of minute frivolous details (as these things 
 are wont to do), but what I _am_ looking for is a tried, tested and true 
 method of FreeBSD machine replication, specifically for the 5.3+ releases.

I have found the following paper to be incredibly useful:

http://www.pix.net/software/pxeboot/archive/SANE.pdf

I used some of the ideas in it to clone machines in the 5.1-5.2 era.

-- 
Francois Tigeot, CEO, Zefyris
http://www.zefyris.com/

Re: Quality of FreeBSD

2005-07-21 Thread Mark Linimon

On Thu, Jul 21, 2005 at 08:00:40PM +0100, Robert Watson wrote:
 [original poster wrote:]
 - I completely agree with MikeM - any kind of complex software could be
 tested with right prepared test cases, specially if they are going to be
 reused in the next release;

For static problems -- yes.  For dynamic problems, such as race conditions,
the problem space you are trying to test is many orders of magnitude more
complex.  This is true of any engineering discipline, but much more so
with software engineering due to the immense complexity of the constructed
artifacts.

 [rwatson again:]
 As has been discussed extensively in this thread and other threads, the
 FreeBSD development model typically addresses change at the tree HEAD,
 where the changes are tested and evaluated, and then they are back-ported.
 Some changes are low-risk, and are backported quickly (minor locking
 fixes, error handling, etc).  Others are higher risk, and are backported
 only when they are felt to have received sufficient testing (driver
 re-writes, structural changes).  Other changes are considered too large
 to ever be backported [ ... ]

To add to Robert's comments, there was at least one case during the 5.2
cycle where a large backport was made that destabilized the tree for quite
some time.  This was not due to any lack of diligence on the developer's
part; it turned out that the problems were far more subtle and complex
than anyone could have reasonably anticipated.  Since that time, AFAICT
the sentiment has shifted away from large backports.

There is always risk in any backport and the risks escalate dramatically
the less compartmentalized the changes are.  One of the goals for 6.X
and beyond is to try to keep changes more compartmentalized; there was
simply no way to do such a thing with e.g. SMP and VM changes.  At the
same time, the sentiment seems to be "let's debug one set of featureset
changes all together and then release them as a major release."

Of course, backports also require developer time both to do the initial
commit and then, more onerously, the followup support.

To conclude this thought, the motivation for changing the way FreeBSD
is going to do releases going forwards is to try to mitigate such
problems: to try to debug, and release, a smaller set of features with
new major releases, and more frequently, and with a better-known
schedule (every 18 months).

 Notice that we'll be supporting the 4.x branch for several years to come.

The limiting factor on the 4.X branch is going to be the ports tree
more quickly than the base system, particularly for people running
desktop installations.  The FreeBSD GNOME team has already announced
that they are not going to support 4.X by default in the next major
GNOME release due this fall.  The next major KDE release will probably
not work on the 4.X gcc compiler as well IIUC.  There is simply an
insufficient amount of developer resources to support releases that
have different toolchains, include files, and so on.

Staying on 4.X indefinitely is not going to be an option at some point
in the future, but when, exactly, is difficult to tell right now.  It
is fair to note, however, that almost no developer attention is being
spent on 4.X except for security problems as they are found.

Further, the more people we have stay on 4.X, the fewer people we have
testing whichever release we consider the latest stable release, and
therefore, the fewer bugs we'll get fixed on that release.

One last thought.  It always bears repeating that, except for a handful
of cases, people who work on FreeBSD are not being paid to do so.  Users
should always adjust their expectations accordingly.  We do our absolute
best given the relatively small number of developers that we do have,
but we always need more people who are willing to work on regression
testing and QA activities.  For the companies which view FreeBSD as
'mission-critical' (and we do welcome them!), I challenge them to
consider funding development/testing efforts going forwards.  (Yes, a
number already do, but more would be welcome.)

mcl

What to do when panic?

2005-07-21 Thread Kövesdán Gábor


Hello,

I've never debugged FreeBSD, but now I've decided to help with the testing
process of FreeBSD 6. I installed it, and then I had a panic. I got a
debugger prompt, but I don't know what to do with it. I don't know the
debugger commands. Please let me know what I should do when I have
another panic. What should I type, and what kind of information should I
send in a PR?


Thanks,

Gábor Kövesdán

Silent crash on FreeBSD 6.0-BETA1

2005-07-21 Thread Kövesdán Gábor


Hi,

I've installed FreeBSD 6.0-BETA1, and if I use more than one console I get a
silent crash. The cursor won't move and I can't switch to another
console. It has happened three times so far when I was using two
consoles. (I was using make + ee in the first two cases and cvsup + less in
the third.) How can I find out what's wrong? I suspect it is
some kind of hardware support issue since I have a fairly new PC.


Cheers,

Gábor Kövesdán

Re: Machine Replication

2005-07-21 Thread Eli K. Breen

I should point out, this is for replication in a running production 
environment. Machines cannot be taken down, and swapping hardware is not 
an option.


I'm currently experimenting with a copy of the MBR, and the root 
partition on a CD, with enough tools to attach to the network to 
retrieve images of the rest of the partitions (which can be taken as 
current snapshots from various servers).


This _should_ result in the following scenario:

Boot new machine with CD
partition drive(s)
dump MBR
dump root
ssh [EMAIL PROTECTED] 'dump -C 64 -0af - /sliceX'| (cd /usr; restore -rf -)
[repeat above for all drives, could be automated]

Seem reasonable?
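
The "repeat for all drives, could be automated" step can be sketched roughly as follows. This is only a dry-run outline in Python: it builds the same dump-over-ssh pipeline as above, one per filesystem, and prints the commands instead of running them. The host name and mount points are placeholders, and `replication_commands` is a hypothetical helper, not an existing tool.

```python
import shlex


def replication_commands(source, filesystems):
    """Build one dump-over-ssh | restore pipeline per filesystem.

    source      -- ssh target such as "user@host" (placeholder)
    filesystems -- list of (remote_fs, local_mountpoint) pairs
    Returns shell command strings; nothing is executed here.
    """
    commands = []
    for remote_fs, local_mount in filesystems:
        remote = f"dump -C 64 -0af - {shlex.quote(remote_fs)}"
        local = f"(cd {shlex.quote(local_mount)}; restore -rf -)"
        commands.append(
            f"ssh {shlex.quote(source)} {shlex.quote(remote)} | {local}"
        )
    return commands


# Placeholder host and mount points, for illustration only.
plan = replication_commands(
    "operator@source.example.org",
    [("/usr", "/mnt/usr"), ("/var", "/mnt/var")],
)
for line in plan:
    print(line)
```

Printing the plan first makes it easy to review before anything touches a disk. In a real script, `cd dir && restore` would be a safer choice than `cd dir; restore`, since a failed `cd` would otherwise let restore unpack into the wrong directory.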

-E-


Elliot Finley wrote:
- Original Message - 
From: Francois Tigeot [EMAIL PROTECTED]

On Thu, Jul 21, 2005 at 12:20:34PM -0700, Eli K. Breen wrote:

Does anyone have a good handle on how to replicate (read: image) a
freebsd machine from one machine to an ostensibly similar machine?

[...]

Now whether my details are a bit off, that's fine, I don't want this to
be diluted in to discussion of minute frivolous details (as these things
are wont to do), but what I _am_ looking for is a tried, tested and true
method of FreeBSD machine replication, specifically for the 5.3+
releases.

I have found the following paper to be incredibly useful:

http://www.pix.net/software/pxeboot/archive/SANE.pdf

I used some of the ideas in it to clone machines in the 5.1-5.2 era.

You could also just mirror the drive with a Promise RAID 1 card.  I've done
that a couple of times and it works really well.

Elliot


Re: Machine Replication

2005-07-21 Thread Danny Howard

On Thu, Jul 21, 2005 at 03:04:01PM -0500, Dan Mack wrote:
 snip
 
 Is there a jumpstart (solaris), kickstart (redhat linux), roboinst (irix),
 or ignite (hpux) like auto-installer for BSD?

No.  g4u and a script might do a good job for you if your hardware is
mostly similar.

 If there was, then I wouldn't image the disk at all, I'd instead set up
 custom network images that I could blast to any system just by pxebooting 
 it.  I'm not sure if it is possible with FreeBSD though, anyone?

It is possible.  I have done it before.  I had some of those funky VA
Linux machines which need the dongle boxes to support video and keyboard.
I had them booting from hard drive or DHCP, and if I wanted to re-image
a machine I just had to clobber the MBR and reboot. :)

Setting up the disk partition with sysinstall was the biggest bitch.

If I were to set up a system like this again, I might do something with
g4u to set out the basic systems, with an rc script that can pull a
post-install recipe which does things like growfs /usr/local, and do
machine-specific customization.  Then PUBLISH your work before you get
laid off.  (That is how my last efforts were concluded.)

Cheers,
-danny

-- 
http://dannyman.toldme.com/

Re: What to do when panic?

2005-07-21 Thread Scot Hetzel

On 7/21/05, Kövesdán Gábor [EMAIL PROTECTED] wrote:
  FreeBSD 6. I installed it, and then I had a panic. I got a
 debugger prompt, but I don't know what to do with that. I don't know the
 debugger commands. Please let me know what should I do when I have an
 another panic. What should I type and what kind of information should I
 send as a PR.
 
Look at the FreeBSD Developers' Handbook chapters on debugging:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/debugging.html

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

Scot

-- 
DISCLAIMER:
No electrons were maimed while sending this message. Only slightly bruised.

Re: Machine Replication

2005-07-21 Thread Karl Denninger

Yep.  Pretty much what I used to do with my ISP.


Re: Machine Replication

2005-07-21 Thread Andrea Campi

On Thu, Jul 21, 2005 at 03:04:01PM -0500, Dan Mack wrote:
 Is there a jumpstart (solaris), kickstart (redhat linux), roboinst (irix),
 or ignite (hpux) like auto-installer for BSD?
 
 If there was, then I wouldn't image the disk at all, I'd instead set up
 custom network images that I could blast to any system just by pxebooting 
 it.  I'm not sure if it is possible with FreeBSD though, anyone?

Well, sysinstall is perfectly capable of doing this, as it's fully
scriptable. You can setup a pxeboot-install environment by setting up
a dhcp/tftp/nfs server, copying a standard release CD, and creating a
simple config file. I don't have exact details handy, but I know it's
possible as it's the way we've been pressing FreeBSD boxes at $REALJOB
for ages. You'll find plenty (well, some) of information in the archives
as well.
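
No exact details are given above, so the following is only a sketch of the DHCP piece of such a setup. It assumes an ISC dhcpd server and the stock FreeBSD `pxeboot` loader served over TFTP with an NFS root. The host name, MAC address, IP addresses and export path are all placeholders, and `dhcpd_host_stanza` is a hypothetical helper. The sysinstall config file is deliberately not shown, since its contents were not given.

```python
def dhcpd_host_stanza(name, mac, ip, server, nfs_root):
    """Render an ISC dhcpd.conf host entry that points a client at pxeboot."""
    return "\n".join([
        f"host {name} {{",
        f"  hardware ethernet {mac};",
        f"  fixed-address {ip};",
        f"  next-server {server};",          # TFTP server holding pxeboot
        '  filename "pxeboot";',
        f'  option root-path "{server}:{nfs_root}";',  # NFS install tree
        "}",
    ])


# Placeholder values, for illustration only.
stanza = dhcpd_host_stanza(
    name="node01",
    mac="00:11:22:33:44:55",
    ip="192.0.2.10",
    server="192.0.2.1",
    nfs_root="/export/freebsd-install",
)
print(stanza)
```

Generating one stanza per machine from a list keeps the per-host differences (MAC and address) in one place while the boot file and install tree stay shared.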

Bye,
Andrea

-- 
   Press every key to continue.

RE: Quality of FreeBSD

2005-07-21 Thread Alexey Yakimovich

Even for dynamic problems you can have your code generating detailed logs,
including time, pid, thread id, cpu, function, memory ..., and have them
analyzed later by some script.
But this is not my main point here, in this thread.
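
As a rough illustration of the "analyzed later by some script" idea, here is a minimal Python sketch. The log format (time, pid, thread id, cpu, function, enter/exit) and the function name `update_queue` are invented for this example; they do not correspond to any existing FreeBSD logging facility.

```python
from collections import defaultdict


def find_overlaps(log_lines):
    """Report functions that two threads were inside at the same time.

    Each line: "<time> <pid> <tid> <cpu> <function> <enter|exit>".
    Returns a list of (time, function, tids_inside) tuples in time order.
    """
    inside = defaultdict(set)   # function -> thread ids currently inside it
    overlaps = []
    for line in sorted(log_lines, key=lambda l: float(l.split()[0])):
        time, _pid, tid, _cpu, func, event = line.split()
        if event == "enter":
            inside[func].add(tid)
            if len(inside[func]) > 1:
                overlaps.append(
                    (float(time), func, tuple(sorted(inside[func])))
                )
        else:
            inside[func].discard(tid)
    return overlaps


# Invented sample log: thread 2 enters before thread 1 has left.
sample = [
    "0.001 51999 1 0 update_queue enter",
    "0.002 52000 2 1 update_queue enter",
    "0.003 51999 1 0 update_queue exit",
    "0.004 52000 2 1 update_queue exit",
]
print(find_overlaps(sample))
```

A script like this only finds overlaps that actually occurred while logging was switched on, and writing the log entries itself changes the timing of the code being observed.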

All thoughts in the mails of this thread, developers as well as users, seem
to me so right, so true.
But I would like to repeat my main point:
From my personal experience (maybe I'm wrong, but this is what I see close to
me), the FreeBSD project is losing a lot of users. I don't know anything about
developers, but it seems to me true of them too. No users, no developers, no
project. And the main problem, it seems to me, is quality, at least from the
users' point of view.
I don't know what caused this problem. But in my opinion, it would be good
to try to re-evaluate the goals of the project, including small ones like GNOME
(how many people are using FreeBSD as a desktop? Do you know any real-world
desktop solutions, except for OS X or Windows?). If you want to grab
everything, you will probably end up with nothing. And if the car's engine does
not work, why do we need GPS inside?

Thank you very much again for your time.
I really appreciate it.

Alexey
FreeBSD user


Re: Quality of FreeBSD

2005-07-21 Thread Robert Watson



On Thu, 21 Jul 2005, Karl Denninger wrote:

ATA-NG (Soren's new code) is not (from what I understand) in the 5.x 
codebase.  One bone of contention is that apparently it IS in -HEAD, but 
there are no plans to MFC it to 5.x.


Then you misunderstand.  Soren has asked to MFC it, and we've asked him to 
wait until it's had more testing exposure, precisely because it is a 
sensitive code base, and we don't want to see further regression.


Robert N M Watson

RE: Quality of FreeBSD

2005-07-21 Thread Robert Watson



On Thu, 21 Jul 2005, Alexey Yakimovich wrote:

Even for dynamic problems you can have your code generating detailed 
logs, including time, pid, thread id, cpu, function, memory ..., and 
have them analyzed later by some script. But this not my main point 
here, in this thread.


Instrumentation is very expensive at run-time, and substantially changes 
timing, especially in the network stack and network-related device 
drivers, so will often close race conditions by changing the timing.  We 
have an extensive instrumentation system named KTR(9).  If you're 
interested in giving it a try, you can find out more here:


http://www.watson.org/~robert/freebsd/netperf/ktr/

This page is primarily targeted at tracing locks, memory allocation, and
context switching, but you can also trace I/O, bus operations, VFS 
operations, and a range of other things. While my web page doesn't talk 
about it, as it's generally focused on micro-tracing of kernel events, you 
can also queue the event stream to disk using alq(9).  The man pages have 
more information.  There are some neat tools, such as Jeff Roberson's 
schedgraph, for managing and rendering trace results.


The downside is, of course, performance and perturbation of the events.
Adding trace operations in rapid firing events, such as context switches, 
lock operations, and so on, even if they're disabled at run-time, has a 
huge performance cost.  As a result, the trace mechanisms are added via 
compile-time options for the kernel.  There's some interest in introducing 
run-time instrumentation, although the focus of that has primarily been 
related to run-time adaptation of kernels between UP and SMP, in order to 
avoid lock costs on an SMP-compiled kernel running on UP.  Even then, the 
performance perturbation is a big issue for tracking subtle races.


All thoughts in the mails of this thread, developers as well as users, 
seem to me so right, so true. But I would like to repeat my main point: 
From my personal experience, maybe I'm wrong, but what I see close to 
me, FreeBSD project is losing a lot of users, I don't know anything
about developers, but it seems to me true too. No users no developers no 
project.


I appreciate your concern, but at least from looking at the committer 
count and commit rates, FreeBSD is gaining developers rather than losing 
them.  Likewise, while users come and go, reports from organizations like 
Netcraft have tracked a moderate to substantial increase in FreeBSD use 
over the last few years.  If you then throw in indirect consumers of 
FreeBSD as a result of FreeBSD-derived operating systems, such as Apple's 
Mac OS X, Juniper, etc, the numbers become ridiculously large very
quickly.


None of this is to say quality and a focus on quality aren't important, 
just that while your concerns are valid, I think there's a lot of detail 
to this that isn't as immediately obvious.


Robert N M Watson
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]

Re: Quality of FreeBSD

2005-07-21 Thread Karl Denninger

On Fri, Jul 22, 2005 at 12:45:03AM +0100, Robert Watson wrote:
 
 On Thu, 21 Jul 2005, Karl Denninger wrote:
 
 ATA-NG (Soren's new code) is not (from what I understand) in the 5.x 
 codebase.  One bone of contention is that apparently it IS in -HEAD, but 
 there are no plans to MFC it to 5.x.
 
 Then you misunderstand.  Soren has asked to MFC it, and we've asked him 
 to wait until it's had more testing exposure, precisely because it is a 
 sensitive code base, and we don't want to see further regression.
 
 Robert N M Watson

I don't think I misunderstand at all, Robert.

We (some group) have asked him not to MFC it.  

Ergo, IT IS NOT THERE NOW, and there are no plans (at present) to MFC it.

That's exactly what I said.

However, it obviously wasn't that big of a deal (to the -committers) to 
commit the ORIGINAL changes which broke the implementation going from 4.x 
to 5.something (early 5.x early adopter RELEASEs were ok).

What I don't understand, Robert, is why Soren's code is too sensitive to 
commit, but the explosive reduction in stability caused by the changes 
made between 4.x and 5.3 wasn't enough to back THAT out until it could 
be fixed.

It's not like these problems didn't show up almost immediately when the
affected releases hit the street.  They did.  Six months later, the
problems are still there, and I see nothing in the commit logs to 
suggest that the underlying issues have been addressed.

Papering over the failures so that retries work properly (when they were
broken before) isn't a fix.  A fix would be identifying the root cause of 
the DMA_TIMEOUT errors and addressing them so that they no longer occur.

I realize that this is likely a timing issue in the code, and therefore 
is difficult to debug.

That does not, however, change the fact that this issue has been open for
more than six months without resolution, and that one potential resolution
to the problem (Soren's ATA-NG code) either (1) doesn't fix it, (2) hasn't 
been tested to see if it does, or (3) DOES fix it, but for whatever
internal reasons has not been MFC'd.

If (1), then not only should Soren's code NOT be MFC'd, but 6.x should
absolutely be held until it IS identified and resolved.

If (2), then how about trying to find out if that solves the problem?

If (3), I think there are a few of us (myself included) that would like an
explanation.

If Soren BELIEVES (2) is the case, I'll test against -BETA1, IF I can have
confirmation that -BETA1 has the ATA-NG code in it.  

It's trivially easy for me to reproduce this problem on my sandbox machine.

-- 
Karl Denninger ([EMAIL PROTECTED]) Internet Consultant & Kids Rights Activist
http://www.denninger.net        My home on the net - links to everything I do!
http://scubaforum.org           Your UNCENSORED place to talk about DIVING!
http://homecuda.com             Emerald Coast: Buy / sell homes, cars, boats!
http://genesis3.blogspot.com    Musings Of A Sentient Mind



Re: Quality of FreeBSD

2005-07-21 Thread Robert Watson



On Thu, 21 Jul 2005, Karl Denninger wrote:

If Soren BELIEVES (2) is the case, I'll test against -BETA1, IF I can 
have confirmation that -BETA1 has the ATA-NG code in it.


It's trivially easy for me to reproduce this problem on my sandbox 
machine.


As has already been stated, Soren's changes are in 6.x.  If you are able 
to test this workload against 6.0-BETA1 using the hardware in question, 
that would be very helpful.  Depending on the nature of the workload and 
problem, you might find you need to compile out the debugging features, as 
they slow things down quite a bit, so might reduce the transaction rate 
sufficiently to make the problem fail to occur.  If it requires 5.x 
applications, you might find you have to wait for BETA2.


Robert N M Watson

Re: Quality of FreeBSD

2005-07-21 Thread Robert Watson



On Thu, 21 Jul 2005, Matthias Schuendehuette wrote:


On 21.07.2005 at 13:00, Robert Watson wrote:


  Have you tried, and do you plan to try, our 6.0 test releases before
  6.0-RELEASE goes out the door?  Specifically, on the hardware you know
  you're having problems with 5.4 on?


Yes, I did - see the thread "mpt + gvinum on 6.0-BETA".

But I'm a bit disappointed that, so far, there has not been *one* reply to 
my report.


It's new hardware, which doesn't even boot with 5.3/5.4-RELEASE (but does 
with 5.2.1 :-) and is probably one of the more popular servers 
(FUJITSU-SIEMENS RX300 S2)...


What did I do wrong here? Should I post to -current instead?


I would post to -current about 6.x issues, as it's not yet considered a 
-STABLE branch.  If it's an mpt problem, the likely contact is Scott Long 
(scottl@), who most recently did cleanup and bug fixing of mpt (July 10).


Lukas Ertl (le@) is probably the starting contact of choice for gvinum 
issues, although there's also a geom mailing list which might be a good 
place to send e-mail.


It could, though, easily be an interrupt-related problem of some sort -- 
e-mail with Scott should hopefully quickly determine if it's the driver, 
an interrupt problem, or a problem that should be in the hands of Lukas.


Robert N M Watson

Re: Machine Replication

2005-07-21 Thread Craig Boston

On Thu, Jul 21, 2005 at 12:20:34PM -0700, Eli K. Breen wrote:
 dd (slow, not useful if the hardware isn't identical?)

I use dd a lot for this type of thing and don't see how it could
possibly be slower than any other method that duplicates the entire raw
drive.  Make sure to give it a bs=1m option as reading/writing the
disk in 512 byte chunks is a lot slower than larger blocks.
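
The block-size advice above can be tried safely with a small sketch like
the following. It is only an illustration: SRC and DST are temporary files
standing in for raw disks (an assumption; on a real system they would be
device nodes such as /dev/ad0 and /dev/ad1), and the size is spelled in
bytes because FreeBSD's dd writes it as bs=1m while GNU dd expects bs=1M.

```shell
#!/bin/sh
# Temporary files stand in for the source and target disks (assumption).
SRC=$(mktemp)
DST=$(mktemp)
trap 'rm -f "$SRC" "$DST"' EXIT

# Fill the pretend source disk with 4 MB of random data.
dd if=/dev/urandom of="$SRC" bs=1048576 count=4 2>/dev/null

# Raw copy in 1 MB blocks instead of dd's default 512-byte blocks.
dd if="$SRC" of="$DST" bs=1048576 2>/dev/null

# A raw copy should be bit-for-bit identical to the source.
if cmp -s "$SRC" "$DST"; then
    echo "copy matches source"
fi
```

Against real devices only the two paths change; the larger block size is
what cuts down the number of read and write calls.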

If your disks have a lot of free space, copying the filesystem using
dump/restore can be faster, but it's not an *exact* bit-for-bit copy.
The resulting filesystem is functionally equivalent though, so it's
probably the best way for duplicating UFS(2) filesystems.  You do have
to partition manually, but you would probably want to do that if the new
drive was a different size anyway.
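
The dump/restore approach could look roughly like the sketch below. The
device name /dev/ad1s1a and the mount point /mnt are placeholders, the
target is assumed to be partitioned already, and the clone_fs and run
helpers are hypothetical names made up for this illustration. By default
the commands are only printed, so nothing is touched until DRY_RUN=0 is
set.

```shell
#!/bin/sh
# run: print the command when DRY_RUN is 1 (the default), else execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1"
    fi
}

# clone_fs SRC_FS DST_DEV DST_MNT
# Copy one UFS filesystem onto an already-partitioned target device.
clone_fs() {
    src_fs=$1
    dst_dev=$2
    dst_mnt=$3
    # Create a fresh filesystem (with soft updates) on the target.
    run "newfs -U $dst_dev"
    run "mount $dst_dev $dst_mnt"
    # Level-0 dump to stdout, restored into the new filesystem.
    # On a live filesystem, adding -L to dump takes a snapshot first.
    run "dump -0af - $src_fs | (cd $dst_mnt && restore -rf -)"
}

# Show the plan for copying /usr onto the placeholder device.
clone_fs /usr /dev/ad1s1a /mnt
```

Because only allocated files are copied, the time taken depends on how
full the filesystem is rather than on the size of the disk.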

Craig

Re: RELENG_6 scroll wheel

2005-07-21 Thread Jonguk Kim


Marian Hettwer wrote:

Hej All,

I upgraded to RELENG_6 to help testing.
Everything went smoothly so far, but my scroll wheel in X isn't working
anymore. I didn't change anything regarding the configuration from
FreeBSD RELENG_5 to RELENG_6 ...

some details:
[EMAIL PROTECTED] ~ $ uname -a
FreeBSD beastie.mobile.rz 6.0-BETA1 FreeBSD 6.0-BETA1 #0: Fri Jul 15
17:00:59 CEST 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC
 i386
[EMAIL PROTECTED] ~ $ dmesg | grep ums
ums0: Logitech USB-PS/2 Optical Mouse, rev 2.00/11.10, addr 3, iclass 3/1
ums0: 3 buttons and Z dir.
ums0: Logitech USB-PS/2 Optical Mouse, rev 2.00/11.10, addr 2, iclass 3/1
ums0: 3 buttons and Z dir.
ums0: Logitech USB-PS/2 Optical Mouse, rev 2.00/11.10, addr 2, iclass 3/1
ums0: 3 buttons and Z dir.

from /etc/X11/xorg.conf
Section "InputDevice"
Identifier  "Mouse0"
Driver  "mouse"
Option  "Protocol" "auto"
Option  "Device" "/dev/sysmouse"


I think 'Option "Buttons" "5"' can help you.

jonguk


Option  "ZAxisMapping" "4 5"
EndSection

[EMAIL PROTECTED] ~ $ ps ax | grep moused
 1060  ??  Ss 0:52,38 /usr/sbin/moused -z 4 -p /dev/ums0 -t auto -I
/var/run/moused.ums0.pid

I'm running xorg-6.8.2

I didn't recompile my ports, but I guess this shouldn't be the problem, hm?

Any ideas anyone ?

best regards,
Marian

Re: Quality of FreeBSD

2005-07-21 Thread Karl Denninger

On Fri, Jul 22, 2005 at 01:38:40AM +0100, Robert Watson wrote:
 
 On Thu, 21 Jul 2005, Karl Denninger wrote:
 
 If Soren BELIEVES (2) is the case, I'll test against -BETA1, IF I can 
 have confirmation that -BETA1 has the ATA-NG code in it.
 
 It's trivially easy for me to reproduce this problem on my sandbox 
 machine.
 
 As has already been stated, Soren's changes are in 6.x.  If you are able 
 to test this workload against 6.0-BETA1 using the hardware in question, 
 that would be very helpful.  Depending on the nature of the workload and 
 problem, you might find you need to compile out the debugging features, 
 as they slow things down quite a bit, so might reduce the transaction 
 rate sufficiently to make the problem fail to occur.  If it requires 5.x 
 applications, you might find you have to wait for BETA2.
 
 Robert N M Watson

As I pointed out in my PR, make -j4 buildworld is more than sufficient
to demonstrate the problem.

This is why I don't understand why it has been ignored - it is easily
reproducible using stock Adaptec SATA controllers, standard SATA drives,
and a gmirror RAID 1 configuration.

This is pretty pedestrian stuff here, Robert.  Two disks on one adapter,
on a PCI bus.

I'll pull over 6.0-BETA1, rebuild the array (that is the time-consuming
part of this test - takes 6-8 hours for the rebuild to run) and see if it
fails during a buildworld.

-- 
Karl Denninger ([EMAIL PROTECTED]) Internet Consultant & Kids Rights Activist
http://www.denninger.net        My home on the net - links to everything I do!
http://scubaforum.org           Your UNCENSORED place to talk about DIVING!
http://homecuda.com             Emerald Coast: Buy / sell homes, cars, boats!
http://genesis3.blogspot.com    Musings Of A Sentient Mind



Re: Quality of FreeBSD

2005-07-21 Thread Janet Sullivan

I've been a BSD user since the mid-90s, and a FreeBSD user since the 
days 4.0 became STABLE.


Right now, I have 2 collocated servers, one home server, and a laptop 
all running 5.4 without any serious problems.  I've watched 5.x since 
its creation, and while there have been some rocky times, I do think 
things are getting better.  I refused to run 5.x on my servers until 
5.4, but I have not yet regretted the move.  I know that other people 
have had issues, but so far (knock on wood) 5.4 has been a solid release 
for me.


I do think some mistakes were made with the release engineering over 
5.x's lifetime, but folks, what's done is done.  Recently things do seem 
to be headed in a better direction, for which I'm thankful.


I know the developers don't hear it often enough, but thanks for all you 
do.  I'm not a programmer, and I currently don't have the funds to 
donate to the project, but you do have my heartfelt thanks for still 
turning out my favorite OS.



Re: Machine Replication

2005-07-21 Thread pcasidy

On 21 Jul, Dan Mack wrote:

 
 Is there a jumpstart (solaris), kickstart (redhat linux), roboinst (irix),
 or ignite (hpux) like auto-installer for BSD?
 
 If there was, then I wouldn't image the disk at all, I'd instead set up 
 custom network images that I could blast to any system just by pxebooting 
 it.  I'm not sure if it is possible with FreeBSD though, anyone?
 

According to its manpage, 'sysinstall' is supposed to be able to read a
config file in order to be able to configure an installation with no
user interaction.

Philippe.


Re: Quality of FreeBSD

2005-07-21 Thread Vlad GALU

On 7/21/05, Matthias Buelow [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] writes:
 
 My main problem, and that of others, judging by how the question comes up
 from time to time, is knowing what is good (not necessarily the best)
 hardware to run FreeBSD on.
 When I buy a new motherboard, which chipset should I choose or avoid, and
 which controllers?
 
 Maybe some website like it is being done for notebooks (with
 Linux/FreeBSD support) would be in order. I'm thinking about something
 like http://www.linux-laptop.net/, only for FreeBSD and all kinds of
 machines, not just notebooks. (Or, if some collaboration would be ok,
 for *BSD in general, with people posting experience from NetBSD,
 OpenBSD, Dragonfly, even Darwin as well. That way one could also compare
 support for hardware and see what problems the individual systems have.)
 
  There's this: http://gerda.univie.ac.at/freebsd-laptops/

 Make it a Wiki, or something similar, where people can freely post
 experiences they have with their hardware. That could be whole machines
 (Dell model xxx desktop, IBM yyy laptop, HP zzz server) as well as
 components (Asus blah motherboard, 3Com wlan card model foobar, etc.)
 and make the thing searchable, and perhaps allow one to post comments on
 entries (easy with a Wiki). That way people can quickly search & review
 hardware, as well as test suggested workarounds by the posters, without
 having to google for obscure mailing list entries, or problem reports.
 
 mkb.
 


-- 
If it's there, and you can see it, it's real.
If it's not there, and you can see it, it's virtual.
If it's there, and you can't see it, it's transparent.
If it's not there, and you can't see it, you erased it.
