Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-27 Thread Marc G. Fournier

--On Tuesday, April 24, 2007 23:53:16 -0400 Kris Kennaway
[EMAIL PROTECTED] wrote:

 On Wed, Apr 25, 2007 at 10:53:08AM +0800, LI Xin wrote:
 Hi, Oleg,

 Oleg Derevenetz wrote:
  Quoting LI Xin [EMAIL PROTECTED]:
 [...]
  I'm not very sure if this is specific to one disk controller.  Actually
  I got some occasional reports about similar hangs on amd64 6.2-RELEASE
  (slightly patched version) where most processes were stuck in the 'ufs'
  state under very light load; the box was equipped with amr(4) RAID.
 
  I was not able to reproduce the problem in my lab, though, so it's still
  unknown how to trigger the livelock :-(  I still need to investigate
  on their production system.
 
  I reported a similar issue for FreeBSD 6.2 in the audit trail for kern/104406:
 
  http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
 
  and there should be a thread related to this. Briefly, I suspect that
  this is related to nullfs filesystems on my server: when I cvsuped to
  FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced
  the nullfs-mounted fs with a unionfs-mounted one (that was done 10.03.07),
  the problem went away (or seems to have, at least).

 Hmm...  These seem to be different issues.  The problem reported to me was
 on a pgsql server (no nullfs/unionfs involved), and the hang always happens
 when it is not heavily loaded (usually in the morning, for instance, and
 there is no special configuration, like scheduled tasks which can generate
 disk load, etc., only the entropy harvesting), so this is quite confusing.

 Yes, a large part of the confusion is the unfortunate tendency of
 people to do the following:

 user1: my system hangs/panics/etc
 user2: my system hangs/panics/etc too; it must be the same problem!

 What we really need is for every FreeBSD user who encounters a
 hang/panic/etc to avoid jumping to conclusions -- no matter how many
 superficial similarities there may seem to be -- and instead go
 through the relevant steps described here:


 http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

 Until you (or a developer) have analyzed the resulting information,
 you cannot definitively determine whether or not your problem is the
 same as a given random other problem, and you may just confuse the
 issue by making claims of similarity when you are really reporting a
 completely separate problem.

What about those who don't have the benefit of being able to access the 
console? :(  I've recently started buying servers that have built-in, full 
remote console (i.e. the HP servers), but, for instance, I have one box that I 
have to consistently reboot every 3 days due to a 'No Buffer Space Available' 
...

A thought: how hard would it be to add some method of forcing a system crash 
that would dump core, from the command line?  Something that would be disabled 
by default, but that one could enable in the kernel for remote debugging 
purposes and then trigger with a 'sysctl kernel.force_core_crash=1'?  I imagine 
that having a core to analyze would provide more information than nothing at 
all, no?


-- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-27 Thread Nicolas Rachinsky
* Marc G. Fournier [EMAIL PROTECTED] [2007-04-27 16:03 -0300]:
 A thought: how hard would it be to add some method of forcing a system crash, 
 that would dump core, from the command line?  Something that, by default, would

Doesn't 'kill -6 1' work anymore?

Nicolas

-- 
http://www.rachinsky.de/nicolas


Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-27 Thread Marc G. Fournier

--On Friday, April 27, 2007 22:57:29 +0200 Nicolas Rachinsky 
[EMAIL PROTECTED] wrote:

 * Marc G. Fournier [EMAIL PROTECTED] [2007-04-27 16:03 -0300]:
 A thought: how hard would it be to add some method of forcing a system
 crash,  that would dump core, from the command line?  Something that, by
 default, would

 Doesn't 'kill -6 1' work anymore?

I'd never heard of that one ... will it dump core if I do that?

Please note, in my case, with the Buffer Space issue ... I can log in and 
cleanly reboot the server, so doing something like the above to get a core dump 
is definitely doable; I'd just never seen a reference to 'kill -6 1' before 
for doing that ...

Side question to this though ... I remember a while back using a 'client-server' 
mechanism that allowed me to dump core to a separate server ... it was so long 
ago that my memory is faint, but there was a reason why I couldn't dump to the 
local server ... not sure whatever happened to that code, but, if one can do 
that for dumping core, shouldn't there be some way to connect to 
DDB over Ethernet without having to have a serial console in place?  For 
the core dump case, the Ethernet obviously stayed up while it dumped; couldn't 
some sort of 'ddb.conf' file be set up that would allow it to ifconfig an IP 
within that shell so that you could connect to it remotely?  Say, with a 
'from-ip' directive?

Just a thought ...


-- 
Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]  MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664



Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-27 Thread Daniel O'Connor
On Saturday 28 April 2007 04:33, Marc G. Fournier wrote:
 A thought: how hard would it be to add some method of forcing a
 system crash, that would dump core, from the command line?  Something
 that, by default, would be disabled, but for remote debugging
 purposes, one could enable in the kernel and do a 'sysctl
 kernel.force_core_crash=1' to have it do it?  I imagine that having a
 core to analyze would allow providing more information than nothing
 at all, no?

I think you can do this:

    sysctl debug.kdb.panic=1

Alas, that appears to be a -current thing. 6.x has debug.kdb.enter,
though.
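
A rough sketch of the above, not tested on 6.x: the script below checks which of the two sysctl knobs named here exists on the running kernel and prints the command it would use, without running it. The function name pick_kdb_knob and the SYSCTL override variable are made up for illustration. As far as I know, debug.kdb.enter drops into the kernel debugger rather than panicking, so on a box without console access it may simply leave the machine stopped at the debugger prompt.

```sh
#!/bin/sh
# pick_kdb_knob: print the first debug.kdb.* knob that the running kernel
# exposes. debug.kdb.panic is tried first, debug.kdb.enter second.
# SYSCTL is a hypothetical override so a different probe command can be used.
pick_kdb_knob() {
    for knob in debug.kdb.panic debug.kdb.enter; do
        if "${SYSCTL:-sysctl}" -n "$knob" >/dev/null 2>&1; then
            echo "$knob"
            return 0
        fi
    done
    return 1
}

# Only print the command; actually running it would stop or crash the box.
if knob=$(pick_kdb_knob); then
    echo "would run: sysctl ${knob}=1"
else
    echo "no debug.kdb.* knob found; the kernel may lack 'options KDB'"
fi
```

Printing instead of executing is deliberate: the final step is left to whoever is sure a dump device is configured and that the machine can be recovered afterwards.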

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
The nice thing about standards is that there
are so many of them to choose from.
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C




Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread Oleg Derevenetz
Quoting Kris Kennaway [EMAIL PROTECTED]:

  Oleg Derevenetz wrote:
   Quoting LI Xin [EMAIL PROTECTED]:
  [...]
   I'm not very sure if this is specific to one disk controller.  Actually
   I got some occasional reports about similar hangs on amd64 6.2-RELEASE
   (slightly patched version) where most processes were stuck in the 'ufs'
   state under very light load; the box was equipped with amr(4) RAID.
  
   I was not able to reproduce the problem in my lab, though, so it's still
   unknown how to trigger the livelock :-(  I still need to investigate
   on their production system.
  
   I reported a similar issue for FreeBSD 6.2 in the audit trail for
   kern/104406:
  
   http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
  
   and there should be a thread related to this. Briefly, I suspect that
   this is related to nullfs filesystems on my server: when I cvsuped to
   FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced
   the nullfs-mounted fs with a unionfs-mounted one (that was done 10.03.07),
   the problem went away (or seems to have, at least).
  
  Hmm...  These seem to be different issues.  The problem reported to me was
  on a pgsql server (no nullfs/unionfs involved), and the hang always happens
  when it is not heavily loaded (usually in the morning, for instance, and
  there is no special configuration, like scheduled tasks which can generate
  disk load, etc., only the entropy harvesting), so this is quite confusing.
 
 Yes, a large part of the confusion is the unfortunate tendency of
 people to do the following:
 
 user1: my system hangs/panics/etc
 user2: my system hangs/panics/etc too; it must be the same problem!
 
 What we really need is for every FreeBSD user who encounters a
 hang/panic/etc to avoid jumping to conclusions -- no matter how many
 superficial similarities there may seem to be -- and instead go
 through the relevant steps described here:
 
 http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
 
 Until you (or a developer) have analyzed the resulting information,
 you cannot definitively determine whether or not your problem is the
 same as a given random other problem, and you may just confuse the
 issue by making claims of similarity when you are really reporting a
 completely separate problem.

Not all people can do deadlock debugging, though. In my case, turning on 
INVARIANTS and WITNESS leads to an unacceptable performance penalty because 
the server is heavily loaded. So I can only describe my case, my actions, 
and the result, without providing any debug information.
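
To see which of these debugging options a kernel config already enables, something like the following may help. It is only a sketch: check_debug_opts is a hypothetical helper, the option list (KDB, DDB, INVARIANTS, INVARIANT_SUPPORT, WITNESS) is an assumption about what is worth checking, and it reads the config file only, not the running kernel.

```sh
#!/bin/sh
# check_debug_opts FILE
# For each debugging option, report whether FILE contains an uncommented
# "options NAME" line. The option list is an illustrative assumption.
check_debug_opts() {
    conf=$1
    for opt in KDB DDB INVARIANTS INVARIANT_SUPPORT WITNESS; do
        if grep -Eq "^[[:space:]]*options[[:space:]]+${opt}([[:space:]]|\$)" "$conf"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
        fi
    done
}

# Example: KERNCONF=/usr/src/sys/amd64/conf/MYKERNEL sh this-script.sh
# (MYKERNEL is a placeholder name, not a real config.)
if [ -n "${KERNCONF:-}" ] && [ -r "$KERNCONF" ]; then
    check_debug_opts "$KERNCONF"
fi
```

A config with KDB and DDB but without INVARIANTS and WITNESS would be one way to keep the ability to get a backtrace while avoiding the checks that cost performance.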




Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread LI Xin
Oleg Derevenetz wrote:
[snip]
 Not all people can do deadlock debugging, though. In my case, turning on 
 INVARIANTS and WITNESS leads to an unacceptable performance penalty because 
 the server is heavily loaded. So I can only describe my case, my actions, 
 and the result, without providing any debug information.

I completely agree with Kris, because it's very hard for developers to
investigate problems when no detailed information is available,
especially for problems that cannot easily be reproduced.  Of course,
deadlock debugging can be tricky, but having a backtrace can usually
save a lot of time (and fortunately getting one is not that hard, even
for average users :)

What I wanted to suggest is that, whenever possible, submitters who are
not able to diagnose the problem themselves provide detailed steps to
reliably reproduce it, so that we can extract more information in the
lab and possibly reach a fix.

The problem I have is that the reporter of the issue is not as
cooperative as they were before.  What I wanted to say is that it's
possible to trigger the livelock without nullfs/unionfs, and I have not
figured out why (yet) because I cannot reproduce it in my environment :-(

Cheers,
-- 
Xin LI [EMAIL PROTECTED]  http://www.delphij.net/
FreeBSD - The Power to Serve!





Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread Kris Kennaway
On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:

  Until you (or a developer) have analyzed the resulting information,
  you cannot definitively determine whether or not your problem is the
  same as a given random other problem, and you may just confuse the
  issue by making claims of similarity when you are really reporting a
  completely separate problem.
 
 Not all people can do deadlock debugging, though. In my case, turning on 
 INVARIANTS and WITNESS leads to an unacceptable performance penalty because 
 the server is heavily loaded. So I can only describe my case, my actions, 
 and the result, without providing any debug information.

But you can still do *some* things, e.g. backtraces and/or a coredump:
every little bit helps.
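
Once a dump has been saved, getting the backtrace is a short step. A hedged sketch: assuming savecore(8) has written numbered vmcore.N files under /var/crash, and that a kernel with debug symbols is available (the path below is an assumption and differs between installs), this prints the kgdb command for the newest dump. latest_vmcore, CRASHDIR and KERNEL are hypothetical names.

```sh
#!/bin/sh
# Assumed locations; adjust for the machine in question.
CRASHDIR=${CRASHDIR:-/var/crash}
KERNEL=${KERNEL:-/boot/kernel/kernel.debug}   # assumed path to a kernel with symbols

# latest_vmcore DIR: print the highest N for which DIR/vmcore.N exists.
# Non-numeric suffixes are ignored.
latest_vmcore() {
    ls "$1"/vmcore.* 2>/dev/null |
        sed 's/.*vmcore\.//' |
        grep -E '^[0-9]+$' |
        sort -n |
        tail -n 1
}

n=$(latest_vmcore "$CRASHDIR") || n=""
if [ -n "$n" ]; then
    # Print the command instead of running it; typing 'bt' at the kgdb
    # prompt then shows the backtrace.
    echo "kgdb $KERNEL $CRASHDIR/vmcore.$n"
else
    echo "no vmcore.N files found in $CRASHDIR"
fi
```

The output of 'bt' from that kgdb session is the kind of backtrace that can be attached to a problem report.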

Ultimately, though, you have to understand and accept that the less
information you provide, the less chance there is that a developer
will be able to track down your problem.  In fact a developer may have
to effectively ignore your problem report altogether, because of what
I explained about symptoms usually not being enough to tell one bug
from another.

In general, when you encounter a bug in FreeBSD, you have a little bit
of work to do on your side before we can start doing the rest.  I
understand that you may not be in a position to do that work, but that
means you also need to understand that we can't do it either.

Kris




Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread Oleg Derevenetz
Quoting Kris Kennaway [EMAIL PROTECTED]:

 On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:
 
   Until you (or a developer) have analyzed the resulting information,
   you cannot definitively determine whether or not your problem is the
   same as a given random other problem, and you may just confuse the
   issue by making claims of similarity when you are really reporting a
   completely separate problem.
  
  Not all people can do deadlock debugging, though. In my case, turning on
  INVARIANTS and WITNESS leads to an unacceptable performance penalty because
  the server is heavily loaded. So I can only describe my case, my actions,
  and the result, without providing any debug information.
 
 But you can still do *some* things, e.g. backtraces and/or a coredump:
 every little bit helps.
 
 Ultimately, though, you have to understand and accept that the less
 information you provide, the less chance there is that a developer
 will be able to track down your problem.  In fact a developer may have
 to effectively ignore your problem report altogether, because of what
 I explained about symptoms usually not being enough to tell one bug
 from another.
 
 In general, when you encounter a bug in FreeBSD, you have a little bit
 of work to do on your side before we can start doing the rest.  I
 understand that you may not be in a position to do that work, but that
 means you also need to understand that we can't do it either.

In fact, I solved (or worked around) this problem for myself, so in this thread 
I am offering my workaround as a possible workaround for users who experience 
the same problem. This is only a hint for them, not a bug report for you. I 
cannot provide full (or even partial) debug information because I will not back 
out the cvsuped sources, will not replace unionfs with nullfs again, and will 
not wait a week or more for another hang.


Re: How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-25 Thread Kris Kennaway
On Wed, Apr 25, 2007 at 10:20:25PM +0400, Oleg Derevenetz wrote:
 Quoting Kris Kennaway [EMAIL PROTECTED]:
 
  On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:
  
    Until you (or a developer) have analyzed the resulting information,
    you cannot definitively determine whether or not your problem is the
    same as a given random other problem, and you may just confuse the
    issue by making claims of similarity when you are really reporting a
    completely separate problem.
   
   Not all people can do deadlock debugging, though. In my case, turning on
   INVARIANTS and WITNESS leads to an unacceptable performance penalty because
   the server is heavily loaded. So I can only describe my case, my actions,
   and the result, without providing any debug information.
  
  But you can still do *some* things, e.g. backtraces and/or a coredump:
  every little bit helps.
  
  Ultimately, though, you have to understand and accept that the less
  information you provide, the less chance there is that a developer
  will be able to track down your problem.  In fact a developer may have
  to effectively ignore your problem report altogether, because of what
  I explained about symptoms usually not being enough to tell one bug
  from another.
  
  In general, when you encounter a bug in FreeBSD, you have a little bit
  of work to do on your side before we can start doing the rest.  I
  understand that you may not be in a position to do that work, but that
  means you also need to understand that we can't do it either.
 
 In fact, I solved (or worked around) this problem for myself, so in this thread 
 I am offering my workaround as a possible workaround for users who experience 
 the same problem. This is only a hint for them, not a bug report for you. I 
 cannot provide full (or even partial) debug information because I will not back 
 out the cvsuped sources, will not replace unionfs with nullfs again, and will 
 not wait a week or more for another hang.

OK.  FYI, I have been using nullfs on a few dozen heavily loaded machines
for the past year or so without issue, so if you are seeing a nullfs issue
it is probably an obscure one.

Kris




How to report bugs (Re: 6.2-STABLE deadlock?)

2007-04-24 Thread Kris Kennaway
On Wed, Apr 25, 2007 at 10:53:08AM +0800, LI Xin wrote:
 Hi, Oleg,
 
 Oleg Derevenetz wrote:
  Quoting LI Xin [EMAIL PROTECTED]:
 [...]
  I'm not very sure if this is specific to one disk controller.  Actually
  I got some occasional reports about similar hangs on amd64 6.2-RELEASE
  (slightly patched version) where most processes were stuck in the 'ufs'
  state under very light load; the box was equipped with amr(4) RAID.
 
  I was not able to reproduce the problem in my lab, though, so it's still
  unknown how to trigger the livelock :-(  I still need to investigate
  on their production system.
  
  I reported a similar issue for FreeBSD 6.2 in the audit trail for kern/104406:
  
  http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
  
  and there should be a thread related to this. Briefly, I suspect that
  this is related to nullfs filesystems on my server: when I cvsuped to
  FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced
  the nullfs-mounted fs with a unionfs-mounted one (that was done 10.03.07),
  the problem went away (or seems to have, at least).
 
 Hmm...  These seem to be different issues.  The problem reported to me was
 on a pgsql server (no nullfs/unionfs involved), and the hang always happens
 when it is not heavily loaded (usually in the morning, for instance, and
 there is no special configuration, like scheduled tasks which can generate
 disk load, etc., only the entropy harvesting), so this is quite confusing.

Yes, a large part of the confusion is the unfortunate tendency of
people to do the following:

user1: my system hangs/panics/etc
user2: my system hangs/panics/etc too; it must be the same problem!

What we really need is for every FreeBSD user who encounters a
hang/panic/etc to avoid jumping to conclusions -- no matter how many
superficial similarities there may seem to be -- and instead go
through the relevant steps described here:

  
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

Until you (or a developer) have analyzed the resulting information,
you cannot definitively determine whether or not your problem is the
same as a given random other problem, and you may just confuse the
issue by making claims of similarity when you are really reporting a
completely separate problem.

Thanks,
Kris
