Re: How to report bugs (Re: 6.2-STABLE deadlock?)
On Tuesday, April 24, 2007 23:53:16 -0400 Kris Kennaway [EMAIL PROTECTED] wrote:

> On Wed, Apr 25, 2007 at 10:53:08AM +0800, LI Xin wrote:
>> Hi, Oleg,
>>
>> Oleg Derevenetz wrote:
>>> Quoting LI Xin [EMAIL PROTECTED]:
>>> [...]
>>>> I'm not very sure if this is specific to one disk controller. Actually I got some occasional reports about similar hangs on amd64 6.2-RELEASE (slightly patched version) where most processes are stuck in the 'ufs' state, under very light load; the box was equipped with amr(4) RAID. I was not able to reproduce the problem at my lab, though, and it's still unknown how to trigger the livelock :-( Still need to investigate on their production system.
>>>
>>> I reported a similar issue for FreeBSD 6.2 in the audit-trail for kern/104406: http://www.freebsd.org/cgi/query-pr.cgi?pr=104406cat= and there should be a thread related to this. Briefly, I suspect that this is related to nullfs filesystems on my server, and when I cvsuped to FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced the nullfs-mounted fs with a unionfs-mounted one (that was done 10.03.07) the problem went away (or seems to have, at least).
>>
>> Hmm... Seems to be different issues. The problem reported to me was on a pgsql server (no nullfs/unionfs involved), and the hang always happens when it is not heavily loaded (usually in the morning, for instance, and there is no special configuration, like scheduled tasks which can generate disk load, etc., only the entropy harvesting), so this is quite confusing.
>
> Yes, a large part of the confusion is the unfortunate tendency of people to do the following:
>
> user1: my system hangs/panics/etc
> user2: my system hangs/panics/etc too; it must be the same problem!
>
> What we really need is for every FreeBSD user who encounters a hang/panic/etc to avoid jumping to conclusions -- no matter how many superficial similarities there may seem to be -- and instead go through the relevant steps described here:
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>
> Until you (or a developer) have analyzed the resulting information, you cannot definitively determine whether or not your problem is the same as a given random other problem, and you may just confuse the issue by making claims of similarity when you are really reporting a completely separate problem.

What about those that don't have the benefit of being able to access the console? :(

I've recently started buying servers that have built-in, full remote console (i.e. the HP servers), but, for instance, I have one box that I have to consistently reboot every 3 days due to a 'No Buffer Space Available' ...

A thought: how hard would it be to add some method of forcing a system crash, that would dump core, from the command line? Something that, by default, would be disabled, but for remote debugging purposes, one could enable in the kernel and do a 'sysctl kernel.force_core_crash=1' to have it do it?

I imagine that having a core to analyze would allow providing more information than nothing at all, no?

Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)
Email . [EMAIL PROTECTED]   MSN . [EMAIL PROTECTED]
Yahoo . yscrappy   Skype: hub.org   ICQ . 7615664

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
* Marc G. Fournier [EMAIL PROTECTED] [2007-04-27 16:03 -0300]:

> A thought: how hard would it be to add some method of forcing a system crash, that would dump core, from the command line? Something that, by default, would

Doesn't 'kill -6 1' work anymore?

Nicolas
--
http://www.rachinsky.de/nicolas
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
On Friday, April 27, 2007 22:57:29 +0200 Nicolas Rachinsky [EMAIL PROTECTED] wrote:

> * Marc G. Fournier [EMAIL PROTECTED] [2007-04-27 16:03 -0300]:
>> A thought: how hard would it be to add some method of forcing a system crash, that would dump core, from the command line? Something that, by default, would
>
> Doesn't 'kill -6 1' work anymore?

I'd never heard of that one ... will it dump core if I do that?

Please note, in my case, with the Buffer Space issue ... I can log in and cleanly reboot the server, so doing something like the above to get a core dump is definitely doable; I'd just never seen a reference to a 'kill -6 1' before for doing that ...

Side question to this though ... I remember a while back using a 'client-server' mechanism that allowed me to dump core to a separate server ... it was so long ago that my memory is faint, but there was a reason why I couldn't dump to the local server ... not sure whatever happened to that code, but, if one can do that for dumping core, shouldn't there be some method possible to connect to DDB over the Ethernet without having to have a serial console in place?

For the core dump case, the Ethernet obviously stayed up while it dumped, so couldn't some sort of 'ddb.conf' file be set up that would allow it to ifconfig an IP within that shell so that you could connect to it remotely? Say, with a 'from-ip' directive?

Just a thought ...

Marc G. Fournier
Hub.Org Networking Services (http://www.hub.org)
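For readers who, like Marc, had not seen 'kill -6 1' before: the -6 is a signal number and the 1 is the PID of init. The following is a minimal Python sketch that only decodes the number. The helper name describe_kill is hypothetical, nothing is actually sent, and the sketch says nothing about how init reacts to the signal, which is the question this thread leaves open.

```python
import signal


def describe_kill(signum: int, pid: int) -> str:
    """Describe what `kill -<signum> <pid>` would send, without sending it."""
    # signal.Signals raises ValueError for numbers that are not valid signals.
    name = signal.Signals(signum).name
    return f"kill -{signum} {pid} sends {name} to PID {pid}"


if __name__ == "__main__":
    print(describe_kill(6, 1))
```

Signal 6 is SIGABRT, the signal that abort(3) raises and whose default action for an ordinary process is to terminate with a core file.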
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
On Saturday 28 April 2007 04:33, Marc G. Fournier wrote:

> A thought: how hard would it be to add some method of forcing a system crash, that would dump core, from the command line? Something that, by default, would be disabled, but for remote debugging purposes, one could enable in the kernel and do a 'sysctl kernel.force_core_crash=1' to have it do it?
>
> I imagine that having a core to analyze would allow providing more information than nothing at all, no?

I think you can do this:

    sysctl debug.kdb.panic=1

Alas, that appears to be a -current thing. 6.x has debug.kdb.enter though.

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there are so many of them to choose from." -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
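The distinction Daniel draws can be sketched in Python: given the OID names a system exposes under debug.kdb (for example the output of `sysctl -N debug.kdb`, which prints names only), prefer debug.kdb.panic when it exists and fall back to debug.kdb.enter otherwise. The helper names and the preference order are assumptions made for illustration. The code only builds the command line as a string and never runs it, since either knob stops the machine, and entering the debugger on a box without console access leaves it stopped.

```python
from typing import Iterable, Optional

# Knob names are taken from the message above. The preference order is an
# assumption: force a panic if possible, otherwise enter the debugger.
PREFERRED_KNOBS = ("debug.kdb.panic", "debug.kdb.enter")


def pick_kdb_knob(available: Iterable[str]) -> Optional[str]:
    """Return the first preferred knob present in `available`, or None."""
    names = {name.strip() for name in available}
    for knob in PREFERRED_KNOBS:
        if knob in names:
            return knob
    return None


def build_command(available: Iterable[str]) -> Optional[str]:
    """Build (but do not run) the sysctl command line for the chosen knob."""
    knob = pick_kdb_knob(available)
    return None if knob is None else f"sysctl {knob}=1"


if __name__ == "__main__":
    # Hypothetical name lists standing in for `sysctl -N debug.kdb` output.
    print(build_command(["debug.kdb.enter", "debug.kdb.current"]))
    print(build_command(["debug.kdb.panic", "debug.kdb.enter"]))
```

Whether a forced panic actually leaves a usable crash dump depends on a dump device being configured, which is outside the scope of this sketch.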
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
Quoting Kris Kennaway [EMAIL PROTECTED]:

>> Oleg Derevenetz wrote:
>>> Quoting LI Xin [EMAIL PROTECTED]:
>>> [...]
>>>> I'm not very sure if this is specific to one disk controller. Actually I got some occasional reports about similar hangs on amd64 6.2-RELEASE (slightly patched version) where most processes are stuck in the 'ufs' state, under very light load; the box was equipped with amr(4) RAID. I was not able to reproduce the problem at my lab, though, and it's still unknown how to trigger the livelock :-( Still need to investigate on their production system.
>>>
>>> I reported a similar issue for FreeBSD 6.2 in the audit-trail for kern/104406: http://www.freebsd.org/cgi/query-pr.cgi?pr=104406cat= and there should be a thread related to this. Briefly, I suspect that this is related to nullfs filesystems on my server, and when I cvsuped to FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced the nullfs-mounted fs with a unionfs-mounted one (that was done 10.03.07) the problem went away (or seems to have, at least).
>>
>> Hmm... Seems to be different issues. The problem reported to me was on a pgsql server (no nullfs/unionfs involved), and the hang always happens when it is not heavily loaded (usually in the morning, for instance, and there is no special configuration, like scheduled tasks which can generate disk load, etc., only the entropy harvesting), so this is quite confusing.
>
> Yes, a large part of the confusion is the unfortunate tendency of people to do the following:
>
> user1: my system hangs/panics/etc
> user2: my system hangs/panics/etc too; it must be the same problem!
>
> What we really need is for every FreeBSD user who encounters a hang/panic/etc to avoid jumping to conclusions -- no matter how many superficial similarities there may seem to be -- and instead go through the relevant steps described here:
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>
> Until you (or a developer) have analyzed the resulting information, you cannot definitively determine whether or not your problem is the same as a given random other problem, and you may just confuse the issue by making claims of similarity when you are really reporting a completely separate problem.

Not everyone can do deadlock debugging, though. In my case, turning on INVARIANTS and WITNESS leads to an unacceptable performance penalty because the server is heavily loaded. So I can only describe my case, actions and result without providing any debug information.
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
Oleg Derevenetz wrote:

> [snip]
> Not all people can do deadlock debugging, though. In my case turning on INVARIANTS and WITNESS leads to unacceptable performance penalty due to heavily loaded server. So I can only describe my case, actions and result without providing any debug information.

I'd say that I completely agree with Kris, because it is very hard for developers to investigate problems when no detailed information is available, especially for problems that cannot be easily reproduced. Of course, deadlock debugging can be tricky, but having a backtrace can usually save a lot of time (and fortunately that is not that hard even for average users :)

What I wanted to suggest is that we hope the submitter can provide detailed steps to reliably reproduce the problem whenever possible, if they are not able to diagnose the problem themselves, so that we can extract more information in the lab and possibly reach a fix.

The problem I have is that the reporter of the issue is not quite as cooperative as they were before. What I wanted to say is that it's possible to trigger the livelock without nullfs/unionfs, and I have not figured out why (yet) because I cannot reproduce it in my environment :-(

Cheers,
--
Xin LI [EMAIL PROTECTED] http://www.delphij.net/
FreeBSD - The Power to Serve!
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:

>> Until you (or a developer) have analyzed the resulting information, you cannot definitively determine whether or not your problem is the same as a given random other problem, and you may just confuse the issue by making claims of similarity when you are really reporting a completely separate problem.
>
> Not all people can do deadlock debugging, though. In my case turning on INVARIANTS and WITNESS leads to unacceptable performance penalty due to heavily loaded server. So I can only describe my case, actions and result without providing any debug information.

But you can still do *some* things, e.g. backtraces and/or a coredump: every little bit helps.

Ultimately, though, you have to understand and accept that the less information you provide, the less chance there is that a developer will be able to track down your problem. In fact a developer may have to effectively ignore your problem report altogether, because of what I explained about symptoms usually not being enough to tell one bug from another.

In general, when you encounter a bug in FreeBSD, you have a little bit of work to do on your side before we can start doing the rest. I understand that you may not be in a position to do that work, but that means you also need to understand that we can't do it either.

Kris
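Kris's point that symptoms alone cannot tell one bug from another can be sketched in Python. The Report type, its fields, and the rule of comparing the innermost frames of a backtrace are all hypothetical illustrations; this is not a FreeBSD tool, and nobody in this thread describes such a triage procedure. The frame names in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Report:
    symptom: str  # e.g. "processes stuck in 'ufs' state"
    backtrace: List[str] = field(default_factory=list)  # innermost frame first


def may_be_same_bug(a: Report, b: Report, frames: int = 3) -> bool:
    """Treat two reports as candidates for the same bug only on evidence.

    The symptom is deliberately ignored. Without a backtrace from both
    reports there is nothing to compare, so the answer is False.
    """
    if not a.backtrace or not b.backtrace:
        return False
    return a.backtrace[:frames] == b.backtrace[:frames]


if __name__ == "__main__":
    # Made-up frame names, for illustration only.
    first = Report("hang", ["frame_a", "frame_b", "frame_c"])
    no_trace = Report("hang")
    with_trace = Report("hang", ["frame_a", "frame_b", "frame_c", "frame_d"])
    print(may_be_same_bug(first, no_trace))
    print(may_be_same_bug(first, with_trace))
```

Even a match here only makes two reports candidates for the same bug; the sketch is meant to show why a report without a backtrace or coredump gives a developer nothing to compare.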
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
Quoting Kris Kennaway [EMAIL PROTECTED]:

> On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:
>>> Until you (or a developer) have analyzed the resulting information, you cannot definitively determine whether or not your problem is the same as a given random other problem, and you may just confuse the issue by making claims of similarity when you are really reporting a completely separate problem.
>>
>> Not all people can do deadlock debugging, though. In my case turning on INVARIANTS and WITNESS leads to unacceptable performance penalty due to heavily loaded server. So I can only describe my case, actions and result without providing any debug information.
>
> But you can still do *some* things, e.g. backtraces and/or a coredump: every little bit helps.
>
> Ultimately, though, you have to understand and accept that the less information you provide, the less chance there is that a developer will be able to track down your problem. In fact a developer may have to effectively ignore your problem report altogether, because of what I explained about symptoms usually not being enough to tell one bug from another.
>
> In general, when you encounter a bug in FreeBSD, you have a little bit of work to do on your side before we can start doing the rest. I understand that you may not be in a position to do that work, but that means you also need to understand that we can't do it either.

In fact, I solved (or worked around) this problem for myself, so in this thread I am offering my workaround as a possible workaround for users who experience the same problem. This is only a hint for them, not a bug report for you. I cannot provide full (or even partial) debug information because I will not back out the cvsuped sources, will not replace unionfs with nullfs again, and will not wait a week or more for another hang.
Re: How to report bugs (Re: 6.2-STABLE deadlock?)
On Wed, Apr 25, 2007 at 10:20:25PM +0400, Oleg Derevenetz wrote:

> Quoting Kris Kennaway [EMAIL PROTECTED]:
>> On Wed, Apr 25, 2007 at 12:14:20PM +0400, Oleg Derevenetz wrote:
>>>> Until you (or a developer) have analyzed the resulting information, you cannot definitively determine whether or not your problem is the same as a given random other problem, and you may just confuse the issue by making claims of similarity when you are really reporting a completely separate problem.
>>>
>>> Not all people can do deadlock debugging, though. In my case turning on INVARIANTS and WITNESS leads to unacceptable performance penalty due to heavily loaded server. So I can only describe my case, actions and result without providing any debug information.
>>
>> But you can still do *some* things, e.g. backtraces and/or a coredump: every little bit helps.
>>
>> Ultimately, though, you have to understand and accept that the less information you provide, the less chance there is that a developer will be able to track down your problem. In fact a developer may have to effectively ignore your problem report altogether, because of what I explained about symptoms usually not being enough to tell one bug from another.
>>
>> In general, when you encounter a bug in FreeBSD, you have a little bit of work to do on your side before we can start doing the rest. I understand that you may not be in a position to do that work, but that means you also need to understand that we can't do it either.
>
> In fact, I solved (or workarounded) this problem for me, so in this thread I provide my workaround as possible workaround for users that experiences the same problem. This only hint for them, and not a bugreport for you. I could not provide a full (or only partial) debug information because I will not back out cvsuped sources, will not replace unionfs with nullfs again and will not wait week or more for another stuck.

OK.

FYI, I have used nullfs on a few dozen heavily loaded machines without issue for the past year or so, so if you are seeing a nullfs issue it is probably an obscure one.

Kris
How to report bugs (Re: 6.2-STABLE deadlock?)
On Wed, Apr 25, 2007 at 10:53:08AM +0800, LI Xin wrote:

> Hi, Oleg,
>
> Oleg Derevenetz wrote:
>> Quoting LI Xin [EMAIL PROTECTED]:
>> [...]
>>> I'm not very sure if this is specific to one disk controller. Actually I got some occasional reports about similar hangs on amd64 6.2-RELEASE (slightly patched version) where most processes are stuck in the 'ufs' state, under very light load; the box was equipped with amr(4) RAID. I was not able to reproduce the problem at my lab, though, and it's still unknown how to trigger the livelock :-( Still need to investigate on their production system.
>>
>> I reported a similar issue for FreeBSD 6.2 in the audit-trail for kern/104406: http://www.freebsd.org/cgi/query-pr.cgi?pr=104406cat= and there should be a thread related to this. Briefly, I suspect that this is related to nullfs filesystems on my server, and when I cvsuped to FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced the nullfs-mounted fs with a unionfs-mounted one (that was done 10.03.07) the problem went away (or seems to have, at least).
>
> Hmm... Seems to be different issues. The problem reported to me was on a pgsql server (no nullfs/unionfs involved), and the hang always happens when it is not heavily loaded (usually in the morning, for instance, and there is no special configuration, like scheduled tasks which can generate disk load, etc., only the entropy harvesting), so this is quite confusing.

Yes, a large part of the confusion is the unfortunate tendency of people to do the following:

user1: my system hangs/panics/etc
user2: my system hangs/panics/etc too; it must be the same problem!

What we really need is for every FreeBSD user who encounters a hang/panic/etc to avoid jumping to conclusions -- no matter how many superficial similarities there may seem to be -- and instead go through the relevant steps described here:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html

Until you (or a developer) have analyzed the resulting information, you cannot definitively determine whether or not your problem is the same as a given random other problem, and you may just confuse the issue by making claims of similarity when you are really reporting a completely separate problem.

Thanks,
Kris