Re: Various route locking fixes merged to stable/7 (was: Re: Big problems with 7.1 locking up :-()

2009-03-08 Thread Ruben van Staveren
Hi, On 26 Feb 2009, at 2:28, Charles Sprickman wrote: On Wed, 25 Feb 2009, Robert Watson wrote: Just a minor heads up: I've merged both Kip Macy's lock order fixes to the kernel routing code, and the route locking and reference counting fixes from kern/130652 to stable/7. These fixes

Re: Big problems with 7.1 locking up :-(

2009-02-25 Thread Pete French
FYI, I'm currently awaiting testing results from Pete on the MFC of a number of routing table locking fixes, and once that's merged (hopefully tomorrow?) I'll start on the patches in the above PR. I've taken a crash-course in routing table locking in the last few days... :-) Just to let

Re: Big problems with 7.1 locking up :-(

2009-02-25 Thread Robert Watson
On Wed, 25 Feb 2009, Pete French wrote: FYI, I'm currently awaiting testing results from Pete on the MFC of a number of routing table locking fixes, and once that's merged (hopefully tomorrow?) I'll start on the patches in the above PR. I've taken a crash-course in routing table locking in

Various route locking fixes merged to stable/7 (was: Re: Big problems with 7.1 locking up :-()

2009-02-25 Thread Robert Watson
Just a minor heads up: I've merged both Kip Macy's lock order fixes to the kernel routing code, and the route locking and reference counting fixes from kern/130652 to stable/7. These fixes should correct a number of reported network-related hangs. We might want to release a subset of these

Re: Big problems with 7.1 locking up :-(

2009-02-25 Thread cpghost
On Wed, Feb 25, 2009 at 11:04:29AM +, Robert Watson wrote: On Wed, 25 Feb 2009, Pete French wrote: FYI, I'm currently awaiting testing results from Pete on the MFC of a number of routing table locking fixes, and once that's merged (hopefully tomorrow?) I'll start on the patches in

Re: Various route locking fixes merged to stable/7 (was: Re: Big problems with 7.1 locking up :-()

2009-02-25 Thread Charles Sprickman
On Wed, 25 Feb 2009, Robert Watson wrote: Just a minor heads up: I've merged both Kip Macy's lock order fixes to the kernel routing code, and the route locking and reference counting fixes from kern/130652 to stable/7. These fixes should correct a number of reported network-related hangs.

Re: Big problems with 7.1 locking up :-(

2009-02-24 Thread Robert Watson
On Mon, 23 Feb 2009, aneeth wrote: http://www.freebsd.org/cgi/query-pr.cgi?pr=130652&cat= OK, will give this a try, unless anyone else wants any traces from this locked machine? Is there a known way to tickle this bug when I've rebooted, to make sure it's fixed? We've been having similar

Re: Big problems with 7.1 locking up :-(

2009-02-23 Thread aneeth
Pete French-2 wrote: Probably it is your case, try please. http://www.freebsd.org/cgi/query-pr.cgi?pr=130652&cat= OK, will give this a try, unless anyone else wants any traces from this locked machine? Is there a known way to tickle this bug when I've rebooted, to make sure it's fixed

Re: Big problems with 7.1 locking up :-(

2009-02-21 Thread Robert Watson
On Tue, 17 Feb 2009, Mike Tancsa wrote: Do you have any other details about these issues? Were the fixes ever MFC'd? Earlier today I handed off some patches for Pete to test (attached below), which he's running alongside the patches in kern/130652. When I run with the patches,

Re: Big problems with 7.1 locking up :-(

2009-02-18 Thread Robert Watson
On Tue, 17 Feb 2009, Mike Tancsa wrote: At 05:38 PM 1/29/2009, Robert Watson wrote: On Fri, 9 Jan 2009, Pete French wrote: I have a number of HP 1U servers, all of which were running 7.0 perfectly happily. I have been testing 7.1 in its various incarnations for the last couple of months

Re: Big problems with 7.1 locking up :-(

2009-02-17 Thread Mike Tancsa
At 05:38 PM 1/29/2009, Robert Watson wrote: On Fri, 9 Jan 2009, Pete French wrote: I have a number of HP 1U servers, all of which were running 7.0 perfectly happily. I have been testing 7.1 in its various incarnations for the last couple of months on our test server and it has performed

Re: Big problems with 7.1 locking up :-(

2009-02-15 Thread Stefan Lambrev
Hi, Just to let you know what's going on with the issue. I tried kern.hz=100 on GENERIC 7.1, but the soekris started rebooting with ethernet only traffic. I made a custom kernel with RELENG_7 from 13.Feb and: options CPU_SOEKRIS options CPU_GEODE The soekris is quite stable

Re: Big problems with 7.1 locking up :-(

2009-02-13 Thread Guy Helmer
Guy Helmer wrote: FWIW, I think I have tracked down the changes just prior to 7.1-RELEASE that are causing my Supermicro dual Xeon machines to wedge. I did the binary search between 2008-10-02 and 2008-11-24 without reproducing any lockups, and then I went on to search between 2008-11-24 and
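
The date-based binary search described in this message can be sketched roughly as follows. This is only an illustration in Python, not the procedure actually used in the thread: the function name `bisect_dates`, the stand-in test `fake_test`, and the dates 2009-01-05 and 2008-12-05 are all made up for the example. In practice each probe would mean checking out the source tree as of the midpoint date, building a kernel, and waiting to see whether the machine wedges.

```python
from datetime import date, timedelta

def bisect_dates(good, bad, is_bad):
    """Return the earliest date in (good, bad] for which is_bad(date) is true.

    Assumes is_bad(good) is False, is_bad(bad) is True, and that the
    regression stays present once it has been introduced.
    """
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid    # regression already present at mid: look earlier
        else:
            good = mid   # mid is still fine: look later
    return bad

# Hypothetical stand-in for "build the tree as of this date and see if it
# locks up". The cut-over date below is invented for the example.
def fake_test(d):
    return d >= date(2008, 12, 5)

# 2008-11-24 is the last known-good date mentioned above; the end date
# 2009-01-05 is an assumption, since the original message is truncated.
first_bad = bisect_dates(date(2008, 11, 24), date(2009, 1, 5), fake_test)
print(first_bad)
```

Each iteration halves the window, so a six-week range needs about six builds. The catch for an intermittent hang is that a "good" result is only as reliable as the time spent waiting for the lockup.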

Re: Big problems with 7.1 locking up :-(

2009-02-12 Thread Guy Helmer
Guy Helmer wrote: Pete French wrote: I have a number of HP 1U servers, all of which were running 7.0 perfectly happily. I have been testing 7.1 in its various incarnations for the last couple of months on our test server and it has performed perfectly. So the last two days I have been round

Re: Big problems with 7.1 locking up :-(

2009-02-08 Thread Pete French
load. Kip Macy has corrected at least one (both?) problems in head, and plans to MFC the fixes in the near future. We'll follow up further once the fixes are merged, and if any further problems transpire. Hi, just wondering if we are any closer to having the MFC for this yet, or if there are

Re: Big problems with 7.1 locking up :-(

2009-02-08 Thread Stefan Lambrev
Hi all, In this thread someone mentioned a problem with soekris devices. I personally have one of those new soekris devices and installed 7.1R and it is very easy to freeze it. All that I have to do is to copy a big file over WIFI (atheros) with speed higher than 1-2MB/s. It takes less than 2

Re: Big problems with 7.1 locking up :-(

2009-02-08 Thread cpghost
On Sun, Feb 08, 2009 at 05:11:02PM +0200, Stefan Lambrev wrote: Hi all, In this thread someone mentioned a problem with soekris devices. I personally have one of those new soekris devices and installed 7.1R and it is very easy to freeze it. All that I have to do is to copy a big file over

Re: Big problems with 7.1 locking up :-(

2009-02-08 Thread Mike Tancsa
At 10:11 AM 2/8/2009, Stefan Lambrev wrote: Hi all, In this thread someone mentioned a problem with soekris devices. I personally have one of those new soekris devices and installed 7.1R and it is very easy to freeze it. All that I have to do is to copy a big file over WIFI (atheros) with speed

Re: Big problems with 7.1 locking up :-(

2009-01-29 Thread Robert Watson
On Fri, 9 Jan 2009, Pete French wrote: I have a number of HP 1U servers, all of which were running 7.0 perfectly happily. I have been testing 7.1 in its various incarnations for the last couple of months on our test server and it has performed perfectly. So the last two days I have been

Re: Big problems with 7.1 locking up :-(

2009-01-25 Thread Cian Hughes
Pete, Have you considered enabling serial console emulation in the BIOS on the machines? I have got my iLO cards set up to redirect the serial ports on my HP servers so that I can ssh into the iLO cards and, by typing Esc-Q, access what I would otherwise see on a serial console.

Re: Big problems with 7.1 locking up :-(

2009-01-23 Thread Kris Kennaway
Pete Carah wrote: Well, following up on my own reply earlier, I csup'd releng_7 with a date of last dec 1; the result works fine in the laptop. I'll reload the eastern soekris tonight and see how it does. If the soekris is fine also then this gives a data point for whenever the bad commit(s)

Re: Big problems with 7.1 locking up :-(

2009-01-21 Thread Doug Barton
Pete Carah wrote: I have done some (lots of) kernel debugging in the past. I have several points: 1. I shouldn't *have* to kernel debug for a normal usage of an official release. Um, why not? We certainly put every possible effort into making sure that releases (and in fact stable

Re: Big problems with 7.1 locking up :-(

2009-01-19 Thread Pete French
yes, do ps - threads in state L or LL and RUN are especially interesting, trace of pids 28, 27, and threads which L on locked chan. here's the output of alllocks, http://toybox.twisted.org.uk/~pete/71_show_alllocks.png here are the pages of PS:

Re: Big problems with 7.1 locking up :-(

2009-01-19 Thread Chagin Dmitry
On Mon, Jan 19, 2009 at 11:39:08AM +, Pete French wrote: yes, do ps - threads in state L or LL and RUN are especially interesting, trace of pids 28, 27, and threads which L on locked chan. here's the output of alllocks, http://toybox.twisted.org.uk/~pete/71_show_alllocks.png

Re: Big problems with 7.1 locking up :-(

2009-01-19 Thread Pete French
Probably it is your case, try please. http://www.freebsd.org/cgi/query-pr.cgi?pr=130652&cat= OK, will give this a try, unless anyone else wants any traces from this locked machine? Is there a known way to tickle this bug when I've rebooted, to make sure it's fixed? thanks, -pete.

Re: Big problems with 7.1 locking up :-(

2009-01-19 Thread Pete Carah
Kris writes: You and anyone else seeing performance problems should try to work through the advice given here: http://people.freebsd.org/~kris/scaling/Help_my_system_is_slow.pdf Well, all the people in this thread have noticed that WITH NO CONFIG CHANGES from configs that worked

Re: Big problems with 7.1 locking up :-(

2009-01-19 Thread Pete French
Probably it is your case, try please. http://www.freebsd.org/cgi/query-pr.cgi?pr=130652&cat= Well, I have been running this for a while now. I still get this: http://toybox.twisted.org.uk/~pete/71_lor3.png On the console, but so far the machine has not crashed. Obviously it's only been

Re: Big problems with 7.1 locking up :-(

2009-01-19 Thread Pete French
http://www.freebsd.org/cgi/query-pr.cgi?pr=130652&cat= Looks like I spoke too soon - It just locked up again, I am afraid. Sitting there now at the debug prompt. It does, however, look very different this time: For example here is 'show alllocks':

Re: Big problems with 7.1 locking up :-(

2009-01-19 Thread Pete French
There are significant changes in UDP locking between 7.0 and 7.1, so it could be that we're looking at a regression there. If you're able to reproduce this reliably, it might well be worth doing a little search-and-replace in udp_usrreq.c along the following lines: INP_RLOCK_ASSERT -

Re: Big problems with 7.1 locking up :-(

2009-01-19 Thread Kris Kennaway
Pete Carah wrote: Kris writes: You and anyone else seeing performance problems should try to work through the advice given here: http://people.freebsd.org/~kris/scaling/Help_my_system_is_slow.pdf Well, all the people

Re: Big problems with 7.1 locking up :-(

2009-01-19 Thread Pete Carah
I have done some (lots of) kernel debugging in the past. I have several points: 1. I shouldn't *have* to kernel debug for a normal usage of an official release. 2. One of the soekris boxes is 2800 MILES away, in a remote location, with no one present who is a skilled (or, indeed, any kind of)

Re: Big problems with 7.1 locking up :-(

2009-01-19 Thread Mark Linimon
On Mon, Jan 19, 2009 at 04:59:59PM -0500, Pete Carah wrote: I shouldn't *have* to kernel debug for a normal usage of an official release. Agreed, but the problems that people are having do not seem to have arisen on any of the systems that ran prerelease tests for 7.1. Although I'm sure it does

Re: Big problems with 7.1 locking up :-(

2009-01-19 Thread Pete Carah
Well, following up on my own reply earlier, I csup'd releng_7 with a date of last dec 1; the result works fine in the laptop. I'll reload the eastern soekris tonight and see how it does. If the soekris is fine also then this gives a data point for whenever the bad commit(s) happened. I had

Re: Big problems with 7.1 locking up :-(

2009-01-18 Thread Kris Kennaway
Tomas Randa wrote: Hello, I have similar problems. The last good kernel I have is from the stable branch, October the 8th. Then in the next upgrade, I saw big problems with performance. I tried ULE, 4BSD etc., but nothing helps, only downgrading the system back. Now I am trying 7.1-p1 and the problems are here

Re: Big problems with 7.1 locking up :-(

2009-01-18 Thread Michel Talon
Tomas Randa wrote: Hello, I have similar problems. The last good kernel I have is from the stable branch, October the 8th. Then in the next upgrade, I saw big problems with performance. I can add a me too here. This is on my desktop, very lightly loaded. This computer never had a single problem under

Re: Big problems with 7.1 locking up :-(

2009-01-18 Thread dick hoogendijk
On Sun, 18 Jan 2009 13:21:17 +0100 Michel Talon ta...@lpthe.jussieu.fr wrote: My previous upgrade was FreeBSD 7.0-STABLE #0: Tue Jul 22, and worked perfectly fine with exactly the same software configuration. Now I have FreeBSD 7.1-STABLE #0: Mon Jan 5, and the situation is disastrous. Makes

Re: Big problems with 7.1 locking up :-(

2009-01-18 Thread Claus Guttesen
My previous upgrade was FreeBSD 7.0-STABLE #0: Tue Jul 22, and worked perfectly fine with exactly the same software configuration. Now I have FreeBSD 7.1-STABLE #0: Mon Jan 5, and the situation is disastrous. Makes you wonder what on earth could have changed that much between 7.0/7.1. Nice

Re: Big problems with 7.1 locking up :-(

2009-01-16 Thread Pete French
If you are able to get into the debugger, the normal commands would be most helpful, especially if you can log the results: It finally locked up, and ctrl-alt-esc got me into the debugger at last! is there anything else you want me to get whilst it is like that aside from: ps show

Re: Big problems with 7.1 locking up :-(

2009-01-16 Thread Pete French
ps output from 'ps' is here: http://toybox.twisted.org.uk/~pete/71_lock_ps/ there are a lot of processes as this machine runs the same webservices as the actual webservers, just that nobody connects to them. show lockedvnods nothing - there are no locked vnodes show alllocks this

Re: Big problems with 7.1 locking up :-(

2009-01-16 Thread Chagin Dmitry
On Fri, Jan 16, 2009 at 12:35:49PM +, Pete French wrote: ps output from 'ps' is here: http://toybox.twisted.org.uk/~pete/71_lock_ps/ there are a lot of processes as this machine runs the same webservices as the actual webservers, just that nobody connects to them. show

Re: Big problems with 7.1 locking up :-(

2009-01-16 Thread Pete French
hi, please type: show lock 0xff0001254d20 and then show thread 0xXXX where X is 'owner' of previous output. http://toybox.twisted.org.uk/~pete/71_pdns_lock.png That's in Power DNS - which is interesting because the one difference between the boxes that lock and those which

Re: Big problems with 7.1 locking up :-(

2009-01-16 Thread Chagin Dmitry
On Fri, Jan 16, 2009 at 01:34:14PM +, Pete French wrote: hi, please type: show lock 0xff0001254d20 and then show thread 0xXXX where X is 'owner' of previous output. http://toybox.twisted.org.uk/~pete/71_pdns_lock.png That's in Power DNS - which is interesting

Re: Big problems with 7.1 locking up :-(

2009-01-16 Thread Robert Watson
On Fri, 16 Jan 2009, Pete French wrote: hi, please type: show lock 0xff0001254d20 and then show thread 0xXXX where X is 'owner' of previous output. http://toybox.twisted.org.uk/~pete/71_pdns_lock.png That's in Power DNS - which is interesting because the one difference

Re: Big problems with 7.1 locking up :-(

2009-01-16 Thread Pete French
trace 832 http://toybox.twisted.org.uk/~pete/71_trace_832_1.png http://toybox.twisted.org.uk/~pete/71_trace_832_2.png -pete. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any

Re: Big problems with 7.1 locking up :-(

2009-01-16 Thread Pete French
I rather feared as much. Let's run down the path of perhaps there's a problem with the new UDP locking code for a bit and see where it takes us. Is it possible to run those boxes with WITNESS -- I believe that the fact that show alllocks is failing is because WITNESS isn't present. Yes, I

Re: Big problems with 7.1 locking up :-(

2009-01-16 Thread Robert Watson
On Fri, 16 Jan 2009, Pete French wrote: I rather feared as much. Let's run down the path of perhaps there's a problem with the new UDP locking code for a bit and see where it takes us. Is it possible to run those boxes with WITNESS -- I believe that the fact that show alllocks is failing is

Re: Big problems with 7.1 locking up :-(

2009-01-16 Thread Pete French
If you do INVARIANTS + WITNESS + WITNESS_SKIPSPIN, that should be good. WITNESS does a number of things, including tracking (and being judgemental about) lock order. One nice side effect of that tracking is that we keep track of a lot more lock state explicitly, so DDB's show alllocks, show

Re: Big problems with 7.1 locking up :-(

2009-01-16 Thread Pete French
Just continuing to look at this with the help of Dmitry, and the output from 'bt' is here: http://toybox.twisted.org.uk/~pete/71_bt.png The top bit of that is from my 'show alllocks', the full version of which is here: http://toybox.twisted.org.uk/~pete/71_show_alllocks.png -pete.

Re: Big problems with 7.1 locking up :-(

2009-01-15 Thread Pete French
Just an update on this - I tried the various kernels, but now the machine is not locking up at all. As I haven't actually changed anything, this does not make me as happy as you might expect. I don't know what to do now - I dare not upgrade the machines to an OS that I know locks, but if I

Re: Big problems with 7.1 locking up :-(

2009-01-15 Thread Robert Watson
On Thu, 15 Jan 2009, Pete French wrote: Just an update on this - I tried the various kernels, but now the machine is not locking up at all. As I haven't actually changed anything, this does not make me as happy as you might expect. I don't know what to do now - I dare not upgrade the

Re: Big problems with 7.1 locking up :-(

2009-01-15 Thread Pete French
Given the inconsistency of the symptoms, I wouldn't preclude something environmental: could it be that it was the bottom, or more likely, top box in a rack and that your air conditioning isn't quite as effective there when the outside temperature is above/below some threshold? It's a

Re: Big problems with 7.1 locking up :-(

2009-01-15 Thread Robert Watson
On Thu, 15 Jan 2009, Pete French wrote: In any case, if it starts to reproducibly recur, send out mail and we can see if we can track it down some more. BTW, did you establish if the version of iLO you have has a remote NMI? I seem to recall that some do, and being able to deliver an NMI

Re: Big problems with 7.1 locking up :-(

2009-01-15 Thread Pete French
desirable. You might want to give the NMI a test run just to make sure it behaves as you think it should, though -- be aware that if DDB/KDB aren't compiled into the kernel, then an NMI will panic the box. Unfortunately it does this... http://toybox.twisted.org.uk/~pete/71_nmi1.png That

Re: Big problems with 7.1 locking up :-(

2009-01-15 Thread Robert Watson
On Thu, 15 Jan 2009, Pete French wrote: desirable. You might want to give the NMI a test run just to make sure it behaves as you think it should, though -- be aware that if DDB/KDB aren't compiled into the kernel, then an NMI will panic the box. Unfortunately it does this...

Re: Big problems with 7.1 locking up :-(

2009-01-15 Thread John Baldwin
On Thursday 15 January 2009 12:49:11 pm Robert Watson wrote: On Thu, 15 Jan 2009, Pete French wrote: desirable. You might want to give the NMI a test run just to make sure it behaves as you think it should, though -- be aware that if DDB/KDB aren't compiled into the kernel, then an

Re: Big problems with 7.1 locking up :-(

2009-01-14 Thread Pete French
If you have BREAK_TO_DEBUGGER compiled into the kernel, then try pressing ctrl-alt-break on the console to see if you can drop into the debugger, or issue a serial break on a serial console. Well, I added BREAK_TO_DEBUGGER to the kernel config I had which contained all the other stuff

Re: Big problems with 7.1 locking up :-(

2009-01-14 Thread Robert Watson
On Wed, 14 Jan 2009, Pete French wrote: If you have BREAK_TO_DEBUGGER compiled into the kernel, then try pressing ctrl-alt-break on the console to see if you can drop into the debugger, or issue a serial break on a serial console. Well, I added BREAK_TO_DEBUGGER to the kernel config I had

Re: Big problems with 7.1 locking up :-(

2009-01-14 Thread Pete French
effect on control flow, unlike, say, WITNESS, which significantly distorts timing. Is there any chance you picked up any of the recent fixes that went into RELENG_7 without noticing, and that perhaps one of those did it? With I'm pretty certain of that - I have just been changing kernel

Re: Big problems with 7.1 locking up :-(

2009-01-14 Thread Claus Guttesen
my problem in with others under the assumption that it's all the same. This is obviously pretty rare - out of 24 of the HP servers the problem only crops up on 4 of them. But there is nothing different about those 4. Could it be different BIOS/firmware on the hp-servers? Mr. Aliyev was unable

Re: Big problems with 7.1 locking up :-(

2009-01-13 Thread Doug Barton
Pete French wrote: Mine never lock up doing buildworlds either. They only lock up when they are sitting there more or less idle! The machines which have never locked up are the webservers, which are fairly heavily loaded. The machine which locks up the most frequently is a box sitting there

Re: Big problems with 7.1 locking up :-(

2009-01-13 Thread Claus Guttesen
Mine never lock up doing buildworlds either. They only lock up when they are sitting there more or less idle! The machines which have never locked up are the webservers, which are fairly heavily loaded. The machine which locks up the most frequently is a box sitting there doing nothing but DNS,

Re: Big problems with 7.1 locking up :-(

2009-01-13 Thread Gavin Atkinson
On Mon, 2009-01-12 at 19:00 +, Pete French wrote: I'm not sure if you've done this already, but the normal suggestions apply: have you compiled with INVARIANTS/WITNESS/DDB/KDB/BREAK_TO_DEBUGGER, and do any results / panics / etc result? Sometimes these debugging tools are able

Re: Big problems with 7.1 locking up :-(

2009-01-13 Thread Pete French
Lock order reversals are warnings of potential deadlock due to a lock cycle, but deadlocks may not actually result, either because it's a false positive (some locking construct that is deadlock free but involves lock cycles), or because a cycle didn't actually form. The message is
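
The distinction drawn in this message, between a reported lock order reversal and an actual deadlock, can be illustrated with a toy order checker. The Python sketch below is only a simplified model written for this index; it is not how WITNESS is implemented, and the class `LockOrderChecker` and the lock names `inp` and `route` are made up for the example. It records which locks have been acquired while others were held and warns when a new acquisition would close a cycle in that order graph.

```python
from collections import defaultdict

class LockOrderChecker:
    """Toy model of lock-order tracking (not the real WITNESS code)."""

    def __init__(self):
        # after[a] holds every lock seen acquired while a was held
        self.after = defaultdict(set)
        self.warnings = []

    def _reachable(self, src, dst):
        """True if the recorded order already implies src before dst."""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.after[node])
        return False

    def acquire(self, held, new):
        """Record that `new` is being acquired while `held` locks are held."""
        for h in held:
            # If new -> ... -> h is already known, h -> new closes a cycle.
            if self._reachable(new, h):
                self.warnings.append((h, new))
            self.after[h].add(new)

checker = LockOrderChecker()
checker.acquire(held=["inp"], new="route")   # one code path: inp, then route
checker.acquire(held=["route"], new="inp")   # another path: route, then inp
print(checker.warnings)
```

Both acquisitions above run one after the other, so nothing ever blocks, yet the second one is still flagged. That is the point made in the message: the warning says a deadlock is possible if two threads take these paths at the same moment, not that one has occurred.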

Re: Big problems with 7.1 locking up :-(

2009-01-13 Thread Pete French
It was mentioned previously in this thread that CPUTYPE could be an issue. Did you change this if you customized your kernel? Actually, I think that's been ruled out as a possible cause, along with the scheduler. Certainly I have tried it both ways and there is no difference, and I think I saw

Re: Big problems with 7.1 locking up :-(

2009-01-13 Thread Robert Watson
On Tue, 13 Jan 2009, Pete French wrote: Features like WITNESS and INVARIANTS may change the timing of the kernel making certain race conditions less likely; I'd run with them for a bit and see if you can reproduce the hang with them present, as they will make debugging the problem a lot

Re: Big problems with 7.1 locking up :-(

2009-01-13 Thread Pete French
Can you break into the debugger with Ctrl-Alt-Esc, or by sending a break over the serial line? No, ctrl-alt-esc doesn't work, and there is no serial line on the machine (not that I can access anyway) -pete.

Re: Big problems with 7.1 locking up :-(

2009-01-13 Thread Pete French
Silly question but do you have powerd enabled on that server? If so, does disabling it help? Also do you have any of these in /etc/rc.conf (i.e., they are not the same as the default values in /etc/defaults/rc.conf): performance_cx_lowest=HIGH# Online CPU idle state

Re: Big problems with 7.1 locking up :-(

2009-01-13 Thread Pete French
I can't (fortunately) make it lock up. I have a DL360 G5 which is unused atm. and can test on it if needed. Would it be possible to install that under amd64 and hammer it with DNS requests ? I have been trying to think what the difference might be between my webservers and the machines which

RE: Big problems with 7.1 locking up :-(

2009-01-13 Thread Nathan Way
I also am experiencing lock-ups on a server recently upgraded from 7.0-RELEASE to 7.1-STABLE. This server is a Supermicro 6022 dual-Xeon box running a GENERIC i386 SMP kernel. Since upgrading to 7.1-STABLE it has started locking up daily. I see similar symptoms that Pete is seeing - no ping

Re: Big problems with 7.1 locking up :-(

2009-01-13 Thread Robert Watson
On Tue, 13 Jan 2009, Pete French wrote: I can't (fortunately) make it lock up. I have a DL360 G5 which is unused atm. and can test on it if needed. Would it be possible to install that under amd64 and hammer it with DNS requests ? I have been trying to think what the difference might be

Re: Big problems with 7.1 locking up :-(

2009-01-13 Thread Ken Smith
On Mon, 2009-01-12 at 21:35 +0100, Tomas Randa wrote: I have similar problems. The last good kernel I have is from the stable branch, October the 8th. Then in the next upgrade, I saw big problems with performance. I tried ULE, 4BSD etc., but nothing helps, only downgrading the system back. Now I am trying

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Claus Guttesen
I upgraded a test-webserver to 7.1 when it was released. After a few days I upgraded a production-webserver to 7.1 on Jan. 8th and it has been running without any problems. The webserver is not heavily loaded (load at 2-3 on average). I have made a buildworld -j 8 and it runs fine. If

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Claus Guttesen
I am also surprised that this isn't more widely reported, as the hardware is very common. The only oddity with my compile is that I set the CPUTYPE to 'core2' - that shouldn't have an effect, but I will remove it anyway, just so I am actually building a completely vanilla amd64. That way I

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Claus Guttesen
It has performed a buildworld without problems and I'll be doing some buildworlds throughout the day. This is on a HP c-class-blade with 8 GB RAM, 2 x quad-core and the built-in p200-controller with 64 MB RAM. I've performed five buildworlds decrementing -j from 16 to 6 and I can't lock up

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Pete French
I've performed five buildworlds decrementing -j from 16 to 6 and I can't lock up the server. Mine never lock up doing buildworlds either. They only lock up when they are sitting there more or less idle! The machines which have never locked up are the webservers, which are fairly heavily loaded.

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Robert Watson
On Fri, 9 Jan 2009, Garance A Drosihn wrote: At 2:39 PM -0500 1/9/09, Robert Blayzor wrote: On Jan 8, 2009, at 8:58 PM, Pete French wrote: I have a number of HP 1U servers, all of which were running 7.0 perfectly happily. I have been testing 7.1 in its various incarnations for the last

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Robert Watson
On Sat, 10 Jan 2009, Pete French wrote: FWIW, the other guy I know who is having this problem had already switched to using ULE under 7.0-release, and did not have any problems with it. So *his* problem was probably not related to SCHED_ULE, unless something has recently changed there.

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Pete French
I'm not sure if you've done this already, but the normal suggestions apply: have you compiled with INVARIANTS/WITNESS/DDB/KDB/BREAK_TO_DEBUGGER, and do any results / panics / etc result? Sometimes these debugging tools are able to convert hangs into panics, which gives us much more ability

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Garance A Drosihn
At 2:55 PM + 1/12/09, Robert Watson wrote: On Fri, 9 Jan 2009, Garance A Drosihn wrote: At 2:39 PM -0500 1/9/09, Robert Blayzor wrote: On Jan 8, 2009, at 8:58 PM, Pete French wrote: I have a number of HP 1U servers, all of which were running 7.0 perfectly happily. I have been testing 7.1

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Pete French
I'm not sure if you've done this already, but the normal suggestions apply: have you compiled with INVARIANTS/WITNESS/DDB/KDB/BREAK_TO_DEBUGGER, and do any results / panics / etc result? Sometimes these debugging tools are able to convert hangs into panics, which gives us much more ability

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Pete French
Just to follow up on this: My friend did switch back to a 7.1 kernel with SCHED_4BSD, and he still ran into problems. The error messages weren't Actually, I don't know if I posted it, but that was the same for me too. The scheduler makes no difference, nor do CPU compile settings. -pete.

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Tomas Randa
Hello, I have similar problems. The last good kernel I have is from the stable branch, October the 8th. Then in the next upgrade, I saw big problems with performance. I tried ULE, 4BSD etc., but nothing helps, only downgrading the system back. Now I am trying 7.1-p1 and the problems are here again. MySQL is

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Claus Guttesen
I have similar problems. The last good kernel I have is from the stable branch, October the 8th. Then in the next upgrade, I saw big problems with performance. I tried ULE, 4BSD etc., but nothing helps, only downgrading the system back. Now I am trying 7.1-p1 and the problems are here again. MySQL is waiting a lot

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Robert Watson
On Mon, 12 Jan 2009, Tomas Randa wrote: I have similar problems. The last good kernel I have is from the stable branch, October the 8th. Then in the next upgrade, I saw big problems with performance. I tried ULE, 4BSD etc., but nothing helps, only downgrading the system back. Now I am trying 7.1-p1 and

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Robert Watson
On Mon, 12 Jan 2009, Pete French wrote: I'm not sure if you've done this already, but the normal suggestions apply: have you compiled with INVARIANTS/WITNESS/DDB/KDB/BREAK_TO_DEBUGGER, and do any results / panics / etc result? Sometimes these debugging tools are able to convert hangs into

Re: Big problems with 7.1 locking up :-(

2009-01-12 Thread Robert Watson
On Mon, 12 Jan 2009, Garance A Drosihn wrote: He is not eager to do a whole lot of experiments to track down the problem, since this is happening on busy production machines and he can't afford to have a lot of downtime on them (especially now that the semester at RPI has started up). The

Re: Big problems with 7.1 locking up :-(

2009-01-11 Thread Pete French
I noticed a similar problem testing 7.1-RC1. It seemed to be a deep deadlock, as it was triggered by lighttpd doing kern_sendfile, and never returning. The side effects (being unable to create processes, etc.) are similar. Interesting - did you get any responses from anyone else regarding this

Re: Big problems with 7.1 locking up :-(

2009-01-11 Thread Pete French
My kernconf is below, try building the kernel, and send an email containing the backtrace from any process that has blocked (in my Well, I haven't managed to get a backtrace, but immediately upon booting the system halts with the following: http://www.twisted.org.uk/~pete/71_lor1.jpg

Re: Big problems with 7.1 locking up :-(

2009-01-11 Thread Dylan Cochran
On Sun, Jan 11, 2009 at 11:27 AM, Pete French petefre...@ticketswitch.com wrote: My kernconf is below, try building the kernel, and send an email containing the backtrace from any process that has blocked (in my Well, I haven't managed to get a backtrace, but immediately upon booting the

Re: Big problems with 7.1 locking up :-(

2009-01-11 Thread Pete French
Not Found sorry, see the subsequent email, there are more links there to working PNGs -pete.

Re: Big problems with 7.1 locking up :-(

2009-01-11 Thread Garrett Cooper
On Sun, Jan 11, 2009 at 4:45 AM, Pete French petefre...@ticketswitch.com wrote: I noticed a similar problem testing 7.1-RC1. It seemed to be a deep deadlock, as it was triggered by lighttpd doing kern_sendfile, and never returning. The side effects (being unable to create processes, etc.) are

Re: Big problems with 7.1 locking up :-(

2009-01-10 Thread Pete French
FWIW, the other guy I know who is having this problem had already switched to using ULE under 7.0-release, and did not have any problems with it. So *his* problem was probably not related to SCHED_ULE, unless something has recently changed there. Well, one of my machines just locked up

Re: Big problems with 7.1 locking up :-(

2009-01-10 Thread heliocentric
I noticed a similar problem testing 7.1-RC1. It seemed to be a deep deadlock, as it was triggered by lighttpd doing kern_sendfile, and never returning. The side effects (being unable to create processes, etc.) are similar. My kernconf is below, try building the kernel, and send an email containing

Re: Big problems with 7.1 locking up :-(

2009-01-09 Thread Guy Helmer
Pete French wrote: I have a number of HP 1U servers, all of which were running 7.0 perfectly happily. I have been testing 7.1 in its various incarnations for the last couple of months on our test server and it has performed perfectly. So the last two days I have been round upgrading all our

Re: Big problems with 7.1 locking up :-(

2009-01-09 Thread Mike Tancsa
At 09:49 AM 1/9/2009, Guy Helmer wrote: RAID controller with a pair of mirrored drives connected. Each one has both ethernets connected, bundled using lagg and LACP. I can't tell whether my situation is related, but I am seeing lockups on SMP Supermicro servers with both older (NetBurst-ish)

Re: Big problems with 7.1 locking up :-(

2009-01-09 Thread Pete French
Are you using the same disk controller as Peter? Do both of you run with quotas on the file system? By lockup, do you mean it doesn't respond to the network either or just anything that needs disk IO? I don't think he can be using the same controller, as mine is an embedded HPO unit. they

Re: Big problems with 7.1 locking up :-(

2009-01-09 Thread Guy Helmer
Pete French wrote: Are you using the same disk controller as Peter? Do both of you run with quotas on the file system? By lockup, do you mean it doesn't respond to the network either or just anything that needs disk IO? I don't think he can be using the same controller, as mine is an

Re: Big problems with 7.1 locking up :-(

2009-01-09 Thread Robert Blayzor
On Jan 8, 2009, at 8:58 PM, Pete French wrote: I have a number of HP 1U servers, all of which were running 7.0 perfectly happily. I have been testing 7.1 in its various incarnations for the last couple of months on our test server and it has performed perfectly. I noticed a problem with

Re: Big problems with 7.1 locking up :-(

2009-01-09 Thread Pete French
Since ULE is now default in 7.1 and not in 7.0, perhaps you can try that? Actually you might be on to something there. One of the main differences between our test GL360 and the live ones is that the test one has fewer cores in it, and is under less load. So multiprocessing problems may

Re: Big problems with 7.1 locking up :-(

2009-01-09 Thread Garance A Drosihn
At 1:58 AM + 1/9/09, Pete French wrote: I have a number of HP 1U servers, all of which were running 7.0 perfectly happily. I have been testing 7.1 in its various incarnations for the last couple of months on our test server and it has performed perfectly. So the last two days I have been
