Re: LSI 9240-4i 4K alignment

2012-08-16 Thread Steven Hartland
- Original Message - From: George Kontostanos gkontos.m...@gmail.com It still really smells like something higher up the layers than the controller tbh. We have tried many combinations with different drives. Any other suggestions? See below Confused as your 9240-4i is a SAS2008

Re: LSI 9240-4i 4K alignment

2012-08-16 Thread Steven Hartland
- Original Message - From: George Kontostanos gkontos.m...@gmail.com You are right, the chip specs say: LSISAS2108 RAID-on-Chip The drives are identified as mfisyspd0, mfisyspd1, etc. The following might be interesting to you:-

Re: AHCI Timeout errors on Intel Patsburg

2012-07-29 Thread Steven Hartland
- Original Message - From: Alexander Motin m...@freebsd.org is cs ss 0001 rs 0001 tfd 40 serr 0088 This line (ss and rs fields) tells me that the device hasn't confirmed completion of one NCQ command. Bits set in the serr field mean 10b to 8b Decode Error and
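
As a rough illustration of the kind of decode described in the message above, the sketch below maps the set bits of an AHCI port SError (PxSERR) value to the bit names used in the AHCI specification. The archive preview shows the register value with digits apparently stripped, so the value 0x00880000 used here is an assumption, and `SERR_BITS` and `decode_serr` are names made up for this example.

```python
# Bit names for the AHCI PxSERR register (ERR in bits 0-15, DIAG in bits 16-31).
SERR_BITS = {
    0: "Recovered Data Integrity Error",
    1: "Recovered Communications Error",
    8: "Transient Data Integrity Error",
    9: "Persistent Communication or Data Integrity Error",
    10: "Protocol Error",
    11: "Internal Error",
    16: "PhyRdy Change",
    17: "Phy Internal Error",
    18: "Comm Wake",
    19: "10B to 8B Decode Error",
    20: "Disparity Error",
    21: "CRC Error",
    22: "Handshake Error",
    23: "Link Sequence Error",
    24: "Transport state transition error",
    25: "Unknown FIS Type",
    26: "Exchanged",
}


def decode_serr(value):
    """Return the names of all known bits set in an SError value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # Assumed value: the archive preview shows "serr 0088" with digits apparently lost.
    for name in decode_serr(0x00880000):
        print(name)
```

With the assumed value, bits 19 and 23 are set, which matches the "10b to 8b Decode Error" named in the message; the second name is what the table gives for bit 23, not something the truncated preview confirms.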

AHCI Timeout errors on Intel Patsburg

2012-07-27 Thread Steven Hartland
We're seeing some strange timeout errors on some new Supermicro X9DRT-HF MBs we have here when combined with KINGSTON HyperX 3K SSDs. It seems that when connected to the second channel, reads often time out, stalling all IO under 8.3-RELEASE-p3. When this happens we see:- Jul 27 14:35:59 lon059

Re: ZFS causing panic

2012-07-23 Thread Steven Hartland
- Original Message - From: Clayton Milos c...@milos.co.za Hi guys I've had an issue for some time now. When I'm copying a lot of files over to ZFS usually using SMB it causes a panic and locks up the server. I'm running FreeBSD 9.0-RELEASE with a custom kernel. I've just pulled

Re: Checksum errors across ZFS array

2012-07-20 Thread Steven Hartland
- Original Message - From: Dr Josef Karthauser j...@tao.org.uk So, take care if the memory doesn't report any failures, it might still be faulty. p.s. It was my fault that I wasn't running ECC memory on the system! :/. We've even seen this with ECC memory. Running the memory in a

Re: Checksum errors across ZFS array

2012-07-19 Thread Steven Hartland
- Original Message - From: James Snow s...@teardrop.org I have a ZFS server on which I've seen periodic checksum errors on almost every drive. While scrubbing the pool last night, it began to report unrecoverable data errors on a single file. I compared an md5 of the supposedly

Re: Checksum errors across ZFS array

2012-07-19 Thread Steven Hartland
- Original Message - From: James Snow s...@teardrop.org On Thu, Jul 19, 2012 at 06:05:32PM +0100, Dr Joe Karthauser wrote: Hi James, It's almost definitely a memory problem. I'd change it ASAP if I were you. I lost about 70mb from my zfs pool for this very reason just a few weeks

Re: 8.2 -8.3 regression on disk writes

2012-07-16 Thread Steven Hartland
- Original Message - From: Michael Ross g...@ross.cx To: freebsd-stable@freebsd.org Sent: Monday, July 16, 2012 2:23 PM Subject: 8.2 -8.3 regression on disk writes Hello, using 8.2 the machine runs fine, using 8.3 or higher, not so much. In layman's terms, if I do too many

Re: bge problems in RELENG_9, bge0: watchdog timeout -- resetting

2012-07-13 Thread Steven Hartland
- Original Message - From: Sean Bruno sean...@yahoo-inc.com No real change. I suspect something else is going on here that I don't understand. I note that when the system malfunctions now, the system cannot boot and requires me to enter the bios to check my settings. We've had a

Re: 9-stabe: cd device gone, ATA_CAM panics

2012-06-29 Thread Steven Hartland
- Original Message - From: Oliver Fromme o...@lurza.secnetix.de I need a working DVD drive, so I'm now considering downgrading to 8-stable. But then again, TMPFS didn't work as well for me as it does in 9-stable (which was the main reason for me to upgrade), so I'm kind of stuck in

Re: FreeBSD and IPMI how-to (was Re: su problem)

2012-06-15 Thread Steven Hartland
Daniel Braniss writes: Would some kind soul point me to a howto for configuring IPMI on FreeBSD? I have a Dell PowerEdge 840 that supports IPMI, but I have no idea how to set it up - either in the BIOS or in FreeBSD. I've messed around with ipmitools a little, but I haven't gotten it to work.

Re: Why Are You NOT Using FreeBSD ?

2012-06-01 Thread Steven Hartland
- Original Message - From: Mehmet Erol Sanliturk m.e.sanlit...@gmail.com If you are NOT using FreeBSD for any area or some areas , would you please list those areas with most important first to least important last ? Although we would like to, we can't use FreeBSD to run some Linux based

Re: Why Are You Using FreeBSD?

2012-05-30 Thread Steven Hartland
1. The community - Unlike Linux which is very fragmented by all the different flavours and hence individual communities, FreeBSD has one community who are always happy to help with hints, tips and advice. This simply can't be beaten! 2. Stability - There are always issues with any OS but in our many

Re: 9-STABLE, ZFS, NFS, ggatec - suspected memory leak

2012-04-26 Thread Steven Hartland
- Original Message - From: Rick Macklem rmack...@uoguelph.ca To: Oliver Brandmueller o...@e-gitt.net Cc: freebsd-stable@freebsd.org Sent: Thursday, April 26, 2012 1:24 AM Subject: Re: 9-STABLE, ZFS, NFS, ggatec - suspected memory leak Oliver Brandmueller wrote: Hi, After figuring an

Re: 9-STABLE, ZFS, NFS, ggatec - suspected memory leak

2012-04-26 Thread Steven Hartland
Original Message - From: Rick Macklem rmack...@uoguelph.ca At a glance, it looks to me like 8.x is affected. Note that the bug only affects the new NFS server (the experimental one for 8.x) when exporting ZFS volumes. (UFS exported volumes don't leak) If you are running a server that

Re: ahci hangs on Supermicro MicroCloud second channel

2012-03-21 Thread Steven Hartland
- Original Message - From: Dmitry Morozovsky ma...@rinet.ru Quick update: I have received 1.0b from Mar 19, flashed it, but nothing changes regarding disk subsystem; moreover, now kernel is constantly whining about acpi_tz0. Will investigate further. Which version of FreeBSD is

Re: ahci hangs on Supermicro MicroCloud second channel

2012-03-19 Thread Steven Hartland
- Original Message - From: Dmitry Morozovsky ma...@rinet.ru To: Steven Hartland kill...@multiplay.co.uk Cc: m...@freebsd.org; freebsd-stable@freebsd.org Sent: Monday, March 19, 2012 8:42 AM Subject: Re: ahci hangs on Supermicro MicroCloud second channel On Sun, 18 Mar 2012, Steven

Re: ahci hangs on Supermicro MicroCloud second channel

2012-03-18 Thread Steven Hartland
- Original Message - From: Dmitry Morozovsky ma...@rinet.ru To: freebsd-stable@FreeBSD.org Cc: m...@freebsd.org Sent: Sunday, March 18, 2012 4:10 PM Subject: ahci hangs on Supermicro MicroCloud second channel Dear colleagues, I've started testing SuperMicro MicroCloud[1] to have

Re: ahci hangs on Supermicro MicroCloud second channel

2012-03-18 Thread Steven Hartland
- Original Message - From: Dmitry Morozovsky ma...@rinet.ru Well, ahci problem solved, but I still have much worse performance (and different on ada0 and ada1!): ada0, MC 50-60 MBps ada1, MC 13-25 MBps ada*, 5017 130+ MBps Could you please post SATA/AHCI BIOS settings from your

Re: ahci hangs on Supermicro MicroCloud second channel

2012-03-18 Thread Steven Hartland
- Original Message - From: Dmitry Morozovsky ma...@rinet.ru To: Steven Hartland kill...@multiplay.co.uk Cc: m...@freebsd.org; freebsd-stable@freebsd.org Sent: Sunday, March 18, 2012 8:45 PM Subject: Re: ahci hangs on Supermicro MicroCloud second channel On Sun, 18 Mar 2012, Steven

Re: Troube with SSD

2012-03-11 Thread Steven Hartland
- Original Message - From: Willem Jan Withagen w...@digiware.nl Just as a followup. I reported the above problem Today it occurred again. But this time I was able to find a firmware upgrade for the Corsair Force GT from 1.2 to 1.3.3 (Need Win7 to be able to upgrade)

What ZFS version will be in 8.3?

2012-03-11 Thread Steven Hartland
Hi guys which version of ZFS support will be included in 8.3? Regards Steve This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is

ahci / ada hiding disk errors?

2012-02-16 Thread Steven Hartland
We've got a machine here with a suspected failed disk but the ahci driver seems to be hiding the details of any failure and only displaying Synchronize cache failed to the console. Switching to IDE mode in the bios and using the old adX devices show info such as:- ad6: 953869MB Seagate

Re: ahci / ada hiding disk errors?

2012-02-16 Thread Steven Hartland
- Original Message - From: Jeremy Chadwick free...@jdc.parodius.com ... The long test is still running, as I stated above. Also, just as a data point: folks should remember to completely ignore the remaining percentage shown -- it is hardly ever accurate, especially on Western

Re: serious packet routing issue causing ntpd high load?

2012-02-09 Thread Steven Hartland
- Original Message - From: Qing Li qin...@freebsd.org Sorry about the delayed response. No, this one just fell through the cracks. Has anyone responded ? Does it still exist in 9.x ? We discovered yesterday that adding the following routes, which are present in:

ZFS: i/o error - all block copies unavailable on large disk number machines

2012-01-23 Thread Steven Hartland
We did a minor kernel update on a large storage machine here today which runs FreeBSD 8.2 and to our surprise it failed to boot at the loader with ZFS: i/o error - all block copies unavailable. After some digging we discovered that this was likely due to the fact that the BIOS only enumerates

Re: ZFS: i/o error - all block copies unavailable on large disk number machines

2012-01-23 Thread Steven Hartland
- Original Message - From: Chuck Swiger On Jan 23, 2012, at 9:04 AM, Steven Hartland wrote: After some digging we discovered that this was likely due to the fact that the BIOS only enumerates the first 12 disks and this machine has more than that in the root zpool which was a striped

Re: ZFS: i/o error - all block copies unavailable on large disk number machines

2012-01-23 Thread Steven Hartland
- Original Message - From: Matthew Seaman Even if you do split up your pool into vdevs using 8 drives, you will still run into the problem with zfs being unable to assemble the pool unless it sees all of the drives in it. Interesting that this only appeared as part of a minor kernel

Re: ZFS: i/o error - all block copies unavailable on large disk number machines

2012-01-23 Thread Steven Hartland
- Original Message - From: Olivier Smedts In my case, I fixed it by having a separate /boot on some USB sticks -- this was only ever accessed to read the kernel, kernel modules and bootloader at boot time, so no worries over performance. Out of interest what's the procedure you used

Re: HPN-SSH question

2012-01-14 Thread Steven Hartland
On a similar note I'm actually quite concerned by the inclusion of these patches by default in the OS version of ssh as we've seen several cases of it causing noticeable performance degradation instead of improvement. I've not tested on 9, but this was certainly the case on 8.2. Regards

8.2 EoL Schedule?

2012-01-13 Thread Steven Hartland
Currently http://www.freebsd.org/security/ states 8.2 is Estimated EoL July 31, 2012. Given 9.0 has only just been released can we assume this is just out of date and 8.2 will be supported for longer than this? Along those lines are there any more plans for additional point releases to 8 or

Re: SVN checkout

2012-01-13 Thread Steven Hartland
You can do this using csup for example:- cp /usr/share/examples/cvsup/stable-supfile /usr/share/examples/cvsup/9.0-release-supfile Edit the following to the relevant values *default host=CHANGE_THIS.FreeBSD.org *default release=cvs tag=RELENG_9 e.g. *default host=cvs.uk.FreeBSD.org *default

Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1Server

2011-12-15 Thread Steven Hartland
- Original Message - From: Michael Larabel michael.lara...@phoronix.com I was the one that carried out the testing and know that it was on the same system. All of the testing, including the system tables, is fully automated. Under FreeBSD sometimes the parsing of some component

Re: Benchmark (Phoronix): FreeBSD 9.0-RC2 vs. Oracle Linux 6.1Server

2011-12-15 Thread Steven Hartland
Having a quick look at those results, aren't there a few anomalies e.g. THREADED I/O TESTER for Oracle reports 10255.75MB/s, which is clearly impossible for a single HD system, meaning it's basically caching the entire data set? Regards Steve

Re: SCHED_ULE should not be the default

2011-12-15 Thread Steven Hartland
With all the discussion I thought I'd give a buildworld benchmark a go here on a spare 24 core machine. ULE tested fine but with 4BSD it won't even boot, panicking with the following:- http://screensnapr.com/v/hwysGV.png This is on a clean 8.2-RELEASE-p4. Upgrading to RELENG_9 fixed this but it's a

Re: SCHED_ULE should not be the default

2011-12-15 Thread Steven Hartland
Lars Engels wrote: 9.0 ships with gcc and clang which both need to be compiled, 8.2 only has gcc. Ahh, any reason we need both, and is it possible to disable clang? Regards Steve

Re: 8.2 + apache == a LOT of sigprocmask

2011-11-15 Thread Steven Hartland
- Original Message - From: Daniil Cherednik dchered...@masterhost.ru I am not trying to start a holy war, but I really need to increase performance of our hosting in FreeBSD. Is there something you need from apache that means you can't use nginx instead? nginx + php-fpm is much

Re: serious packet routing issue causing ntpd high load?

2011-11-03 Thread Steven Hartland
- Original Message - From: Alexander V. Chernikov melif...@freebsd.org RTM_MISS: Lookup failed on this address: len 184, pid: 0, seq 0, errno 0, flags:DONE locks: inits: sockaddrs: DST ::A.B.C.D I'm unable to reproduce an issue on (nearly) GENERIC 8-S, but I see nearly the same

Re: ntpd couldn't resolve host name on system boot

2011-10-24 Thread Steven Hartland
- Original Message - From: Jeremy Chadwick free...@jdc.parodius.com The problem is that the networking layer is not TRULY available by the time ntpd starts. This does have to do with NIC drivers, but the same behaviour can be seen on all NICs, including excellent ones like em(4).

Re: zfs parition probing causing long delay at BTX loader

2011-10-21 Thread Steven Hartland
- Original Message - From: Peter Maloney peter.malo...@brockmann-consult.de To: freebsd-stable@freebsd.org Sent: Friday, October 21, 2011 11:17 AM Subject: Re: zfs parition probing causing long delay at BTX loader On 10/20/2011 07:23 PM, Steven Hartland wrote: Installing a new

Re: zfs parition probing causing long delay at BTX loader

2011-10-21 Thread Steven Hartland
- Original Message - From: Mark Saad nones...@longcount.org Yes after the beastie menu we also have a noticeable pause before the kernel starts booting, just white square top left hand corner, not tracked that one down yet, any ideas? Steve the change you need is in HEAD not sure if it

zfs parition probing causing long delay at BTX loader

2011-10-20 Thread Steven Hartland
Installing a new machine here which has 10+ disks we're seeing BTX loader take 50+ seconds to enumerate the disks. After doing some digging I found the following thread on the forums which hinted that r198420 may be the cause. http://forums.freebsd.org/showthread.php?t=12705 A quick change to

Re: High cpu usage when using ZFS cache device

2011-10-11 Thread Steven Hartland
- Original Message - From: Mickaël Maillot mickael.mail...@gmail.com same problem here after ~ 30 days with a production server and 2 SSD Intel X25M as L2. so we update and reboot the 8-STABLE server every month. Old thread but also seeing this on 8.2-RELEASE so looks like this may

Re: High cpu usage when using ZFS cache device

2011-10-11 Thread Steven Hartland
- Original Message - From: Artem Belevich a...@freebsd.org No, there was no PR. L2arc CPU hogging after ~24 days was fixed in r218180 in -HEAD and was MFC'ed to 8-stable in r218429 early in February '11. If you're using 8-RELEASE, upgrading to 8-STABLE would be something to
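
The preview above does not say why the problem shows up after roughly 24 days. One possible reading, offered purely as an assumption and not as the confirmed cause, is that a signed 32-bit tick counter running at 1000 ticks per second wraps after about that long. The arithmetic is sketched below; `days_until_wrap` and its parameters are illustrative names.

```python
def days_until_wrap(bits=32, hz=1000, signed=True):
    """Days until a tick counter of the given width overflows at hz ticks per second."""
    # A signed counter loses one bit to the sign, halving the usable range.
    max_ticks = 2 ** (bits - 1 if signed else bits)
    seconds = max_ticks / hz
    return seconds / 86400  # 86400 seconds per day


if __name__ == "__main__":
    print(round(days_until_wrap(), 2))
    print(round(days_until_wrap(signed=False), 2))
```

Under these assumptions the signed case comes out just under 25 days, which is in line with the "~24 days" quoted in the thread; a different tick rate or counter width would give a different figure.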

Re: High cpu usage when using ZFS cache device

2011-10-11 Thread Steven Hartland
- Original Message - From: Artem Belevich a...@freebsd.org On Tue, Oct 11, 2011 at 10:21 AM, Steven Hartland kill...@multiplay.co.uk wrote: Thanks for the confirmation there Artem, we currently can't use 8-STABLE due to the serious routing issue, seems like every packet generates

Re: serious packet routing issue causing ntpd high load?

2011-10-10 Thread Steven Hartland
- Original Message - From: Li, Qing qing...@bluecoat.com RTM_MISS: Lookup failed on this address: len 184, pid: 0, seq 0, errno 0, flags:DONE locks: inits: sockaddrs: DST ::A.B.C.D Would it be possible for you to email me what exactly does ::A.B.C.D map into WRT your system or

Re: serious packet routing issue causing ntpd high load?

2011-10-05 Thread Steven Hartland
- Original Message - From: Jeremy Chadwick free...@jdc.parodius.com 1428 root1 440 11900K 2860K select 0 0:17 0.00% /usr/sbin/ntpd -c /conf/ME/ntp.conf -p /var/run/ntpd.pid -f And route -n monitor shows no anomalies here. Maybe you should tcpdump to find out if

serious packet routing issue causing ntpd high load?

2011-10-04 Thread Steven Hartland
We just updated a machine to 8-STABLE and I've noticed that ntpd is using notable amounts of CPU, 5-7%, which is very high for such a trivial daemon. 8.2-STABLE FreeBSD 8.2-STABLE #16: Tue Oct 4 09:53:17 UTC 2011 truss indicates it's constantly checking and reading from a socket 0.047297485

Re: serious packet routing issue causing ntpd high load?

2011-10-04 Thread Steven Hartland
- Original Message - From: Steven Hartland kill...@multiplay.co.uk .. This seems very much like the following pr which was fixed:- Remove a bogusly introduced rtalloc_ign() in rev. 1.335/SVN 178029, generating an RTM_MISS for every IP packet forwarded making user space routing daemons

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-21 Thread Steven Hartland
- Original Message - From: Jamie Gritton ja...@freebsd.org In essence I think we can get the following flow where 1# = process1 and 2# = process2 1#1. prison1.pr_uref = 1 (single process jail) 1#2. prison_deref( prison1,... 1#3. prison1.pr_uref-- (prison1.pr_uref = 0) 1#3.

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org BTW, I suspect the following scenario, but I am not able to verify it either via testing or in the code: - last process in a dying jail exits - pr_uref of the jail reaches zero - pr_uref of prison0 gets decremented - you attach

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org BTW, I suspect the following scenario, but I am not able to verify it either via testing or in the code: - last process in a dying jail exits - pr_uref of the jail reaches zero - pr_uref of prison0 gets decremented - you attach

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: Roger Marquis marq...@roble.com To: freebsd-j...@freebsd.org; freebsd-stable@FreeBSD.org Sent: Saturday, August 20, 2011 7:10 PM Subject: Re: debugging frequent kernel panics on 8.2-RELEASE Repeat this enough times and prison0.pr_uref reaches zero. To reach

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org thanks for doing this! I'll reiterate my suspicion just in case - I think that you should look for the cases where you stop a jail, but then re-attach and resurrect the jail before it's completely dead. Yer that's where I

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: Steven Hartland kill...@multiplay.co.uk Looking through the code I believe I may have noticed a scenario which could trigger the problem. Given the following code:- static void prison_deref(struct prison *pr, int flags) { struct prison *ppr, *tpr; int

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org diff -u sys/kern/kern_jail.c.orig sys/kern/kern_jail.c --- sys/kern/kern_jail.c.orig 2011-08-20 21:17:14.856618854 +0100 +++ sys/kern/kern_jail.c2011-08-20 21:18:35.307201425 +0100 @@ -2455,7 +2455,8 @@

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: Steven Hartland kill...@multiplay.co.uk Something else you may be more interested in Andriy:- I added in debugging options DDB INVARIANTS to see if I can get more useful info and the panic results in a looping panic constantly scrolling up the console

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-20 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org on 20/08/2011 23:24 Steven Hartland said the following: - Original Message - From: Steven Hartland Looking through the code I believe I may have noticed a scenario which could trigger the problem. Given the following

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org That's interesting, are you using http as an example or is that something that's been gleaned from the debugging of our output? I ask as there's only one process running in each of our jails and that's a single java process. It's

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-18 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org Probably I have mistakenly assumed that the 'prison' in prison_derefer() has something to do with an actual jail, while it could have been just prison0 where all non-jailed processes belong. That makes sense as this particular

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-17 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org To: Steven Hartland kill...@multiplay.co.uk Cc: freebsd-stable@FreeBSD.org Sent: Wednesday, August 17, 2011 12:12 PM Subject: Re: debugging frequent kernel panics on 8.2-RELEASE on 16/08/2011 23:43 Steven Hartland said

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-17 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org To: Steven Hartland kill...@multiplay.co.uk Cc: freebsd-stable@FreeBSD.org Sent: Wednesday, August 17, 2011 1:56 PM Subject: Re: debugging frequent kernel panics on 8.2-RELEASE on 17/08/2011 15:15 Steven Hartland said

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-17 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org Thanks to the debug that Steven provided and to the help that I received from Kostik, I think that now I understand the basic mechanics of this panic, but, unfortunately, not the details of its root cause. It seems like

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-16 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org To: Steven Hartland kill...@multiplay.co.uk Cc: freebsd-stable@FreeBSD.org Sent: Tuesday, August 16, 2011 9:30 PM Subject: Re: debugging frequent kernel panics on 8.2-RELEASE on 15/08/2011 17:56 Steven Hartland said

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-15 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org We have 352 thread entries starting with:- #0 sched_switch (td=0x8083e4e0, newtd=0xff0012d838c0, flags=Variable flags is not available. 23 with:- cpustop_handler () at atomic.h:285 and 16 with:- #0 fork_trampoline ()

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-15 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org on 15/08/2011 13:34 Steven Hartland said the following: (kgdb) list *0x8053b691 0x8053b691 is in vm_fault (/usr/src/sys/vm/vm_fault.c:239). 234 /* 235 * Find the backing store object

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-15 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org To: Steven Hartland kill...@multiplay.co.uk Cc: freebsd-stable@FreeBSD.org Sent: Monday, August 15, 2011 2:20 PM Subject: Re: debugging frequent kernel panics on 8.2-RELEASE on 15/08/2011 15:51 Steven Hartland said

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-15 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org To: Steven Hartland kill...@multiplay.co.uk Cc: freebsd-stable@FreeBSD.org Sent: Monday, August 15, 2011 4:36 PM Subject: Re: debugging frequent kernel panics on 8.2-RELEASE on 15/08/2011 17:56 Steven Hartland said

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-14 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org Maybe test it on couple of machines first just in case I overlooked something essential, although I have a report from another user that the patch didn't break anything for him (it was tested for an unrelated issue). We've got

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-14 Thread Steven Hartland
- Original Message - From: Attilio Rao atti...@freebsd.org Anyway, we really would need much more information in order to take a proactive action. Would it be possible to access to one of the panic'ing machine? Is it always the same panic which is happening or it is variadic (like:

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Steven Hartland
That's not the issue as it's happening across the board, over 130 machines :( Regards Steve - Original Message - From: Attilio Rao atti...@freebsd.org I'd really point the finger to faulty hw. Please run all the necessary diagnostic tools for catching it. Attilio

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Steven Hartland
- Original Message - From: Jeremy Chadwick free...@jdc.parodius.com On Thu, Aug 11, 2011 at 09:59:36AM +0100, Steven Hartland wrote: That's not the issue as its happening across board over 130 machines :( Agreed, bad hardware sounds unlikely here. I could believe some strange

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org on 10/08/2011 18:35 Steven Hartland said the following: Fatal double fault ... #14 0x803a2cc9 in sched_switch (td=0x0, newtd=0x0, flags=Variable flags is not available. ) at /usr/src/sys/kern/sched_ule.c:1852

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org I would really appreciate if you could try to reproduce the problem with the patch that I sent earlier. Hi Andriy, what's the risk of this patch causing other issues? I ask as to get results from this we're going to have to

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Steven Hartland
- Original Message - From: Andriy Gapon a...@freebsd.org I would really appreciate if you could try to reproduce the problem with the patch that I sent earlier. Hi Andriy, what's the risk of this patch causing other issues? I can not estimate. The code is supposed to affect only

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-11 Thread Steven Hartland
- Original Message - From: Rick Macklem rmack...@uoguelph.ca Just a random thought that is probably not relevant, but... Is it possible that some change for the upgrade is making the machines run hotter and they're failing when they overheat? The machines have full HW monitoring and

debugging frequent kernel panics on 8.2-RELEASE

2011-08-10 Thread Steven Hartland
We're currently experiencing a large number of kernel panics on FreeBSD 8.2-RELEASE across a large number of machines here. The base stack reported is a double fault with no additional details and CTRL+ALT+ESC fails to break to the debugger as does an NMI, even though it at least tries printing

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-10 Thread Steven Hartland
- Original Message - From: Steven Hartland kill...@multiplay.co.uk To: freebsd-stable@freebsd.org Sent: Wednesday, August 10, 2011 3:22 PM Subject: debugging frequent kernel panics on 8.2-RELEASE We're currently experiencing a large number of kernel panics on FreeBSD 8.2-RELEASE

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-10 Thread Steven Hartland
- Original Message - From: Jeremy Chadwick free...@jdc.parodius.com On Wed, Aug 10, 2011 at 03:22:52PM +0100, Steven Hartland wrote: The base stack reported is a double fault with no additional details and CTRL+ALT+ESC fails to break to the debugger as does an NMI, even though

Re: debugging frequent kernel panics on 8.2-RELEASE

2011-08-10 Thread Steven Hartland
- Original Message - From: Jeremy Chadwick free...@jdc.parodius.com In combination with this, we use the following in /etc/rc.conf (the dumpdev line is important, else savecore won't pick up anything): dumpdev=auto I thought this was meant to be the default from back in the 6.x days

Re: arcmsr panic runnig 8.2 of 2011

2011-06-26 Thread Steven Hartland
- Original Message - From: Willem Jan Withagen w...@digiware.nl ... So I tried upgrading my firmware to 1.49, but to no avail. The system keeps panicking. So I guess that there is still a coding error somewhere in the driver. But I'm not into this enough to even know where to start

Re: arcmsr panic runnig 8.2 of 2011

2011-06-26 Thread Steven Hartland
- Original Message - From: Willem Jan Withagen w...@digiware.nl Well the main key to the problem is that on 2011/06/06 the new version from Areca got imported. So if you have all your boxes with kernels predating 06-06, you're not running the new code. If the above is true and if you

Re: PCIe SATA HBA for ZFS on -STABLE

2011-05-31 Thread Steven Hartland
Areca's work well. The ARC-1220 (8 ports) should do you, not the cheapest but good support and performance. Regards Steve - Original Message - From: Matt Thyer matt.th...@gmail.com To: sta...@freebsd.org Sent: Tuesday, May 31, 2011 1:48 PM Subject: PCIe SATA HBA for ZFS on

Re: ZFS I/O errors

2011-05-30 Thread Steven Hartland
- Original Message - From: Dan Nelson dnel...@allantgroup.com The zfs IO code overloads the EILSEQ error code and uses it as a checksum error code. Returning that error for the same block on all disks is definitely weird. Could you have run a partitioning tool, or some other program

Re: mountlate not late enough for nfe0 with dhcp

2011-05-26 Thread Steven Hartland
- Original Message - From: Clifton Royston clift...@volcano.org This has been discussed at length in the past, causing me to write an rc.d script to work around the problem. You can drop this script into /usr/local/etc/rc.d, chmod 755 it, and make use of it appropriately. The comments

Re: background fsck high load on 8.1

2011-04-12 Thread Steven Hartland
The cpu requirements are usually quite low for fsck, what you're most likely seeing is disk contention due to the amount of IO. Personally I would recommend considering a move to 8.2 + ZFS as your filing system as it removes fsck from the equation, as well as giving lots of other benefits.

Re: Kernel memory leak in 8.2-PRERELEASE?

2011-04-05 Thread Steven Hartland
- Original Message - From: Pete French petefre...@ingresso.co.uk This is why I got rid of it - my application is a lot of CGI scripts. The overload condition is that we run out of memory - and we run *way* out of memory, it's never just a little overflow, it's either handleable or

Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Steven Hartland
- Original Message - From: Jeremy Chadwick free...@jdc.parodius.com Was there any conclusion from this guys, was there a bad disk causing the issue? Regards Steve

Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Steven Hartland
- Original Message - From: Mike Tancsa m...@sentex.net I would say probably the disk mostly. Perhaps a driver or firmware bug on the Areca. Hard to say. The drive totally failed a month or so later. Also, moved to a later firmware on the Areca controller after that and all has

Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Steven Hartland
- Original Message - From: Jeremy Chadwick free...@jdc.parodius.com I apologise in advance if I have already reviewed your situation, but if you could please provide full smartctl -a output for the disk, I can review the data to see if anything looks out of place. An example: on some

Re: deadlock or bad disk ? RELENG_8

2011-03-25 Thread Steven Hartland
- Original Message - From: Jeremy Chadwick free...@jdc.parodius.com Bummer. Competitor's drivers make use of pass(4) and/or xpt(4), the result being that you can see (and talk to directly) all the disks which are on the RAID card. No need for a CLI utility getting in the way, etc..

Re: em0 with latest driver hangs again and again (without Watchdogtimeout message!)

2011-03-04 Thread Steven Hartland
Silly question, but have you checked your RAM for issues? We had a machine with seemingly unexplained problems and hangs and it turned out to be a duff stick of RAM which wasn't being chip killed. - Original Message - From: Lev Serebryakov l...@serebryakov.spb.ru To: Brandon Gooch

Re: Freebsd-update and release candidates

2011-02-22 Thread Steven Hartland
However, I'm a wee bit curious of whether I will be able to upgrade from 8.2RC3 or if I should wait until 8.2 is actually released with the setup (I _CAN_ wait a week or two). Looks like it's already been tagged so should be any time now:- /usr/src/UPDATING:- ... 20110221: 8.2-RELEASE.

Re: machdep.hlt_cpus not safe with ULE?

2011-02-21 Thread Steven Hartland
- Original Message - From: Garrett Cooper gcoo...@freebsd.org As a followup to this and based on discussions with other folks, the fact that it's using hlt to halt CPUs without rescheduling tasks / masking interrupts, etc is not good. So none of the *hlt* sysctls are really doing the

machdep.hlt_cpus not safe with ULE?

2011-02-19 Thread Steven Hartland
I'm trying to debug a possibly failing CPU, so I thought it would be easy to just disable the cores using machdep.hlt_cpus and see if we see the panics we've been seeing. The problem is it seems ULE doesn't properly support machdep.hlt_cpus and still schedules processes onto the halted CPUs which

bge0 watchdog timeout -- resetting on 8.2-PREREL never recovers

2011-02-19 Thread Steven Hartland
Just updated a box to the 8.2-PREREL as of Friday and now when we do any serious amounts of network traffic we see:- bge0: watchdog timeout -- resetting bge0: link state changed to DOWN bge0: link state changed to UP The interface never recovers, we have to use remote console to down, wait 30

Re: machdep.hlt_cpus not safe with ULE?

2011-02-19 Thread Steven Hartland
- Original Message - From: Gary Jennejohn gljennj...@googlemail.com Looking at the kernel source it appears that only sched_4bsd.c makes use of hlt_cpus_mask. Given ULE is default do these need to be either removed totally or at least conditionally based on the scheduler choice as

Re: bge0 watchdog timeout -- resetting on 8.2-PREREL never recovers

2011-02-19 Thread Steven Hartland
This may be totally unrelated to bge, investigating a potential failing stick of ram in the machine in question so until we've ruled this out as the cause don't want to waste anyone's time. I did however notice the logic between the two fixes for DMA on 5704's on PCIX in svn differ so wondering

Re: machdep.hlt_cpus not safe with ULE?

2011-02-19 Thread Steven Hartland
For reference I've found that an alternative is to set the following in loader.conf:- hint.lapic.2.disabled=1 hint.lapic.3.disabled=1 2 and 3 here are the apic numbers displayed by dmesg on boot for the CPUs. Obviously this requires a reboot so not perfect for all uses but it does work for what
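
The loader.conf hints quoted in the message above, laid out one per line as they would sit in the file. This is only a sketch: the path /boot/loader.conf is the usual FreeBSD location and is assumed here, and the APIC IDs 2 and 3 are the ones from the message, so another machine would need the IDs reported by its own dmesg.

```
# /boot/loader.conf (assumed location)
# Disable the local APICs with IDs 2 and 3 so the matching CPUs are not brought up at boot.
hint.lapic.2.disabled=1
hint.lapic.3.disabled=1
```

As the message notes, the hints only take effect after a reboot.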
