Re: HP DL 585 / ACPI ID / ECC Memory / Panic
Hi,

On 2016-05-12 21:03, Steven Hartland wrote:
> I wouldn't rule out a bad cpu as we had a very similar issue and that's
> what it was. Quick way to confirm is to move all the dram from the
> disabled CPU to one of the other CPUs and see if the issue stays away
> with the current CPU still disabled.

One core is still running, seemingly without problems. It is only one core I disabled, not the entire cpu. APIC 1 and 2 are, I believe, on the same chip. I am not a super CPU design expert, but if the two cores are on the same cpu chip, do they not share the same memory bus on this model of AMD cpu?

> If that's the case it's likely the on chip memory controller has
> developed a fault

Or you could just move two cpu cards around and see if the error jumps from apic 1+2(err) to apic 3+4(err). That is, if these are issued in order by FreeBSD. Or is the ordering random? I suppose I could move all of the boards one step to the right and test it that way regardless.

If the error does jump, it is probably a DIMM or, as you say, the memory bus. If not, it is probably the cpu board slot on the mainboard itself.

I will try this and post my findings.

Offtopic: I cannot believe how poor the onboard bios diagnostics are on this server compared to my old IBM Netfinity 5000.

rgrds

Nikolaj Hansen
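The card-swap test described above boils down to a small decision rule, sketched here in Python. This is only an illustration of the reasoning in this message, not a diagnostic tool: the function name and the slot numbers are hypothetical, and it assumes you can tell which slot the errors are tied to after the move, which is exactly the open question about how FreeBSD orders the APIC IDs.

```python
from typing import Optional


def interpret_card_swap(old_slot: int, new_slot: int,
                        failing_slot_after: Optional[int]) -> str:
    """Classify the result of moving the suspect CPU card from old_slot to new_slot.

    failing_slot_after is the slot the errors are tied to after the move,
    or None if no errors show up at all.
    """
    if failing_slot_after is None:
        return "inconclusive: no errors seen after the move"
    if failing_slot_after == new_slot:
        # The error travelled with the card and its memory.
        return "fault follows the card: suspect a DIMM or the memory bus"
    if failing_slot_after == old_slot:
        # The error stayed behind even though the card is gone.
        return "fault stays with the slot: suspect the cpu board slot on the mainboard"
    return "inconclusive: errors appeared in an unrelated slot"


# Hypothetical example: suspect card moved from slot 1 to slot 2.
print(interpret_card_swap(1, 2, 2))
print(interpret_card_swap(1, 2, 1))
```

Moving every board one step to the right is the same test applied to all cards at once.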
Re: HP DL 585 / ACPI ID / ECC Memory / Panic
> On 12.05.2016 at 21:03, Steven Hartland wrote:
>
> I wouldn't rule out a bad cpu as we had a very similar issue and that's
> what it was.

IIRC, the AMD servers from HP had numerous problems for the first few generations. Some worked well (I think we have a handful of 385 G1/G2/G5 still running), but others would just hang or crash from time to time. My boss was never too keen on them anyway, so we never had that many to begin with.

Plus, HP servers had and have a way of popping when you remove the power from a long-running one (that’s probably servers in general). Most times, it’s only the PSU or a disk, but we’ve also fried NICs by simply powering the damn thing off…

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
Re: HP DL 585 / ACPI ID / ECC Memory / Panic
I wouldn't rule out a bad cpu as we had a very similar issue and that's what it was.

Quick way to confirm is to move all the dram from the disabled CPU to one of the other CPUs and see if the issue stays away with the current CPU still disabled. If that's the case it's likely the on chip memory controller has developed a fault.

On Thursday, 12 May 2016, Nikolaj Hansen wrote:
> Hi,
>
> I recently added a zfs disk array to my old HP 585 G1 Server.
> Immediately there were kernel panics and I have spent quite a bit of time
> figuring out what was really wrong.
>
> The system has 4 cpu cards with Opteron dual core processors. Each
> card has 4x2 gigabyte memory: 4x2x4 = 32 gigabyte of total system mem.
> The memory is DDR400 ECC mem.
>
> The panic was very easily reproducible. I just had to issue enough reads
> to the system until the faulty mem was accessed.
>
> Strangely I can run memtest86+ with the DDR setting on and I find no
> error whatsoever.
>
> Adding
>
> hint.lapic.2.disabled=1
>
> to /boot/loader.conf immediately mitigates the error for FreeBSD. So here
> is my conclusion:
>
> If you can make the system stable by disabling one core on one cpu card:
>
> 1) The other cards / mem must be ok.
> 2) The mainboard must be ok since one of the cores on the cpu is still
>    running / not barfing panics.
> 3) The cpu core with acpi 2 is probably also ok. It is on the same chip
>    as a non disabled core.
> 4) It is likely down to a rotten DIMM.
>
> Instead of mindlessly trying to find the culprit by switching dimms I
> would really like to identify the CPU, card and mem module from the os.
>
> Info here:
>
> http://pastebin.com/jqufNKck
>
> Thank you for your time and help.
>
> --
>
> Med venlig hilsen / with regards
>
> Nikolaj Hansen
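For keeping track of which cores have been switched off while testing, here is a minimal Python sketch that lists the APIC IDs disabled by hints of the form quoted above. It only parses text you hand it; the function name and the sample contents (including the zfs_load line) are illustrative assumptions, and it makes no claim about how the FreeBSD loader itself parses the file.

```python
import re

# Matches lines such as: hint.lapic.2.disabled=1  (value optionally quoted).
# Lines commented out with a leading '#' do not match.
HINT_RE = re.compile(r'^\s*hint\.lapic\.(\d+)\.disabled\s*=\s*"?1"?\s*(?:#.*)?$')


def disabled_apic_ids(loader_conf_text):
    """Return the sorted APIC IDs that the given loader.conf text disables."""
    ids = set()
    for line in loader_conf_text.splitlines():
        match = HINT_RE.match(line)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


# Illustrative sample, not the actual file from the affected server.
sample = 'zfs_load="YES"\nhint.lapic.2.disabled=1\n'
print(disabled_apic_ids(sample))
```

On the real machine the text would come from reading /boot/loader.conf.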