Hi folks,

as we know, the KGPE-D16 is likely to hang during PCI init, especially if the
serial console is enabled (Timothy mentioned that he did not observe failures
with the debug level of the console lowered; for me, however, lowering it did
not help). Typical symptoms look like the following:

ERROR: PNP: 002e.b 70 irq size: 0x0000000001 not assigned

After discussing the issue with Timothy and doing a (huge) number of 
experiments in different settings I am of the opinion that the issue *does* 
seem to be memory/clock related. I tried various memory configurations and 
found some interesting correlations between memory configuration and the rate 
of failures. Also, I found another trigger which makes the hang *much* more 
likely.

For testing, I used the current coreboot master with the proposed revert which
made the MC4 errors go away. After some experimenting, I found two settings
which made the PCI hangs much less frequent:

1. Setting minimum memory voltage to 1.5V (probably unrelated)
2. Setting maximum memory clock down to 400 MHz (DDR3-800 instead of DDR3-1600)
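
For reference, the relation between the clock in setting 2 and the DDR3 speed
grade can be sketched as below. This is only an illustration of the arithmetic
(DDR transfers data on both clock edges); the function name is made up here and
is not a coreboot option or API.

```python
def ddr_data_rate(clock_mhz):
    """Return the DDR data rate in MT/s for a given memory bus clock in MHz.

    Hypothetical helper for illustration: double data rate means two
    transfers per clock cycle.
    """
    return 2 * clock_mhz


# The two clocks mentioned above: the reduced one and the modules' rated one.
for clock in (400, 800):
    print(f"{clock} MHz -> DDR3-{ddr_data_rate(clock)}")
```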

With these settings, the number of PCI hangs went down considerably on a number
of different configurations (all using two Opteron 6276 CPUs). The numbers are
(#hangs / #boot attempts):

1xCK0 (in slot A2): warm (0/8), cold (0/3) => no failures!
1xYK0 (in slot A2): warm (0/6), cold (0/3) => no failures!
8xYK0 (in all orange slots): warm (1/8), cold (1/5) => rare
16xYK0 (in all slots): warm (3/8), cold (1/5) => more likely
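
To make the counts above easier to compare, here is a small sketch that
combines the warm and cold attempts into one hang rate per configuration. The
counts are copied from the list; the dictionary layout and the function name
are just illustrative choices. With so few boot attempts per configuration the
percentages are rough indications, not precise failure rates.

```python
# (hangs, attempts) per configuration, copied from the list above.
results = {
    "1xCK0 (slot A2)": {"warm": (0, 8), "cold": (0, 3)},
    "1xYK0 (slot A2)": {"warm": (0, 6), "cold": (0, 3)},
    "8xYK0 (orange slots)": {"warm": (1, 8), "cold": (1, 5)},
    "16xYK0 (all slots)": {"warm": (3, 8), "cold": (1, 5)},
}


def hang_rate(counts):
    """Return the combined hang rate over warm and cold boot attempts."""
    hangs = sum(h for h, _ in counts.values())
    attempts = sum(n for _, n in counts.values())
    return hangs / attempts


for config, counts in results.items():
    print(f"{config}: {hang_rate(counts):.0%}")
```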

Specs of the memory modules (all from Samsung):

CK0: 8GB DDR3-1600, 1.5V
YK0: 8GB DDR3L-1600, 1.35V

However, there is one more and quite severe issue: If I set the following
option, I get a hang in roughly 70% of the boot attempts:

Chipset => Enable PCI-E MMCONFIG Support

I wouldn't mind leaving this disabled. Unfortunately, I am getting a "force
reboot" of the kernel each time it tries to initialize the nouveau driver. I am
not sure whether this is really related to this option or to the fact that I'm
running the RDIMMs at such low settings, but we'll have to find out.

Cheers, Daniel

-- 
coreboot mailing list: coreboot@coreboot.org
https://www.coreboot.org/mailman/listinfo/coreboot
