Hello Gerry,

Regarding your earlier post:
> The patch is still based on the assumption that memory node with
> bigger node id will have higher memory address with it. That assumption is
> true for most current platforms, but things change fast and that assumption
> may become broken with future platforms.

This patch proposal addresses a problem for a given set of servers. It is not
meant to address an RFE for future evolutions of ACPI and/or future NUMA
architectures. Nevada has been broken since build 88 on these platforms, and
S10 has been broken since u6 and will stay broken until u8 (at least). I do
think it is worth fixing this particular problem, even if we know a problem
could arise on future platforms or future ACPI specs. The latter could be
addressed by a separate RFE; it is less urgent since those cases do not exist
yet.

> 1) According to ACPI spec, there's no guarantee that domain ids
> will be contiguous starting from 0. On a NUMA platform with unpopulated
> sockets, there may be domains existing in SLIT/SRAT but disabled/unused.

I think this situation is already addressed by the current code (the "exists"
property of the different objects).

> According to my understanding, Gavin's patch should fix a design defect
> in the x86 lgrp implementation.

The fix referred to by Gavin in this thread doesn't work. According to Kit
Chow, in a discussion we had by email with Jonathan Chew:

>>> mnode 0 contains a higher physical address range than mnode 1. This
>>> breaks various assumptions made by software that deals with physical
>>> memory. Very likely the reason for the panic...
>>>
>>> Jonathan, is this ordering problem caused by what you had previously
>>> described to me (something like srat index info starting at 1 instead
>>> of 0 and you grabbed the info from index 2 first because 2%2 = 0)?
>>
>> Yes. If possible, I want to confirm what you suspect and make sure
>> that we really find the root cause, because there seems to be a bunch
>> of issues associated with 6745357 and none of them seem to have been
>> root caused (or at least they aren't documented very well).
>>
>> Is there some way to tell, based on where the kernel died and by examining
>> the relevant data structures, what's going on, and pinpoint
>> the root cause?
>>
> mem_node_config and mnoderanges referenced below have a range of
> 0x80000-f57f5 in slot 0. This is bad and needs to be addressed first and
> foremost even if there could be other issues. The one assertion that I
> saw about the calculation of a pfn not matching its mnode is very very
> likely because of the ordering problem.
>
> Kit

All of which leads me to think that changing the code to support situations
where mnodes are not in ascending order should be addressed in a separate RFE.

Thank you.

Best regards,
Guy
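P.S. For anyone following the thread without the source at hand, here is a
minimal, self-contained sketch of why the ordering matters. It is not the
actual kernel code: the structure and function names below are made-up
stand-ins for the real per-mnode range table (mem_node_config / mnoderanges),
reduced to just the fields needed for the example.

    /*
     * Simplified illustration only -- not the Solaris kernel code.
     * The structure below is a hypothetical stand-in for the per-mnode
     * range table; field and function names are invented for the example.
     */
    #include <stdio.h>
    #include <stdint.h>

    typedef uint64_t pfn_t;

    struct fake_mnode_range {
            int     exists;         /* node is populated */
            pfn_t   physbase;       /* first pfn owned by this mnode */
            pfn_t   physmax;        /* last pfn owned by this mnode */
    };

    #define MAX_MNODES      2

    /*
     * Reproduces the situation quoted above: slot 0 ended up with the
     * *higher* range (0x80000-0xf57f5) and slot 1 with the lower one.
     */
    static struct fake_mnode_range mnodes[MAX_MNODES] = {
            { 1, 0x80000, 0xf57f5 },        /* mnode 0: high range */
            { 1, 0x00000, 0x7ffff },        /* mnode 1: low range  */
    };

    /*
     * Broken lookup: assumes the ranges ascend with the mnode id, so it
     * returns the first mnode whose physmax covers the pfn.  With the
     * table above, a low pfn such as 0x1000 is wrongly attributed to
     * mnode 0.
     */
    static int
    pfn_to_mnode_assume_sorted(pfn_t pfn)
    {
            for (int i = 0; i < MAX_MNODES; i++) {
                    if (mnodes[i].exists && pfn <= mnodes[i].physmax)
                            return (i);
            }
            return (-1);
    }

    /*
     * Correct lookup: checks both ends of each range, so it works whether
     * or not the table happens to be in ascending physical order.
     */
    static int
    pfn_to_mnode_check_range(pfn_t pfn)
    {
            for (int i = 0; i < MAX_MNODES; i++) {
                    if (mnodes[i].exists &&
                        pfn >= mnodes[i].physbase && pfn <= mnodes[i].physmax)
                            return (i);
            }
            return (-1);
    }

    int
    main(void)
    {
            pfn_t pfn = 0x1000;     /* belongs to the low range, i.e. mnode 1 */

            printf("assume-sorted lookup : pfn 0x%llx -> mnode %d\n",
                (unsigned long long)pfn, pfn_to_mnode_assume_sorted(pfn));
            printf("range-checked lookup : pfn 0x%llx -> mnode %d\n",
                (unsigned long long)pfn, pfn_to_mnode_check_range(pfn));
            return (0);
    }

Compiled with any C compiler, the assume-sorted lookup attributes pfn 0x1000
to mnode 0 while the range-checked one attributes it to mnode 1, which is the
kind of pfn/mnode mismatch the assertion quoted above was catching.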