Hi Simon,

On Sat, Aug 8, 2015 at 3:09 AM, Simon Glass <[email protected]> wrote:
> Hi Bin,
>
> On 5 August 2015 at 02:43, Bin Meng <[email protected]> wrote:
>> Hi Simon, Tom,
>>
>> On Tue, Aug 4, 2015 at 3:27 AM, Simon Glass <[email protected]> wrote:
>>> Hi Tom,
>>>
>>> On 3 August 2015 at 13:06, Tom Rini <[email protected]> wrote:
>>>> On Mon, Aug 03, 2015 at 12:52:19PM -0600, Simon Glass wrote:
>>>>> Hi Tom,
>>>>>
>>>>> On 31 July 2015 at 08:31, Tom Rini <[email protected]> wrote:
>>>>> > On Thu, Jul 30, 2015 at 12:12:03PM +0800, Bin Meng wrote:
>>>>> >
>>>>> >> Hi Simon,
>>>>> >>
>>>>> >> When adding x86 multi-cpu initialization on a board with 4 cores, I
>>>>> >> found:
>>>>> >>
>>>>> >> => cpu list
>>>>> >> 0: cpu@0 Genuine Intel(R) CPU @ 1.58GHz
>>>>> >> 1: cpu@1 Genuine Intel(R) CPU @ 1.58GHz
>>>>> >> 2: cpu@2 Genuine Intel(R) CPU @ 1.58GHz
>>>>> >> 2: cpu@3 Genuine Intel(R) CPU @ 1.58GHz
>>>>> >>
>>>>> >> cpu@2 and cpu@3 have the same sequence number, which indicates they
>>>>> >> are running parallelly to get the same sequence number. The call chain
>>>>> >> on an ap is: mp_init_cpu() -> device_probe() -> uclass_resolve_seq().
>>>>> >> Apparently ap2 and ap3 are running at the same time to get the same
>>>>> >> number.
>>>>> >>
>>>>> >> Note so far all x86 boards that we have enabled x86 multi-cpu
>>>>> >> initialization on only have 2 cores, which will not expose such issue
>>>>> >> as there is no parallel execution among aps.
>>>>> >
>>>>> > So what exactly are we doing with these additional cores? My
>>>>> > recollection of what we do on other arches when we even deal with other
>>>>> > cores is that we bring them "up" and then usually put them in a holding
>>>>> > pattern for the real OS to deal with _or_ it's one of those cases where
>>>>> > we have multiple OSes running and we do what we need to load and release
>>>>> > those other OSes.
>>>>>
>>>>> In this case they end up at stop_this_cpu() which is just a hlt
>>>>> instruction in each case.
>>>>
>>>> So do we really have to be doing anything here? Or is this just
>>>> pre-emptive work for an async MP type setup down the road? We could
>>>> probably live with this with a big comment noting why we know it's
>>>> misbehaving.
>>>
>>> I think we should fix it - I suggested some options above and Bin may
>>> have ideas also. Bin may be able to send a patch since he can repeat
>>> the problem.
>>>
>>
>> Yes we should fix it. But IMHO, just fixing the seq number only
>> resolves the surface problem. What concerns me is that multiple cpu
>> running the same piece of codes (in this case, the DM core codes)
>> without any protection. I have no idea whether these core structures
>> (like the device list) still look good from the DM core perspective.
>> Although right now it seems that it only exposes the seq number issue,
>> we don't know if there are other potential DM issues. Thus I was
>> thinking fundamentally we are using DM CPU uclass in a wrong way.
>
> We don't add devices when running on the AP CPUs - we only scan lists.
> So long as the boot CPU creates all the devices and then waits for
> them to populate, we are OK. I don't see any fundamental problem.
>
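
For reference, the seq number race reported at the top of the thread can
be replayed in a small standalone sketch. This is not U-Boot code:
struct fake_dev, find_free_seq(), claim_seq_locked() and seq_lock are
hypothetical names made up for illustration, and the spinlock is only one
possible way to serialize the scan-and-claim step, not necessarily what
an eventual patch would do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CPUS 4

/* Hypothetical stand-in for a uclass device: seq is -1 until claimed. */
struct fake_dev {
	int seq;
};

/* Scan step: return the lowest sequence number no device is using. */
static int find_free_seq(const struct fake_dev *devs, int count)
{
	for (int seq = 0; seq < count; seq++) {
		bool used = false;

		for (int i = 0; i < count; i++) {
			if (devs[i].seq == seq)
				used = true;
		}
		if (!used)
			return seq;
	}
	return -1;
}

/* Hypothetical lock for this sketch; the DM core has no such lock. */
static atomic_flag seq_lock = ATOMIC_FLAG_INIT;

/* Scan and claim inside one critical section so callers cannot collide. */
static int claim_seq_locked(struct fake_dev *devs, int count, int idx)
{
	assert(idx >= 0 && idx < count);

	while (atomic_flag_test_and_set_explicit(&seq_lock,
						 memory_order_acquire))
		;	/* spin until the other CPU leaves the section */

	devs[idx].seq = find_free_seq(devs, count);

	atomic_flag_clear_explicit(&seq_lock, memory_order_release);
	return devs[idx].seq;
}

/* Replays the bad interleaving: both APs scan before either one claims. */
static bool racy_interleaving_collides(void)
{
	struct fake_dev devs[MAX_CPUS] = { {0}, {1}, {-1}, {-1} };
	int seq_a = find_free_seq(devs, MAX_CPUS);	/* AP 2 sees 2 free */
	int seq_b = find_free_seq(devs, MAX_CPUS);	/* AP 3 also sees 2 */

	devs[2].seq = seq_a;
	devs[3].seq = seq_b;
	return devs[2].seq == devs[3].seq;
}
```

With two devices already holding 0 and 1, the replayed interleaving hands
out 2 twice, which matches the "cpu list" output above. When the scan and
the claim happen under the lock, the two remaining devices get 2 and 3.
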
OK, that makes me feel better, if we only need to resolve the seq
number issue. I will submit a patch for that.

Regards,
Bin
_______________________________________________
U-Boot mailing list
[email protected]
http://lists.denx.de/mailman/listinfo/u-boot

