On 09/22/2017 12:08 PM, David Gibson wrote:
> On Thu, Sep 21, 2017 at 08:04:55AM +0200, Cédric Le Goater wrote:
>> On 09/21/2017 05:54 AM, Nikunj A Dadhania wrote:
>>> David Gibson <da...@gibson.dropbear.id.au> writes:
>>>
>>>> On Wed, Sep 20, 2017 at 12:48:55PM +0530, Nikunj A Dadhania wrote:
>>>>> David Gibson <da...@gibson.dropbear.id.au> writes:
>>>>>
>>>>>> On Wed, Sep 20, 2017 at 12:10:48PM +0530, Nikunj A Dadhania wrote:
>>>>>>> David Gibson <da...@gibson.dropbear.id.au> writes:
>>>>>>>
>>>>>>>> On Wed, Sep 20, 2017 at 10:43:19AM +0530, Nikunj A Dadhania wrote:
>>>>>>>>> David Gibson <da...@gibson.dropbear.id.au> writes:
>>>>>>>>>
>>>>>>>>>> On Wed, Sep 20, 2017 at 09:50:24AM +0530, Nikunj A Dadhania wrote:
>>>>>>>>>>> David Gibson <da...@gibson.dropbear.id.au> writes:
>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Sep 15, 2017 at 02:39:16PM +0530, Nikunj A Dadhania wrote:
>>>>>>>>>>>>> David Gibson <da...@gibson.dropbear.id.au> writes:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Sep 15, 2017 at 01:53:15PM +0530, Nikunj A Dadhania
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> David Gibson <da...@gibson.dropbear.id.au> writes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I thought, I am doing the same here for PowerNV, number of
>>>>>>>>>>>>>>>>> online cores
>>>>>>>>>>>>>>>>> is equal to initial online vcpus / threads per core
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     int boot_cores_nr = smp_cpus / smp_threads;
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Only difference that I see in PowerNV is that we have
>>>>>>>>>>>>>>>>> multiple chips
>>>>>>>>>>>>>>>>> (max 2, at the moment)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     cores_per_chip = smp_cpus / (smp_threads *
>>>>>>>>>>>>>>>>>                                  pnv->num_chips);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This doesn't make sense to me. Cores per chip should *always*
>>>>>>>>>>>>>>>> equal
>>>>>>>>>>>>>>>> smp_cores, you shouldn't need another calculation for it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> And in case user has provided sane smp_cores, we use it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If smp_cores isn't sane, you should simply reject it, not try
>>>>>>>>>>>>>>>> to fix
>>>>>>>>>>>>>>>> it. That's just asking for confusion.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This is the case where the user does not provide a
>>>>>>>>>>>>>>> topology(which is a
>>>>>>>>>>>>>>> valid scenario), not sure we should reject it. So qemu defaults
>>>>>>>>>>>>>>> smp_cores/smt_threads to 1. I think it makes sense to over-ride.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If you can find a way to override it by altering smp_cores when
>>>>>>>>>>>>>> it's
>>>>>>>>>>>>>> not explicitly specified, then ok.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Should I change the global smp_cores here as well ?
>>>>>>>>>>>>
>>>>>>>>>>>> I'm pretty uneasy with that option.
>>>>>>>>>>>
>>>>>>>>>>> Me too.
>>>>>>>>>>>
>>>>>>>>>>>> It would take a fair bit of checking to ensure that changing
>>>>>>>>>>>> smp_cores
>>>>>>>>>>>> is safe here. An easier to verify option would be to make the
>>>>>>>>>>>> generic
>>>>>>>>>>>> logic which splits up an unspecified -smp N into cores and sockets
>>>>>>>>>>>> more flexible, possibly based on machine options for max values.
>>>>>>>>>>>>
>>>>>>>>>>>> That might still be more trouble than its worth.
>>>>>>>>>>>
>>>>>>>>>>> I think the current approach is the simplest and less intrusive, as
>>>>>>>>>>> we
>>>>>>>>>>> are handling a case where user has not bothered to provide a
>>>>>>>>>>> detailed
>>>>>>>>>>> topology, the best we can do is create single threaded cores equal
>>>>>>>>>>> to
>>>>>>>>>>> number of cores.
>>>>>>>>>>
>>>>>>>>>> No, sorry. Having smp_cores not correspond to the number of cores
>>>>>>>>>> per
>>>>>>>>>> chip in all cases is just not ok. Add an error message if the
>>>>>>>>>> topology isn't workable for powernv by all means. But users having
>>>>>>>>>> to
>>>>>>>>>> use a longer command line is better than breaking basic assumptions
>>>>>>>>>> about what numbers reflect what topology.
>>>>>>>>>
>>>>>>>>> Sorry to ask again, as I am still not convinced, we do similar
>>>>>>>>> adjustment in spapr where the user did not provide the number of
>>>>>>>>> cores,
>>>>>>>>> but qemu assumes them as single threaded cores and created
>>>>>>>>> cores(boot_cores_nr) that were not same as smp_cores ?
>>>>>>>>
>>>>>>>> What? boot_cores_nr has absolutely nothing to do with adjusting the
>>>>>>>> topology, and it certainly doesn't assume they're single threaded.
>>>>>>>
>>>>>>> When we start a TCG guest and user provides following commandline, e.g.
>>>>>>> "-smp 4", smt_threads is set to 1 by default in vl.c. So the guest boots
>>>>>>> with 4 cores, each having 1 thread.
>>>>>>
>>>>>> Ok.. and what's the problem with that behaviour on powernv?
>>>>>
>>>>> As smp_thread defaults to 1 in vl.c, similarly smp_cores also has the
>>>>> default value of 1 in vl.c. In powernv, we were setting nr-cores like
>>>>> this:
>>>>>
>>>>>     object_property_set_int(chip, smp_cores, "nr-cores",
>>>>>                             &error_fatal);
>>>>>
>>>>> Even when there were multiple cpus (-smp 4), when the guest boots up, we
>>>>> just get one core (i.e. smp_cores was 1) with single thread(smp_threads
>>>>> was 1), which is wrong as per the command-line that was provided.
>>>>
>>>> Right, so, -smp 4 defaults to 4 sockets, each with 1 core of 1
>>>> thread. If you can't supply 4 sockets you should error, but you
>>>> shouldn't go and change the number of cores per socket.
>>>
>>> OK, that makes sense now. And I do see that smp_cpus is 4 in the above
>>> case. Now looking more into it, i see that powernv has something called
>>> "num_chips", isnt this same as sockets ? Do we need num_chips separately?
>>
>> yes that would do for cpus, but how do we retrieve the number of
>> sockets ? I don't see a smp_sockets.
>
> # sockets = smp_cpus / smp_threads / smp_cores
>
> Or, if you want the maximum possible number of sockets (for a fully
> populated system)
> # sockets = max_cpus / smp_threads / smp_cores

ok.
that would do for a default setting. Sorry for the noise.
>> If we start looking at such issues, we should also take into account
>> memory distribution :
>>
>>   -numa node[,mem=size][,cpus=firstcpu[-lastcpu]][,nodeid=node]
>>
>> would allow us to define a set of cpus per node, cpus should be evenly
>> distributed on the nodes though, and also define memory per node, but
>> some nodes could be without memory.
>
> I don't really see what that has to do with anything. We already have
> ways to assign memory or cpus to specific nodes if we want.

We will see when the needs come if numa options fit the requirements.

C.
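For reference, a hypothetical invocation using the -numa syntax quoted above, spreading 8 vcpus and 4G evenly over two nodes. The machine type, sizes, and cpu ranges are illustrative only, and mem= on -numa node is the form shown in the thread (later QEMU versions prefer memdev= backends).

```shell
qemu-system-ppc64 -machine powernv \
    -smp 8,cores=4,threads=2 -m 4G \
    -numa node,nodeid=0,cpus=0-3,mem=2G \
    -numa node,nodeid=1,cpus=4-7,mem=2G
```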