On Fri, Apr 30, 2021 at 04:59:36PM +0800, wangyanan (Y) wrote: > > On 2021/4/30 14:41, Andrew Jones wrote: > > On Fri, Apr 30, 2021 at 01:09:00PM +0800, wangyanan (Y) wrote: > > > Hi Drew, > > > > > > On 2021/4/29 19:02, Andrew Jones wrote: > > > > On Thu, Apr 29, 2021 at 04:56:06PM +0800, wangyanan (Y) wrote: > > > > > On 2021/4/29 15:16, Andrew Jones wrote: > > > > > > On Thu, Apr 29, 2021 at 10:14:37AM +0800, wangyanan (Y) wrote: > > > > > > > On 2021/4/28 18:31, Andrew Jones wrote: > > > > > > > > On Tue, Apr 13, 2021 at 04:31:45PM +0800, Yanan Wang wrote: > > > > > > > > > } else if (sockets == 0) { > > > > > > > > > threads = threads > 0 ? threads : 1; > > > > > > > > > - sockets = cpus / (cores * threads); > > > > > > > > > + sockets = cpus / (clusters * cores * threads); > > > > > > > > > sockets = sockets > 0 ? sockets : 1; > > > > > > > > If we initialize clusters to zero instead of one and add lines > > > > > > > > in > > > > > > > > 'cpus == 0 || cores == 0' and 'sockets == 0' like > > > > > > > > 'clusters = clusters > 0 ? clusters : 1' as needed, then I > > > > > > > > think we can > > > > > > > > add > > > > > > > > > > > > > > > > } else if (clusters == 0) { > > > > > > > > threads = threads > 0 ? threads : 1; > > > > > > > > clusters = cpus / (sockets * cores * thread); > > > > > > > > clusters = clusters > 0 ? clusters : 1; > > > > > > > > } > > > > > > > > > > > > > > > > here. > > > > > > > I have thought about this kind of format before, but there is a > > > > > > > little bit > > > > > > > difference between these two ways. Let's chose the better and more > > > > > > > reasonable one of the two. > > > > > > > > > > > > > > Way A currently in this patch: > > > > > > > If value of clusters is not explicitly specified in -smp command > > > > > > > line, we > > > > > > > assume > > > > > > > that users don't want to support clusters, for compatibility we > > > > > > > initialized > > > > > > > the > > > > > > > value to 1. So that with cmdline "-smp cpus=24, sockets=2, > > > > > > > cores=6", we will > > > > > > > parse out the topology description like below: > > > > > > > cpus=24, sockets=2, clusters=1, cores=6, threads=2 > > > > > > > > > > > > > > Way B that you suggested for this patch: > > > > > > > Whether value of clusters is explicitly specified in -smp command > > > > > > > line or > > > > > > > not, > > > > > > > we assume that clusters are supported and calculate the value. So > > > > > > > that with > > > > > > > cmdline "-smp cpus=24, sockets=2, cores=6", we will parse out the > > > > > > > topology > > > > > > > description like below: > > > > > > > cpus =24, sockets=2, clusters=2, cores=6, threads=1 > > > > > > > > > > > > > > But I think maybe we should not assume too much about what users > > > > > > > think > > > > > > > through the -smp command line. We should just assume that all > > > > > > > levels of > > > > > > > cpu topology are supported and calculate them, and users should > > > > > > > be more > > > > > > > careful if they want to get the expected results with not so > > > > > > > complete > > > > > > > cmdline. > > > > > > > If I'm right, then Way B should be better. :) > > > > > > > > > > > > > Hi Yanan, > > > > > > > > > > > > We're already assuming the user wants to describe clusters to the > > > > > > guest > > > > > > because we require at least one per socket. If we want the user to > > > > > > have a > > > > > > choice between using clusters or not, then I guess we need to > > > > > > change the > > > > > > logic in the PPTT and the cpu-map to only generate the cluster > > > > > > level when > > > > > > the number of clusters is not zero. And then change this parser to > > > > > > not > > > > > > require clusters at all. > > > > > Hi Drew, > > > > > > > > > > I think this kind of change will introduce more complexity and > > > > > actually is > > > > > not necessary. > > > > > Not generating cluster level at all and generating cluster level (one > > > > > per > > > > > socket) are same > > > > > to kernel. Without cluster description provided, kernel will > > > > > initialize all > > > > > cores in the same > > > > > cluster which also means one cluster per socket. > > > > Which kernel? All kernels? One without the cluster support patches [1]? > > > > > > > > [1] > > > > https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao....@hisilicon.com/#t > > > I'm sorry, I didn't make it clear. :) > > > I actually mean the ARM64 kernel, with or without [1]. > > > > > > Without [1]: Kernel doesn't care about cluster. When populating cpu > > > topology, it directly > > > finds the hierarchy node with "physical package flag" as package when > > > parsing ACPI, and > > > finds the core node's parent as package when parsing DT. So even we > > > provide > > > cluster level > > > description (one per socket), the parsing results will be the same as not > > > providing at all. > > > > > > With [1]: Kernel finds the core hierarchy node's parent as cluster when > > > parsing ACPI. So if > > > we don't provide cluster level description, package will be taken as > > > cluster. And if we provide > > > the description (one per socket), the parsing result will also be the > > > same. > > > > > > That's why I said that we just need to provide description of cluster (one > > > per socket) if we > > > don't want to make use of it in VMs. > > OK, that sounds good. > > > > > [1] > > > https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao....@hisilicon.com/#t > > > > > So we should only ensure value of clusters per socket is one if we > > > > > don't > > > > > want to use clusters, > > > > > and don't need to care about whether or not to generate description > > > > > in PPTT > > > > > and cpu-map. > > > > > Is this right? > > > > Depends on your answer to my 'which kernel' questions. > > > > > > > > > > I'm not a big fan of these auto-calculated values either, but the > > > > > > documentation says that it'll do that, and it's been done that way > > > > > > forever, so I think we're stuck with it for the -smp option. Hmm, I > > > > > > was > > > > > > just about to say that x86 computes all its values, but I see the > > > > > > most > > > > > > recently added one, 'dies', is implemented the way you're proposing > > > > > > we > > > > > > implement 'clusters', i.e. default to one and don't calculate it > > > > > > when it's > > > > > > missing. I actually consider that either a documentation bug or an > > > > > > smp > > > > > > parsing bug, though. > > > > > My propose originally came from implementation of x86. > > > > > > Another possible option, for Arm, because only the cpus and maxcpus > > > > > > parameters of -smp have ever worked, is to document, for Arm, that > > > > > > if even > > > > > > one parameter other than cpus or maxcpus is provided, then all > > > > > > parameters > > > > > > must be provided. We can still decide if clusters=0 is valid, but > > > > > > we'll > > > > > > enforce that everything is explicit and that the product (with or > > > > > > without > > > > > > clusters) matches maxcpus. > > > > > Requiring every parameter explicitly will be most stable but indeed > > > > > strict. > > > > > > > > > > Currently all the parsers use way B to calculate value of thread if > > > > > it is > > > > > not provided explicitly. > > > > > So users should ensure the -smp cmdline they provided can result in > > > > > that > > > > > parsed threads will > > > > > be 1 if they don't want to support multiple threads in one core. > > > > > > > > > > Very similar to thread, users should also ensure the provided cmdline > > > > > can > > > > > result in that parsed > > > > > clusters will be 1 if they don't want to support multiple clusters in > > > > > one > > > > > socket. > > > > > > > > > > So I'm wondering if we can just add some commit in the documentation > > > > > to tell > > > > > users that they > > > > > should ensure this if they don't want support it. And as for > > > > > calculation of > > > > > clusters, we follow > > > > > the logic of other parameters as you suggested in way B. > > > > > > > > > > Thanks, > > > > > Yanan > > > > > > Requiring every parameter might be stricter than necessary, though, > > > > > > I > > > > > > think we're mostly concerned with cpus/maxcpus, sockets, and cores. > > > > > > clusters can default to one or zero (whatever we choose and > > > > > > document), > > > > > > threads can default to one, and cpus can default to maxcpus or > > > > > > maxcpus can > > > > > > default to cpus, but at least one of those must be provided. And, if > > > > > > sockets are provided, then cores must be provided and vice versa. If > > > > > > neither sockets nor cores are provided, then nothing else besides > > > > > > cpus and > > > > > > maxcpus may be provided, and that would mean to not generate any > > > > > > topology > > > > > > descriptions for the guest. > > > > I still don't know. I think I prefer making -smp more strict (even for > > > > other architectures, but that's more difficult to do than for Arm.) > > > > What I > > > > wrote above isn't that bad. We only require one of cpus or maxcpus > > > > (pretty > > > > much everybody already does that anyway), and then, if given sockets > > > > or cores, the other will also be required. I assume anybody who bothers > > > > to > > > > specify one or the other would already specify both anyway. > > > I agree to make -smp more strict. We want to expose the cpu topology > > > information > > > to guest kernel, so that it can take advantage of the information for > > > better > > > scheduling. > > > From this point of view, we hope the topology descriptions are accurately > > > specified > > > but not automatically populated. > > > > > > But I think the requirement for ARM "if even one parameter other than cpus > > > or maxcpus > > > is provided then all parameters must be provided" will be better. This can > > > ensure the > > > whole accurate users-specified topology. As you mentioned, if anybody who > > > bothers > > > to specify one, why not also specify the others. > > > > > > I can add the requirement for ARM in the documentation, and also check the > > > parameters > > > in virt_smp_parse. Will this be fine? > > We sort of have to support command lines that are missing 'maxcpus' and > > 'clusters', unless we work together with libvirt to make the change. > > Currently libvirt will generate '-smp 16,sockets=16,cores=1,threads=1' > > from '<vcpu placement='static'>16</vcpu>'. > I see. And libvirt currently doesn't support cluster in xml, which means > we can not generate complete cmdlines with cluster in it through > <topology ...> specification in xml. > > That's sufficient for our > > stricter, but not completely strict requirements. And, I still think > > 'threads' could be optional, because there's a good chance the user > > doesn't want to describe them, so a default of 1 is good enough. > So the parsing logic can be repeated like this: > We require at least one of cpus and maxcpus specified explicitly, and > default > cluster/thread to 1 if not explicitly specified. And require both of sockets > and > cores if one of them is specified. > > This is consistent with what you mentioned yesterday. >
Yup, I think this should work for both compatibility concerns, concerns with bad assumptions, and with supporting users which would like more terse command lines. Thanks, drew