On Wed, 15 Sep 2021 22:32:13 -0300 Daniel Henrique Barboza <danielhb...@gmail.com> wrote:
> Greg,
> 
> 
> On 9/14/21 16:58, Daniel Henrique Barboza wrote:
> > 
> > 
> > On 9/14/21 08:55, Greg Kurz wrote:
> >> On Fri, 10 Sep 2021 16:55:36 -0300
> >> Daniel Henrique Barboza <danielhb...@gmail.com> wrote:
> >> 
> > 
> [...]
> > 
> >>>      }
> >>> @@ -167,6 +167,11 @@ static void spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
> >>>      int nb_numa_nodes = machine->numa_state->num_nodes;
> >>>      int i, j, max_nodes_with_gpus;
> >>> 
> >>> +    /* init FORM1_assoc_array */
> >>> +    for (i = 0; i < MAX_NODES + NVGPU_MAX_NUM; i++) {
> >>> +        spapr->FORM1_assoc_array[i] = g_new0(uint32_t, NUMA_ASSOC_SIZE);
> >> 
> >> Why dynamic allocation since you have fixed size ?
> > 
> > If I use static allocation the C compiler complains that I can't assign a
> > uint32_t** pointer to a uint32_t[MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE]
> > array type.
> > 
> > And given that the FORM2 array is a [MAX_NODES + NVGPU_MAX_NUM][2] array, the
> > way I worked around that here is to use dynamic allocation. Then C considers
> > it valid to use numa_assoc_array as an uint32_t** pointer for both FORM1 and
> > FORM2 2D arrays. I'm fairly certain that there might be a way of doing static
> > allocation and keeping the uint32_t** pointer as is, but I didn't find any.
> > Tips welcome :D
> > 
> > An alternative that I considered, without the need for this dynamic
> > allocation hack, is to make both FORM1 and FORM2 data structures the same
> > size (i.e. a [MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE] uint32_t array)
> > and then numa_assoc_array can be a pointer of the same array type for both.
> > Since we're controlling FORM1 and FORM2 sizes separately inside the
> > functions, this would work. The downside is that the FORM2 2D array would
> > be bigger than necessary.
> 
> I took a look and didn't find any neat way of doing a pointer switch
> between 2 static arrays without either allocating dynamic memory on the
> pointer and then g_memdup'ing it, or making all arrays the same size
> (i.e. numa_assoc_array would also be a statically allocated array
> with FORM1 size) and then memcpy()'ing the chosen mode.
> 
> I just posted a new version in which I'm not relying on 'numa_assoc_array'.
> Instead, the DT functions will use a get_associativity() to retrieve
> the current associativity array based on the CAS choice, in a similar
> manner to what is already done with the array sizes. This also allowed
> us to get rid of associativity_reset().
> 

Hi Daniel,

I've vaguely started looking at the new version and it seems to look
better indeed. Now that KVM Forum 2021 and its too many captivating
talks are over :-) , I'll try to find some time to review.

Cheers,

--
Greg

> Thanks,
> 
> 
> Daniel
> 
> > 
> > I don't have strong opinions about which way to do it, so I'm all ears.
> > 
> > Thanks,
> > 
> > Daniel
> > 
> >> 
> >>> +    }
> >>> +
> >>>      /*
> >>>       * For all associativity arrays: first position is the size,
> >>>       * position MAX_DISTANCE_REF_POINTS is always the numa_id,
> >>> @@ -177,8 +182,8 @@ static void spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
> >>>       * 'i' will be a valid node_id set by the user.
> >>>       */
> >>>      for (i = 0; i < nb_numa_nodes; i++) {
> >>> -        spapr->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
> >>> -        spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
> >>> +        spapr->FORM1_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
> >>> +        spapr->FORM1_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
> >>>      }
> >>> 
> >>>      /*
> >>> @@ -192,15 +197,15 @@ static void spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
> >>>      max_nodes_with_gpus = nb_numa_nodes + NVGPU_MAX_NUM;
> >>> 
> >>>      for (i = nb_numa_nodes; i < max_nodes_with_gpus; i++) {
> >>> -        spapr->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
> >>> +        spapr->FORM1_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
> >>> 
> >>>          for (j = 1; j < MAX_DISTANCE_REF_POINTS; j++) {
> >>>              uint32_t gpu_assoc = smc->pre_5_1_assoc_refpoints ?
> >>>                                   SPAPR_GPU_NUMA_ID : cpu_to_be32(i);
> >>> -            spapr->numa_assoc_array[i][j] = gpu_assoc;
> >>> +            spapr->FORM1_assoc_array[i][j] = gpu_assoc;
> >>>          }
> >>> 
> >>> -        spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
> >>> +        spapr->FORM1_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
> >>>      }
> >>> 
> >>>      /*
> >>> @@ -227,14 +232,33 @@ void spapr_numa_associativity_init(SpaprMachineState *spapr,
> >>>                                     MachineState *machine)
> >>>  {
> >>>      spapr_numa_FORM1_affinity_init(spapr, machine);
> >>> +
> >>> +    /*
> >>> +     * Default to FORM1 affinity until CAS. We'll call affinity_reset()
> >>> +     * during CAS when we're sure about which NUMA affinity the guest
> >>> +     * is going to use.
> >>> +     *
> >>> +     * This step is a failsafe - guests in the wild were able to read
> >>> +     * FORM1 affinity info before CAS for a long time. Since affinity_reset()
> >>> +     * is just a pointer switch between data that was already populated
> >>> +     * here, this is an acceptable overhead to be on the safer side.
> >>> +     */
> >>> +    spapr->numa_assoc_array = spapr->FORM1_assoc_array;
> >> 
> >> The right way to do that is to call spapr_numa_associativity_reset() from
> >> spapr_machine_reset() because we want to revert to FORM1 each time the
> >> guest is rebooted.
> >> 
> >>> +}
> >>> +
> >>> +void spapr_numa_associativity_reset(SpaprMachineState *spapr)
> >>> +{
> >>> +    /* No FORM2 affinity implemented yet */
> >> 
> >> This seems quite obvious at this point, not sure the comment helps.
> >> 
> >>> +    spapr->numa_assoc_array = spapr->FORM1_assoc_array;
> >>>  }
> >>> 
> >>>  void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
> >>>                                         int offset, int nodeid)
> >>>  {
> >>> +    /* Hardcode the size of FORM1 associativity array for now */
> >>>      _FDT((fdt_setprop(fdt, offset, "ibm,associativity",
> >>>                        spapr->numa_assoc_array[nodeid],
> >>> -                      sizeof(spapr->numa_assoc_array[nodeid]))));
> >>> +                      NUMA_ASSOC_SIZE * sizeof(uint32_t))));
> >> 
> >> Rather than doing this temporary change that gets undone in
> >> a later patch, I suggest you introduce get_numa_assoc_size()
> >> in a preliminary patch and use it here already :
> >> 
> >> -                      NUMA_ASSOC_SIZE * sizeof(uint32_t))));
> >> +                      get_numa_assoc_size(spapr) * sizeof(uint32_t))));
> >> 
> >> This will simplify the reviewing.
> >> 
> >>>  }
> >>> 
> >>>  static uint32_t *spapr_numa_get_vcpu_assoc(SpaprMachineState *spapr,
> >>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> >>> index 637652ad16..8a9490f0bf 100644
> >>> --- a/include/hw/ppc/spapr.h
> >>> +++ b/include/hw/ppc/spapr.h
> >>> @@ -249,7 +249,8 @@ struct SpaprMachineState {
> >>>      unsigned gpu_numa_id;
> >>>      SpaprTpmProxy *tpm_proxy;
> >>> 
> >>> -    uint32_t numa_assoc_array[MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE];
> >>> +    uint32_t *FORM1_assoc_array[MAX_NODES + NVGPU_MAX_NUM];
> >> 
> >> As said above, I really don't see the point in not having :
> >> 
> >>      uint32_t FORM1_assoc_array[MAX_NODES + NVGPU_MAX_NUM][NUMA_ASSOC_SIZE];
> >> 
> >>> +    uint32_t **numa_assoc_array;
> >>> 
> >>>      Error *fwnmi_migration_blocker;
> >>>  };
> >>> diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
> >>> index 6f9f02d3de..ccf3e4eae8 100644
> >>> --- a/include/hw/ppc/spapr_numa.h
> >>> +++ b/include/hw/ppc/spapr_numa.h
> >>> @@ -24,6 +24,7 @@
> >>>   */
> >>>  void spapr_numa_associativity_init(SpaprMachineState *spapr,
> >>>                                     MachineState *machine);
> >>> +void spapr_numa_associativity_reset(SpaprMachineState *spapr);
> >>>  void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas);
> >>>  void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
> >>>                                         int offset, int nodeid);