Re: [PATCH v03] powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update
On 2/7/19 11:44 PM, Srikar Dronamraju wrote:
>>  int arch_update_cpu_topology(void)
>>  {
>> -	return numa_update_cpu_topology(true);
>> +	int changed = topology_changed;
>> +
>> +	topology_changed = 0;
>> +	return changed;
>>  }
>>
>
> Do we need the powerpc override for arch_update_cpu_topology() now? That
> topology_changed some time back doesn't seem to have helped. The scheduler
> at least now neglects whether the topology changed or not.

I was dealing with a concurrency problem.  Revisiting again.

>
> Also we can do away with the new topology_changed.
>
>>  static void topology_work_fn(struct work_struct *work)
>>  {
>> -	rebuild_sched_domains();
>> +	lock_device_hotplug();
>> +	if (numa_update_cpu_topology(true))
>> +		rebuild_sched_domains();
>> +	unlock_device_hotplug();
>>  }
>
> Should this hunk be a separate patch by itself, with a changelog that
> explains why rebuild_sched_domains should be under lock_device_hotplug?
> rebuild_sched_domains already takes cpuset_mutex, so I am not sure if we
> need to take device_hotplug_lock.

topology_work_fn runs in its own thread, like the DLPAR operations.  This
patch adds calls to Nathan's 'dlpar_cpu_readd' from the topology_work_fn
thread.  The lock/unlock_device_hotplug calls guard against concurrency
issues with the DLPAR operations; grabbing that lock here avoids
overlapping with those other operations.  This change depends upon using
dlpar_cpu_readd.

>
>>  static DECLARE_WORK(topology_work, topology_work_fn);
>>
>> -static void topology_schedule_update(void)
>> +void topology_schedule_update(void)
>>  {
>> -	schedule_work(&topology_work);
>> +	if (!topology_update_in_progress)
>> +		schedule_work(&topology_work);
>>  }
>>
>>  static void topology_timer_fn(struct timer_list *unused)
>>  {
>> +	bool sdo = false;
>
> Is sdo an abbreviation?

It was short for 'do the schedule update'.  Will remove per below.

>
>> +
>> +	if (topology_scans < 1)
>> +		bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
>> +			    nr_cpumask_bits);
>
> Why do we need topology_scans?  Just to make sure
> cpu_associativity_changes_mask is populated only once?
> Can't we use a static bool inside the function for the same?

I was running into a race condition.  On one of my test systems,
start_topology_update was invoked via shared_proc_topology_init, and the
PHYP did not provide any change info about the CPUs that early in the
boot.  The first run erased the cpu bits in cpu_associativity_changes_mask,
and subsequent runs did not pay attention to the reported updates.  Taking
another look.

>
>
>> +
>>  	if (prrn_enabled && cpumask_weight(&cpu_associativity_changes_mask))
>> -		topology_schedule_update();
>> -	else if (vphn_enabled) {
>> +		sdo = true;
>> +	if (vphn_enabled) {
>
> Any reason to remove the else above?

When both vphn_enabled and prrn_enabled were set, the else branch meant
'update_cpu_associativity_changes_mask()' was not called, so we were not
getting the necessary change info.

>>  		if (update_cpu_associativity_changes_mask() > 0)
>> -			topology_schedule_update();
>> +			sdo = true;
>>  		reset_topology_timer();
>>  	}
>> +	if (sdo)
>> +		topology_schedule_update();
>> +	topology_scans++;
>>  }
>
> Are the above two hunks necessary? Not getting how the current changes are
> different from the previous.

Not important.  Will undo.

--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:     (512) 466-0650
m...@linux.vnet.ibm.com
Re: [PATCH v03] powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update
>  int arch_update_cpu_topology(void)
>  {
> -	return numa_update_cpu_topology(true);
> +	int changed = topology_changed;
> +
> +	topology_changed = 0;
> +	return changed;
>  }

Do we need the powerpc override for arch_update_cpu_topology() now? That
topology_changed some time back doesn't seem to have helped. The scheduler
at least now neglects whether the topology changed or not.

Also we can do away with the new topology_changed.

>  static void topology_work_fn(struct work_struct *work)
>  {
> -	rebuild_sched_domains();
> +	lock_device_hotplug();
> +	if (numa_update_cpu_topology(true))
> +		rebuild_sched_domains();
> +	unlock_device_hotplug();
>  }

Should this hunk be a separate patch by itself, with a changelog that
explains why rebuild_sched_domains should be under lock_device_hotplug?
rebuild_sched_domains already takes cpuset_mutex, so I am not sure if we
need to take device_hotplug_lock.

>  static DECLARE_WORK(topology_work, topology_work_fn);
>
> -static void topology_schedule_update(void)
> +void topology_schedule_update(void)
>  {
> -	schedule_work(&topology_work);
> +	if (!topology_update_in_progress)
> +		schedule_work(&topology_work);
>  }
>
>  static void topology_timer_fn(struct timer_list *unused)
>  {
> +	bool sdo = false;

Is sdo an abbreviation?

> +
> +	if (topology_scans < 1)
> +		bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
> +			    nr_cpumask_bits);

Why do we need topology_scans?  Just to make sure
cpu_associativity_changes_mask is populated only once?
Can't we use a static bool inside the function for the same?

> +
>  	if (prrn_enabled && cpumask_weight(&cpu_associativity_changes_mask))
> -		topology_schedule_update();
> -	else if (vphn_enabled) {
> +		sdo = true;
> +	if (vphn_enabled) {

Any reason to remove the else above?

>  		if (update_cpu_associativity_changes_mask() > 0)
> -			topology_schedule_update();
> +			sdo = true;
>  		reset_topology_timer();
>  	}
> +	if (sdo)
> +		topology_schedule_update();
> +	topology_scans++;
>  }

Are the above two hunks necessary? Not getting how the current changes are
different from the previous.

--
Thanks and Regards
Srikar Dronamraju
Re: [PATCH v03] powerpc/numa: Perform full re-add of CPU for PRRN/VPHN topology update
Hi Michael,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.0-rc4 next-20190206]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Michael-Bringmann/powerpc-numa-Perform-full-re-add-of-CPU-for-PRRN-VPHN-topology-update/20190207-101545
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allyesconfig (attached as .config)
compiler: powerpc64-linux-gnu-gcc (Debian 8.2.0-11) 8.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=8.2.0 make.cross ARCH=powerpc

All errors (new ones prefixed by >>):

   arch/powerpc/mm/numa.c: In function 'numa_update_cpu_topology':
>> arch/powerpc/mm/numa.c:1361:4: error: implicit declaration of function 'dlpar_cpu_readd'; did you mean 'raw_cpu_read'? [-Werror=implicit-function-declaration]
       dlpar_cpu_readd(cpu);
       ^~~
       raw_cpu_read
   cc1: some warnings being treated as errors

vim +1361 arch/powerpc/mm/numa.c

  1298
  1299	/*
  1300	 * Update the node maps and sysfs entries for each cpu whose home node
  1301	 * has changed. Returns 1 when the topology has changed, and 0 otherwise.
  1302	 *
  1303	 * readd_cpus: Also readd any CPUs that have changed affinity
  1304	 */
  1305	static int numa_update_cpu_topology(bool readd_cpus)
  1306	{
  1307		unsigned int cpu, sibling, changed = 0;
  1308		struct topology_update_data *updates, *ud;
  1309		cpumask_t updated_cpus;
  1310		struct device *dev;
  1311		int weight, new_nid, i = 0;
  1312
  1313		if ((!prrn_enabled && !vphn_enabled && topology_inited) ||
  1314		    topology_update_in_progress)
  1315			return 0;
  1316
  1317		weight = cpumask_weight(&cpu_associativity_changes_mask);
  1318		if (!weight)
  1319			return 0;
  1320
  1321		updates = kcalloc(weight, sizeof(*updates), GFP_KERNEL);
  1322		if (!updates)
  1323			return 0;
  1324
  1325		topology_update_in_progress = 1;
  1326
  1327		cpumask_clear(&updated_cpus);
  1328
  1329		for_each_cpu(cpu, &cpu_associativity_changes_mask) {
  1330			/*
  1331			 * If siblings aren't flagged for changes, updates list
  1332			 * will be too short. Skip on this update and set for next
  1333			 * update.
  1334			 */
  1335			if (!cpumask_subset(cpu_sibling_mask(cpu),
  1336					&cpu_associativity_changes_mask)) {
  1337				pr_info("Sibling bits not set for associativity "
  1338					"change, cpu%d\n", cpu);
  1339				cpumask_or(&cpu_associativity_changes_mask,
  1340					&cpu_associativity_changes_mask,
  1341					cpu_sibling_mask(cpu));
  1342				cpu = cpu_last_thread_sibling(cpu);
  1343				continue;
  1344			}
  1345
  1346			new_nid = find_and_online_cpu_nid(cpu);
  1347
  1348			if ((new_nid == numa_cpu_lookup_table[cpu]) ||
  1349			    !cpu_present(cpu)) {
  1350				cpumask_andnot(&cpu_associativity_changes_mask,
  1351					&cpu_associativity_changes_mask,
  1352					cpu_sibling_mask(cpu));
  1353				if (cpu_present(cpu))
  1354					dbg("Assoc chg gives same node %d for cpu%d\n",
  1355						new_nid, cpu);
  1356				cpu = cpu_last_thread_sibling(cpu);
  1357				continue;
  1358			}
  1359
  1360			if (readd_cpus)
> 1361			dlpar_cpu_readd(cpu);
  1362
  1363			for_each_cpu(sibling, cpu_sibling_mask(cpu)) {
  1364				ud = &updates[i++];
  1365				ud->next = &updates[i];
  1366				ud->cpu = sibling;
  1367				ud->new_nid = new_nid;
  1368				ud->old_nid = numa_cpu_lookup_table[sibling];
  1369				cpumask_set_cpu(sibling, &updated_cpus);
  1370			}
  1371			cpu = cpu_last_thread_sibling(cpu);
  1372		}
  1373
  1374		/*
  1375		 * Prevent processing of 'updates' from overflowing array
  1376		 * where last entry filled in a 'next' pointer.
  1377