From: Biaoxiang Ye <[email protected]> Hi: We found on NUMA machines the kworker of iscsi created always jump around across node boundaries. If it work on the different node even different cpu package with the softirq of network interface, memcpy within iscsi_tcp_segment_recv will be slow down, and iscsi got an terrible performance. Iscsi use create_singlethread_dynamic_workqueue to create an order workqueue, unfortunately the order workqueue only have single pwq, all of works are queued the same workerpool. This is not optimal on NUMA machines, will cause workers jump around across node.
The first patch add a new wq flags __WQ_DYNAMIC, and a new macros create_singlethread_dynamic_workqueue, this new kind of single thread workqueue creates a separate pwq covering the intersecting CPUS for each NUMA node which has online CPUS in @attrs->cpumask instead of mapping all entries of numa_pwq_tbl[] to the same pwq. After this, we can specify the @cpu of queue_work_on, so the work can be executed on the same NUMA node of the specified @cpu. The second patch, we trace the cpu of softirq, and tell queue_work_on to execute iscsi_xmitworker on the same NUMA node. Any advice is welcome. Thanks in advance! ------------------------------------------------------------------ The performance data as below: [cpu info]: Architecture: aarch64 Byte Order: Little Endian CPU(s): 128 On-line CPU(s) list: 0-127 Thread(s) per core: 1 Core(s) per socket: 64 Socket(s): 2 NUMA node(s): 4 Model: 0 CPU max MHz: 2600.0000 CPU min MHz: 200.0000 BogoMIPS: 200.00 L1d cache: 64K L1i cache: 64K L2 cache: 512K L3 cache: 32768K NUMA node0 CPU(s): 0-31 NUMA node1 CPU(s): 32-63 NUMA node2 CPU(s): 64-95 NUMA node3 CPU(s): 96-127 [test cmd]: fio -filename=/dev/disk/by-id/wwn-0x6883fd3100a2ad260036281700000000 -direct=1 -iodepth=32 -rw=read -bs=64k -size=30G -ioengine=libaio -numjobs=1 -group_reporting -name=mytest -time_based -ramp_time=60 -runtime=60 1.bad (the kworker and the irqsoft are work on different cpu package) Jobs: 1 (f=1): [R] [52.5% done] [852.3MB/0KB/0KB /s] [13.7K/0/0 iops] [eta 00m:57s] Jobs: 1 (f=1): [R] [53.3% done] [861.4MB/0KB/0KB /s] [13.8K/0/0 iops] [eta 00m:56s] Jobs: 1 (f=1): [R] [54.2% done] [868.2MB/0KB/0KB /s] [13.9K/0/0 iops] [eta 00m:55s] 2.good (after pactched, they are work on the same NUMA node) Jobs: 1 (f=1): [R] [53.3% done] [1070MB/0KB/0KB /s] [17.2K/0/0 iops] [eta 00m:56s] Jobs: 1 (f=1): [R] [55.0% done] [1064MB/0KB/0KB /s] [17.3K/0/0 iops] [eta 00m:54s] Jobs: 1 (f=1): [R] [56.7% done] [1069MB/0KB/0KB /s] [17.1K/0/0 iops] [eta 00m:52s] Biaoxiang Ye (2): workqueue: implement NUMA affinity for single thread workqueue iscsi: use dynamic single thread workqueue to improve performance drivers/scsi/iscsi_tcp.c | 8 ++++++++ drivers/scsi/libiscsi.c | 12 +++++++++--- include/linux/workqueue.h | 7 +++++++ kernel/workqueue.c | 40 ++++++++++++++++++++++++++++++++++------ 4 files changed, 58 insertions(+), 9 deletions(-) -- 1.8.3.1 -- You received this message because you are subscribed to the Google Groups "open-iscsi" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/open-iscsi/1563991180-11532-1-git-send-email-yebiaoxiang%40huawei.com.
