From: Biaoxiang Ye <[email protected]>

Hi,

On NUMA machines we found that the iSCSI kworker keeps jumping across
node boundaries. When it runs on a different node, or even a different
CPU package, from the network interface's softirq, the memcpy in
iscsi_tcp_segment_recv slows down and iSCSI performance suffers badly.
iSCSI uses create_singlethread_workqueue to create an ordered workqueue.
Unfortunately, an ordered workqueue has only a single pwq, so all work
items are queued to the same worker pool. This is not optimal on NUMA
machines, because the workers end up jumping across nodes.

The first patch adds a new wq flag, __WQ_DYNAMIC, and a new macro,
create_singlethread_dynamic_workqueue. This new kind of single-thread
workqueue creates a separate pwq covering the intersecting CPUs for
each NUMA node that has online CPUs in @attrs->cpumask, instead of
mapping all entries of numa_pwq_tbl[] to the same pwq. With this, we
can specify the @cpu argument of queue_work_on so that the work is
executed on the same NUMA node as the specified @cpu.
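
To illustrate the difference in pwq selection, here is a minimal userspace
sketch, not kernel code. It assumes the topology of the test machine
described further down in this mail (4 NUMA nodes with 32 contiguous CPUs
each). All model_* names are hypothetical; they only mirror the roles of
numa_pwq_tbl[] and cpu_to_node(), and the real logic lives in
kernel/workqueue.c.

```c
#include <assert.h>

/* Assumed topology: 128 CPUs, 4 nodes, 32 contiguous CPUs per node. */
#define MODEL_NR_CPUS       128
#define MODEL_NR_NODES      4
#define MODEL_CPUS_PER_NODE (MODEL_NR_CPUS / MODEL_NR_NODES)

/* Hypothetical stand-in for a pool_workqueue; id -1 marks the default pwq. */
struct model_pwq {
    int id;
};

/* Hypothetical stand-in for a workqueue with a per-node pwq table. */
struct model_wq {
    struct model_pwq dfl_pwq;
    struct model_pwq node_pwq[MODEL_NR_NODES];
    struct model_pwq *numa_pwq_tbl[MODEL_NR_NODES];
};

/* Plays the role of cpu_to_node() for the assumed topology. */
static int model_cpu_to_node(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    return cpu / MODEL_CPUS_PER_NODE;
}

/*
 * dynamic == 0: ordered workqueue, every table entry points at the
 *               single default pwq.
 * dynamic != 0: one pwq per node, as described for __WQ_DYNAMIC.
 */
static void model_wq_init(struct model_wq *wq, int dynamic)
{
    wq->dfl_pwq.id = -1;
    for (int node = 0; node < MODEL_NR_NODES; node++) {
        wq->node_pwq[node].id = node;
        wq->numa_pwq_tbl[node] =
            dynamic ? &wq->node_pwq[node] : &wq->dfl_pwq;
    }
}

/* Returns the id of the pwq that work queued for @cpu would land on. */
static int model_pick_pwq(const struct model_wq *wq, int cpu)
{
    return wq->numa_pwq_tbl[model_cpu_to_node(cpu)]->id;
}
```

In this model, an ordered workqueue returns the default pwq for every CPU,
while the dynamic variant returns the pwq of the node that owns the CPU
(for example node 3 for CPU 100).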

The second patch tracks the CPU on which the softirq runs and tells
queue_work_on to execute iscsi_xmitworker on the same NUMA node.
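
As a rough illustration of that idea, the following userspace sketch
records the CPU seen in the receive path and reuses it when picking a
target CPU for the transmit work. The struct, field and function names are
hypothetical and are not the ones used in the patch. In the kernel, the CPU
would presumably come from something like smp_processor_id() in the receive
path and be handed to queue_work_on().

```c
#include <assert.h>

/* Hypothetical marker meaning "no receive softirq has been seen yet". */
#define MODEL_CPU_UNSET (-1)

/* Hypothetical per-connection state; only the tracked CPU matters here. */
struct model_conn {
    int rx_cpu;
};

static void model_conn_init(struct model_conn *conn)
{
    conn->rx_cpu = MODEL_CPU_UNSET;
}

/* Called from the (modelled) receive path: remember where the softirq ran. */
static void model_data_ready(struct model_conn *conn, int current_cpu)
{
    assert(current_cpu >= 0);
    conn->rx_cpu = current_cpu;
}

/*
 * CPU to pass as the first argument of a queue_work_on()-style call.
 * Falls back to @fallback_cpu until a receive CPU has been recorded.
 */
static int model_xmit_target_cpu(const struct model_conn *conn,
                                 int fallback_cpu)
{
    return conn->rx_cpu == MODEL_CPU_UNSET ? fallback_cpu : conn->rx_cpu;
}
```

Combined with a per-node pwq, passing the recorded CPU is what keeps the
worker on the node where the network softirq ran.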

Any advice is welcome.

Thanks in advance!
------------------------------------------------------------------
The performance data is shown below:
[cpu info]:
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    1
Core(s) per socket:    64
Socket(s):             2
NUMA node(s):          4
Model:                 0
CPU max MHz:           2600.0000
CPU min MHz:           200.0000
BogoMIPS:              200.00
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              32768K
NUMA node0 CPU(s):     0-31
NUMA node1 CPU(s):     32-63
NUMA node2 CPU(s):     64-95
NUMA node3 CPU(s):     96-127

[test cmd]:
fio -filename=/dev/disk/by-id/wwn-0x6883fd3100a2ad260036281700000000 
-direct=1 -iodepth=32 -rw=read -bs=64k -size=30G -ioengine=libaio -numjobs=1
-group_reporting -name=mytest  -time_based -ramp_time=60 -runtime=60

1. bad (the kworker and the softirq run on different CPU packages)
Jobs: 1 (f=1): [R] [52.5% done] [852.3MB/0KB/0KB /s] [13.7K/0/0 iops] [eta 00m:57s]
Jobs: 1 (f=1): [R] [53.3% done] [861.4MB/0KB/0KB /s] [13.8K/0/0 iops] [eta 00m:56s]
Jobs: 1 (f=1): [R] [54.2% done] [868.2MB/0KB/0KB /s] [13.9K/0/0 iops] [eta 00m:55s]

2. good (after patching, they run on the same NUMA node)
Jobs: 1 (f=1): [R] [53.3% done] [1070MB/0KB/0KB /s] [17.2K/0/0 iops] [eta 00m:56s]
Jobs: 1 (f=1): [R] [55.0% done] [1064MB/0KB/0KB /s] [17.3K/0/0 iops] [eta 00m:54s]
Jobs: 1 (f=1): [R] [56.7% done] [1069MB/0KB/0KB /s] [17.1K/0/0 iops] [eta 00m:52s]
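
For reference, the relative gain can be worked out from the three
throughput samples of each run above. A small sketch, with hypothetical
helper names, that averages the samples and computes the percentage
difference:

```c
#include <assert.h>
#include <stddef.h>

/* Throughput samples in MB/s, copied from the fio progress lines above. */
static const double bad_mbps[]  = { 852.3, 861.4, 868.2 };
static const double good_mbps[] = { 1070.0, 1064.0, 1069.0 };

/* Arithmetic mean of @n samples. */
static double sample_mean(const double *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Relative gain of the "after" mean over the "before" mean, in percent. */
static double gain_percent(const double *before, size_t nb,
                           const double *after, size_t na)
{
    double b = sample_mean(before, nb);
    double a = sample_mean(after, na);

    return (a - b) / b * 100.0;
}
```

With these samples the means are about 860.6 MB/s and 1067.7 MB/s, so the
patched run is roughly 24% faster.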

Biaoxiang Ye (2):
  workqueue: implement NUMA affinity for single thread workqueue
  iscsi: use dynamic single thread workqueue to improve performance

 drivers/scsi/iscsi_tcp.c  |  8 ++++++++
 drivers/scsi/libiscsi.c   | 12 +++++++++---
 include/linux/workqueue.h |  7 +++++++
 kernel/workqueue.c        | 40 ++++++++++++++++++++++++++++++++++------
 4 files changed, 58 insertions(+), 9 deletions(-)

-- 
1.8.3.1

