Re: [RESEND PATCH] libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task
Bob,

> When a target sends Check Condition, whilst initiator is busy xmiting
> re-queued data, could lead to race between iscsi_complete_task() and
> iscsi_xmit_task() and eventually crashing with the following kernel
> backtrace.

Applied to 5.0/scsi-fixes, thanks!

-- 
Martin K. Petersen
Oracle Linux Engineering

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.
Re: [RESEND PATCH] libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task
On 2/12/19 9:21 PM, Bob Liu wrote:
> From: Anoob Soman
>
> When a target sends a Check Condition while the initiator is busy
> transmitting re-queued data, a race between iscsi_complete_task() and
> iscsi_xmit_task() can occur, eventually crashing with the following
> kernel backtrace.
>
> [3326150.987523] ALERT: BUG: unable to handle kernel NULL pointer dereference at 0078
> [3326150.987549] ALERT: IP: [] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
> [...]
> [3326151.020366] WARN: ---[ end trace 1c60974d4678d81b ]---
>
> Commit 6f8830f5bbab ("scsi: libiscsi: add lock around task lists to fix
> list corruption regression") introduced "taskqueuelock" to fix list
> corruption during the race, but this wasn't enough.
>
> Re-setting conn->task to NULL can race with iscsi_xmit_task():
>
> iscsi_complete_task()
> {
> 	if (conn->task == task)
> 		conn->task = NULL;
> }
>
> conn->task in iscsi_xmit_task() could be NULL, and so will be task;
> __iscsi_get_task(task) will then crash (NULL pointer dereference)
> trying to access the refcount.
>
> iscsi_xmit_task()
> {
> 	struct iscsi_task *task = conn->task;
>
> 	__iscsi_get_task(task);
> }
>
> This commit takes an extra conn->session->back_lock in iscsi_xmit_task()
> to ensure iscsi_xmit_task() waits for iscsi_complete_task() if
> iscsi_complete_task() wins the race. If iscsi_xmit_task() wins the
> race, it increments task->refcount (__iscsi_get_task), ensuring
> iscsi_complete_task() will not iscsi_free_task().
>
> Signed-off-by: Anoob Soman
> Signed-off-by: Bob Liu
> ---
>  drivers/scsi/libiscsi.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index b8d325ce8754..120fc520f27a 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1459,7 +1459,13 @@ static int iscsi_xmit_task(struct iscsi_conn *conn)
>  	if (test_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx))
>  		return -ENODATA;
>  
> +	spin_lock_bh(&conn->session->back_lock);
> +	if (conn->task == NULL) {
> +		spin_unlock_bh(&conn->session->back_lock);
> +		return -ENODATA;
> +	}
>  	__iscsi_get_task(task);
> +	spin_unlock_bh(&conn->session->back_lock);
>  	spin_unlock_bh(&conn->session->frwd_lock);
>  	rc = conn->session->tt->xmit_task(task);
>  	spin_lock_bh(&conn->session->frwd_lock);

Signed-off-by: Lee Duncan
Re: [RFC] Add source address binding to iSER
On Wednesday, February 6, 2019 at 9:11:45 AM UTC-8, Nikhil Valluru wrote:
>
> Hi Robert,
>
> I am looking for a similar fix and wanted to know if this patch was
> accepted into the linux kernel?
>
> Thanks,
> Nikhil

A quick look at linux upstream shows this code is not in the kernel. Moreover, looking at this old thread, I believe the original problem turned out to be a non-problem once Robert updated to the latest upstream.
[RESEND PATCH] libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task
From: Anoob Soman

When a target sends a Check Condition while the initiator is busy
transmitting re-queued data, a race between iscsi_complete_task() and
iscsi_xmit_task() can occur, eventually crashing with the following
kernel backtrace.

[3326150.987523] ALERT: BUG: unable to handle kernel NULL pointer dereference at 0078
[3326150.987549] ALERT: IP: [] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
[3326150.987571] WARN: PGD 569c8067 PUD 569c9067 PMD 0
[3326150.987582] WARN: Oops: 0002 [#1] SMP
[3326150.987593] WARN: Modules linked in: tun nfsv3 nfs fscache dm_round_robin
[3326150.987762] WARN: CPU: 2 PID: 8399 Comm: kworker/u32:1 Tainted: G O 4.4.0+2 #1
[3326150.987774] WARN: Hardware name: Dell Inc. PowerEdge R720/0W7JN5, BIOS 2.5.4 01/22/2016
[3326150.987790] WARN: Workqueue: iscsi_q_13 iscsi_xmitworker [libiscsi]
[3326150.987799] WARN: task: 8801d50f3800 ti: 8801f5458000 task.ti: 8801f5458000
[3326150.987810] WARN: RIP: e030:[] [] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
[3326150.987825] WARN: RSP: e02b:8801f545bdb0 EFLAGS: 00010246
[3326150.987831] WARN: RAX: ffc3 RBX: 880282d2ab20 RCX: 88026b6ac480
[3326150.987842] WARN: RDX: RSI: fe01 RDI: 880282d2ab20
[3326150.987852] WARN: RBP: 8801f545bdc8 R08: R09: 0008
[3326150.987862] WARN: R10: R11: fe88 R12:
[3326150.987872] WARN: R13: 880282d2abe8 R14: 880282d2abd8 R15: 880282d2ac08
[3326150.987890] WARN: FS: 7f5a866b4840() GS:88028a64() knlGS:
[3326150.987900] WARN: CS: e033 DS: ES: CR0: 80050033
[3326150.987907] WARN: CR2: 0078 CR3: 70244000 CR4: 00042660
[3326150.987918] WARN: Stack:
[3326150.987924] WARN: 880282d2ad58 880282d2ab20 880282d2abe8 8801f545be18
[3326150.987938] WARN: a05cea90 880282d2abf8 88026b59cc80 88026b59cc00
[3326150.987951] WARN: 88022acf32c0 880289491800 880255a80800 0400
[3326150.987964] WARN: Call Trace:
[3326150.987975] WARN: [] iscsi_xmitworker+0x2f0/0x360 [libiscsi]
[3326150.987988] WARN: [] process_one_work+0x1fc/0x3b0
[3326150.987997] WARN: [] worker_thread+0x2a5/0x470
[3326150.988006] WARN: [] ? __schedule+0x648/0x870
[3326150.988015] WARN: [] ? rescuer_thread+0x300/0x300
[3326150.988023] WARN: [] kthread+0xd5/0xe0
[3326150.988031] WARN: [] ? kthread_stop+0x110/0x110
[3326150.988040] WARN: [] ret_from_fork+0x3f/0x70
[3326150.988048] WARN: [] ? kthread_stop+0x110/0x110
[3326150.988127] ALERT: RIP [] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
[3326150.988138] WARN: RSP
[3326150.988144] WARN: CR2: 0078
[3326151.020366] WARN: ---[ end trace 1c60974d4678d81b ]---

Commit 6f8830f5bbab ("scsi: libiscsi: add lock around task lists to fix
list corruption regression") introduced "taskqueuelock" to fix list
corruption during the race, but this wasn't enough.

Re-setting conn->task to NULL can race with iscsi_xmit_task():

iscsi_complete_task()
{
	if (conn->task == task)
		conn->task = NULL;
}

conn->task in iscsi_xmit_task() could be NULL, and so will be task;
__iscsi_get_task(task) will then crash (NULL pointer dereference)
trying to access the refcount.

iscsi_xmit_task()
{
	struct iscsi_task *task = conn->task;

	__iscsi_get_task(task);
}

This commit takes an extra conn->session->back_lock in iscsi_xmit_task()
to ensure iscsi_xmit_task() waits for iscsi_complete_task() if
iscsi_complete_task() wins the race. If iscsi_xmit_task() wins the
race, it increments task->refcount (__iscsi_get_task), ensuring
iscsi_complete_task() will not iscsi_free_task().
Signed-off-by: Anoob Soman
Signed-off-by: Bob Liu
---
 drivers/scsi/libiscsi.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index b8d325ce8754..120fc520f27a 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1459,7 +1459,13 @@ static int iscsi_xmit_task(struct iscsi_conn *conn)
 	if (test_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx))
 		return -ENODATA;
 
+	spin_lock_bh(&conn->session->back_lock);
+	if (conn->task == NULL) {
+		spin_unlock_bh(&conn->session->back_lock);
+		return -ENODATA;
+	}
 	__iscsi_get_task(task);
+	spin_unlock_bh(&conn->session->back_lock);
 	spin_unlock_bh(&conn->session->frwd_lock);
 	rc = conn->session->tt->xmit_task(task);
 	spin_lock_bh(&conn->session->frwd_lock);
-- 
2.17.1
Re: max_sectors_kb so small, throughput so small
Hello,

I am having a very similar issue.

Scenario 1) When I map an rbd volume directly to a Linux host using the rbd map command, I get good performance: dd on that volume reaches 380 - 430 MB/s. I am good with this.

Scenario 2) I am using the iscsi-gateway, with iscsi1 and iscsi2 as targets presenting LUNs to all my initiators. When I do dd on the volumes presented via the iscsi gateway, my performance is 10 to 25 MB/s, which is very poor.

I am using the gwcli command line and cannot use the below command:

create images image=test1 size=100g max_data_area_mb=128
Unexpected keyword parameter 'max_data_area_mb'

I also cannot use targetcli on this server. How can I increase the max_data_area_mb and hw_max_sectors sizes in my cluster?

Thank you in advance.

On Friday, September 21, 2018 at 7:16:58 AM UTC-7, 3kbo...@gmail.com wrote:
>
> Thank you for your opinion of max_sectors_kb, Ulrich. It provided me
> some inspiration for understanding the whole thing.
>
> Thank you for your advice, Mike. I will tune some parameters as you
> have mentioned below.
>
> I will share if I can make achievements in performance tuning.
>
> On Friday, September 21, 2018 at 1:43:14 AM UTC+8, Mike Christie wrote:
>>
>> On 09/12/2018 09:26 PM, 3kbo...@gmail.com wrote:
>> > Thank you for your reply, Mike.
>> > Now my iscsi disk performance can be around 300MB/s in 4M sequential
>> > write (TCMU+LIO). It increased from 20MB/s to 300MB/s after I changed
>> > max_data_area_mb from 8 to 256 and hw_max_sectors from 128 to 8192.
>> > In my cluster, after a lot of tests, I found that I should keep
>> > "max_data_area_mb > 128 && hw_max_sectors >= 4096" in order to get
>> > good performance.
>> > Can my settings cause some side effects?
>>
>> It depends on the kernel. For the RHEL/CentOS kernel you are using, the
>> kernel will preallocate max_data_area_mb of memory for each device.
>> For upstream, we no longer preallocate, but once it is allocated we do
>> not free the memory unless global_max_data_area_mb is hit or the device
>> is removed.
>>
>> With a high hw_max_sectors, latency will increase due to sending really
>> large commands, so it depends on your workload and what you need.
>>
>> We used to set hw_max_sectors to the rbd object size (4MB by default),
>> but in our testing we would see throughput go down around 512k - 1MB.
>>
>> > Are there any other parameters that can improve the performance quite
>> > obviously?
>>
>> The normal networking ones like using jumbo frames,
>> net.core.*/net.ipv4.*, etc. Check your NIC's documentation for the
>> best settings.
>>
>> There are some iSCSI ones like the cmdsn/cmds_max I mentioned, and then
>> also the segment-related ones like MaxRecvDataSegmentLength,
>> MaxXmitDataSegmentLength, MaxBurstLength, FirstBurstLength, and having
>> ImmediateData on.
>>
>> > Why are the default values of max_data_area_mb and hw_max_sectors so
>> > small, with such bad performance?
>>
>> I do not know. It was just what the original author had used initially.
>>
>> > Could you talk something about this?
>> > At least with max_data_area_mb > 128 && hw_max_sectors >= 4096, I can
>> > get a performance that seems to be acceptable.
>> > If my settings can give other users some help, I will be happy.
>> >
>> > On Wednesday, September 12, 2018 at 12:39:16 AM UTC+8, Mike Christie
>> > wrote:
>> >
>> > On 09/11/2018 11:30 AM, Mike Christie wrote:
>> > > Hey,
>> > >
>> > > Cc mchr...@redhat.com, or I will not see these messages until I
>> > > check the list maybe once a week.
>> > >
>> > > On 09/05/2018 10:36 PM, 3kbo...@gmail.com wrote:
>> > >> What lio fabric driver are you using? iSCSI? What kernel version
>> > >> and what version of tcmu-runner?
>> > >>
>> > >> io fabric driver: iscsi
>> > >> iscsid version: 2.0-873
>> > >> OS version: CentOS Linux release 7.5.1804 (Core)
>> > >> kernel version: 3.10.0-862.el7.x86_64
>> > >> tcmu-runner version: 1.4.0-rc1
>> > >>
>> > There is also a perf bug in that initiator if the node.session.cmds_max
>> > is greater than the LIO default_cmdsn_depth and your IO test tries to
>> > send cmds > node.session.cmds_max.
>> >
>> > I have known about the bug before, because I had googled a lot.
>> > It increased from 320MB/s to 340MB/s (4M seq write), which seems like
>> > a stable improvement.
>> >
>> > Settings before: 320MB/s
>> > node.session.cmds_max = 2048
>> > default_cmdsn_depth = 64
>> >
>> > Settings after: 340MB/s
>> > node.session.cmds_max = 64
>> > default_cmdsn_depth = 64
>> >
>> > So set the node.session.cmds_max and default_cmdsn_depth to the same
>> > value. You can set the default_cmdsn_depth in saveconfig.json, and set
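The advice quoted above (the last line is cut off in the archive) amounts to keeping the initiator's queue depth aligned with the target's, and enabling the usual iSCSI data-segment negotiation options. A sketch of where these settings live follows; the values are examples from this thread, not recommendations, and exact syntax may vary with your open-iscsi and targetcli versions:

```
# Initiator side, /etc/iscsi/iscsid.conf (example values only):
node.session.cmds_max = 64
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144

# Target (LIO) side: default_cmdsn_depth = 64 in saveconfig.json,
# or via targetcli on the TPG (target IQN below is a placeholder):
#   /iscsi/iqn.2018-09.example:target/tpg1> set attribute default_cmdsn_depth=64
```

Per Mike's numbers above, mismatched cmds_max/default_cmdsn_depth costs roughly 20MB/s in the 4M sequential-write test, so aligning the two is a cheap win before touching max_data_area_mb or hw_max_sectors.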