Re: [RESEND PATCH] libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task

2019-02-15 Thread Martin K. Petersen


Bob,

> When a target sends a Check Condition while the initiator is busy
> transmitting re-queued data, a race can occur between iscsi_complete_task()
> and iscsi_xmit_task(), eventually crashing with the following kernel
> backtrace.

Applied to 5.0/scsi-fixes, thanks!

-- 
Martin K. Petersen  Oracle Linux Engineering



Re: [RESEND PATCH] libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task

2019-02-15 Thread Lee Duncan
On 2/12/19 9:21 PM, Bob Liu wrote:
> From: Anoob Soman 
> 
> When a target sends a Check Condition while the initiator is busy
> transmitting re-queued data, a race can occur between iscsi_complete_task()
> and iscsi_xmit_task(), eventually crashing with the following kernel
> backtrace.
> 
> [3326150.987523] ALERT: BUG: unable to handle kernel NULL pointer dereference at 0078
> [3326150.987549] ALERT: IP: [] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
> [3326150.987571] WARN: PGD 569c8067 PUD 569c9067 PMD 0
> [3326150.987582] WARN: Oops: 0002 [#1] SMP
> [3326150.987593] WARN: Modules linked in: tun nfsv3 nfs fscache dm_round_robin
> [3326150.987762] WARN: CPU: 2 PID: 8399 Comm: kworker/u32:1 Tainted: G O 4.4.0+2 #1
> [3326150.987774] WARN: Hardware name: Dell Inc. PowerEdge R720/0W7JN5, BIOS 2.5.4 01/22/2016
> [3326150.987790] WARN: Workqueue: iscsi_q_13 iscsi_xmitworker [libiscsi]
> [3326150.987799] WARN: task: 8801d50f3800 ti: 8801f5458000 task.ti: 8801f5458000
> [3326150.987810] WARN: RIP: e030:[] [] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
> [3326150.987825] WARN: RSP: e02b:8801f545bdb0 EFLAGS: 00010246
> [3326150.987831] WARN: RAX: ffc3 RBX: 880282d2ab20 RCX: 88026b6ac480
> [3326150.987842] WARN: RDX:  RSI: fe01 RDI: 880282d2ab20
> [3326150.987852] WARN: RBP: 8801f545bdc8 R08:  R09: 0008
> [3326150.987862] WARN: R10:  R11: fe88 R12: 
> [3326150.987872] WARN: R13: 880282d2abe8 R14: 880282d2abd8 R15: 880282d2ac08
> [3326150.987890] WARN: FS: 7f5a866b4840() GS:88028a64() knlGS:
> [3326150.987900] WARN: CS: e033 DS:  ES:  CR0: 80050033
> [3326150.987907] WARN: CR2: 0078 CR3: 70244000 CR4: 00042660
> [3326150.987918] WARN: Stack:
> [3326150.987924] WARN: 880282d2ad58 880282d2ab20 880282d2abe8 8801f545be18
> [3326150.987938] WARN: a05cea90 880282d2abf8 88026b59cc80 88026b59cc00
> [3326150.987951] WARN: 88022acf32c0 880289491800 880255a80800 0400
> [3326150.987964] WARN: Call Trace:
> [3326150.987975] WARN: [] iscsi_xmitworker+0x2f0/0x360 [libiscsi]
> [3326150.987988] WARN: [] process_one_work+0x1fc/0x3b0
> [3326150.987997] WARN: [] worker_thread+0x2a5/0x470
> [3326150.988006] WARN: [] ? __schedule+0x648/0x870
> [3326150.988015] WARN: [] ? rescuer_thread+0x300/0x300
> [3326150.988023] WARN: [] kthread+0xd5/0xe0
> [3326150.988031] WARN: [] ? kthread_stop+0x110/0x110
> [3326150.988040] WARN: [] ret_from_fork+0x3f/0x70
> [3326150.988048] WARN: [] ? kthread_stop+0x110/0x110
> [3326150.988127] ALERT: RIP [] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
> [3326150.988138] WARN: RSP 
> [3326150.988144] WARN: CR2: 0078
> [3326151.020366] WARN: ---[ end trace 1c60974d4678d81b ]---
> 
> Commit 6f8830f5bbab ("scsi: libiscsi: add lock around task lists to fix list
> corruption regression") introduced "taskqueuelock" to fix list corruption
> during the race, but this wasn't enough.
> 
> Resetting conn->task to NULL in iscsi_complete_task() can race with
> iscsi_xmit_task():
> 
> iscsi_complete_task()
> {
> 	...
> 	if (conn->task == task)
> 		conn->task = NULL;
> }
> 
> conn->task read in iscsi_xmit_task() can then be NULL, and so will task;
> __iscsi_get_task(task) will crash with a NULL pointer dereference when it
> tries to access the refcount:
> 
> iscsi_xmit_task()
> {
> 	struct iscsi_task *task = conn->task;
> 	...
> 	__iscsi_get_task(task);
> }
> 
> This commit takes an extra conn->session->back_lock in iscsi_xmit_task() to
> ensure that iscsi_xmit_task() waits for iscsi_complete_task() if
> iscsi_complete_task() wins the race.
> If iscsi_xmit_task() wins the race, it increments task->refcount
> (__iscsi_get_task), ensuring iscsi_complete_task() will not call
> iscsi_free_task().
> 
> Signed-off-by: Anoob Soman 
> Signed-off-by: Bob Liu 
> ---
>  drivers/scsi/libiscsi.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index b8d325ce8754..120fc520f27a 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -1459,7 +1459,13 @@ static int iscsi_xmit_task(struct iscsi_conn *conn)
>  	if (test_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx))
>  		return -ENODATA;
>  
> +	spin_lock_bh(&conn->session->back_lock);
> +	if (conn->task == NULL) {
> +		spin_unlock_bh(&conn->session->back_lock);
> +		return -ENODATA;
> +	}
>  	__iscsi_get_task(task);
> +	spin_unlock_bh(&conn->session->back_lock);
>  	spin_unlock_bh(&conn->session->frwd_lock);
>  	rc = conn->session->tt->xmit_task(task);
>  	spin_lock_bh(&conn->session->frwd_lock);
> 

Signed-off-by: Lee Duncan 


Re: [RFC] Add source address binding to iSER

2019-02-15 Thread The Lee-Man
On Wednesday, February 6, 2019 at 9:11:45 AM UTC-8, Nikhil Valluru wrote:
>
> Hi Robert,
>
> I am looking for a similar fix and wanted to know if this patch was 
> accepted into the Linux kernel?
>
> Thanks,
> Nikhil
>
>
A quick look at Linux upstream shows that this code is not in the kernel.

Moreover, looking at this old thread, I believe the original problem turned
out to be a non-issue once Robert updated to the latest upstream.



[RESEND PATCH] libiscsi: Fix race between iscsi_xmit_task and iscsi_complete_task

2019-02-15 Thread Bob Liu
From: Anoob Soman 

When a target sends a Check Condition while the initiator is busy
transmitting re-queued data, a race can occur between iscsi_complete_task()
and iscsi_xmit_task(), eventually crashing with the following kernel
backtrace.

[3326150.987523] ALERT: BUG: unable to handle kernel NULL pointer dereference at 0078
[3326150.987549] ALERT: IP: [] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
[3326150.987571] WARN: PGD 569c8067 PUD 569c9067 PMD 0
[3326150.987582] WARN: Oops: 0002 [#1] SMP
[3326150.987593] WARN: Modules linked in: tun nfsv3 nfs fscache dm_round_robin
[3326150.987762] WARN: CPU: 2 PID: 8399 Comm: kworker/u32:1 Tainted: G O 4.4.0+2 #1
[3326150.987774] WARN: Hardware name: Dell Inc. PowerEdge R720/0W7JN5, BIOS 2.5.4 01/22/2016
[3326150.987790] WARN: Workqueue: iscsi_q_13 iscsi_xmitworker [libiscsi]
[3326150.987799] WARN: task: 8801d50f3800 ti: 8801f5458000 task.ti: 8801f5458000
[3326150.987810] WARN: RIP: e030:[] [] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
[3326150.987825] WARN: RSP: e02b:8801f545bdb0 EFLAGS: 00010246
[3326150.987831] WARN: RAX: ffc3 RBX: 880282d2ab20 RCX: 88026b6ac480
[3326150.987842] WARN: RDX:  RSI: fe01 RDI: 880282d2ab20
[3326150.987852] WARN: RBP: 8801f545bdc8 R08:  R09: 0008
[3326150.987862] WARN: R10:  R11: fe88 R12: 
[3326150.987872] WARN: R13: 880282d2abe8 R14: 880282d2abd8 R15: 880282d2ac08
[3326150.987890] WARN: FS: 7f5a866b4840() GS:88028a64() knlGS:
[3326150.987900] WARN: CS: e033 DS:  ES:  CR0: 80050033
[3326150.987907] WARN: CR2: 0078 CR3: 70244000 CR4: 00042660
[3326150.987918] WARN: Stack:
[3326150.987924] WARN: 880282d2ad58 880282d2ab20 880282d2abe8 8801f545be18
[3326150.987938] WARN: a05cea90 880282d2abf8 88026b59cc80 88026b59cc00
[3326150.987951] WARN: 88022acf32c0 880289491800 880255a80800 0400
[3326150.987964] WARN: Call Trace:
[3326150.987975] WARN: [] iscsi_xmitworker+0x2f0/0x360 [libiscsi]
[3326150.987988] WARN: [] process_one_work+0x1fc/0x3b0
[3326150.987997] WARN: [] worker_thread+0x2a5/0x470
[3326150.988006] WARN: [] ? __schedule+0x648/0x870
[3326150.988015] WARN: [] ? rescuer_thread+0x300/0x300
[3326150.988023] WARN: [] kthread+0xd5/0xe0
[3326150.988031] WARN: [] ? kthread_stop+0x110/0x110
[3326150.988040] WARN: [] ret_from_fork+0x3f/0x70
[3326150.988048] WARN: [] ? kthread_stop+0x110/0x110
[3326150.988127] ALERT: RIP [] iscsi_xmit_task+0x2d/0xc0 [libiscsi]
[3326150.988138] WARN: RSP 
[3326150.988144] WARN: CR2: 0078
[3326151.020366] WARN: ---[ end trace 1c60974d4678d81b ]---

Commit 6f8830f5bbab ("scsi: libiscsi: add lock around task lists to fix list
corruption regression") introduced "taskqueuelock" to fix list corruption
during the race, but this wasn't enough.

Resetting conn->task to NULL in iscsi_complete_task() can race with
iscsi_xmit_task():

iscsi_complete_task()
{
	...
	if (conn->task == task)
		conn->task = NULL;
}

conn->task read in iscsi_xmit_task() can then be NULL, and so will task;
__iscsi_get_task(task) will crash with a NULL pointer dereference when it
tries to access the refcount:

iscsi_xmit_task()
{
	struct iscsi_task *task = conn->task;
	...
	__iscsi_get_task(task);
}

This commit takes an extra conn->session->back_lock in iscsi_xmit_task() to
ensure that iscsi_xmit_task() waits for iscsi_complete_task() if
iscsi_complete_task() wins the race.
If iscsi_xmit_task() wins the race, it increments task->refcount
(__iscsi_get_task), ensuring iscsi_complete_task() will not call
iscsi_free_task().
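
To illustrate the ordering this change enforces, here is a small stand-alone
model of the two paths. It is only a sketch, not the libiscsi code: a pthread
mutex stands in for the session's back_lock spinlock, a plain integer stands
in for the task refcount, and every type and function name here is a
simplified placeholder.

/*
 * Stand-alone sketch of the race and the fix above -- NOT the libiscsi code.
 * Build with: cc -pthread -o race_model race_model.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct task {
	int refcount;
};

struct conn {
	pthread_mutex_t back_lock;
	struct task *task;
};

/* Drop a reference; caller must hold back_lock (like __iscsi_put_task()). */
static void task_put(struct task *t)
{
	if (--t->refcount == 0) {
		printf("freeing task\n");
		free(t);
	}
}

/* Completion side: clear conn->task and drop its reference under back_lock. */
static void *complete_side(void *arg)
{
	struct conn *c = arg;

	pthread_mutex_lock(&c->back_lock);
	if (c->task) {
		struct task *t = c->task;

		c->task = NULL;
		task_put(t);
	}
	pthread_mutex_unlock(&c->back_lock);
	return NULL;
}

/*
 * Xmit side: re-check conn->task and take a reference under back_lock,
 * which is the step this patch adds to iscsi_xmit_task().
 */
static void *xmit_side(void *arg)
{
	struct conn *c = arg;
	struct task *t;

	pthread_mutex_lock(&c->back_lock);
	t = c->task;
	if (!t) {
		/* Completion won the race: nothing to transmit (-ENODATA). */
		pthread_mutex_unlock(&c->back_lock);
		return NULL;
	}
	t->refcount++;			/* models __iscsi_get_task() */
	pthread_mutex_unlock(&c->back_lock);

	printf("transmitting task\n");	/* the transport xmit would run here */

	pthread_mutex_lock(&c->back_lock);
	task_put(t);			/* drop the extra reference */
	pthread_mutex_unlock(&c->back_lock);
	return NULL;
}

int main(void)
{
	struct conn c;
	pthread_t a, b;

	pthread_mutex_init(&c.back_lock, NULL);
	c.task = calloc(1, sizeof(*c.task));
	c.task->refcount = 1;		/* the reference held via conn->task */

	pthread_create(&a, NULL, xmit_side, &c);
	pthread_create(&b, NULL, complete_side, &c);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

Whichever side takes back_lock second either finds conn->task already NULL
and backs off, or finds the refcount already raised and leaves the final free
to the last put; that is exactly the guarantee the new back_lock section
provides.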

Signed-off-by: Anoob Soman 
Signed-off-by: Bob Liu 
---
 drivers/scsi/libiscsi.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index b8d325ce8754..120fc520f27a 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -1459,7 +1459,13 @@ static int iscsi_xmit_task(struct iscsi_conn *conn)
 	if (test_bit(ISCSI_SUSPEND_BIT, &conn->suspend_tx))
 		return -ENODATA;
 
+	spin_lock_bh(&conn->session->back_lock);
+	if (conn->task == NULL) {
+		spin_unlock_bh(&conn->session->back_lock);
+		return -ENODATA;
+	}
 	__iscsi_get_task(task);
+	spin_unlock_bh(&conn->session->back_lock);
 	spin_unlock_bh(&conn->session->frwd_lock);
 	rc = conn->session->tt->xmit_task(task);
 	spin_lock_bh(&conn->session->frwd_lock);
-- 
2.17.1


Re: max_sectors_kb so small, throughput so small

2019-02-15 Thread Chan
Hello

I am having a very similar issue.

Scenario 1) When I map an RBD volume directly to a Linux host using the rbd
map command, I get good performance.
When I run dd on that volume, I get 380-430 MB/s -- I am good with this.


Scenario 2) I am using the iSCSI gateway, with iscsi1 and iscsi2 as targets
presenting LUNs to all my initiators.


When I run dd on the volumes presented via the iSCSI gateway, my performance
is 10 to 25 MB/s, which is very poor.
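
(For reference, a large-block sequential write test of the kind described
above can be run as shown below; /dev/sdX is a placeholder for the
iSCSI-attached disk and the sizes are only illustrative. It writes straight
to the device, so it will destroy any data on it.)

# WARNING: overwrites data on /dev/sdX (placeholder device name).
# oflag=direct bypasses the page cache so the result reflects the transport.
dd if=/dev/zero of=/dev/sdX bs=4M count=1024 oflag=direct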

I am using the gwcli command line and cannot use the command below:

create images image=test1 size=100g max_data_area_mb=128

Unexpected keyword parameter 'max_data_area_mb'

I also cannot use targetcli on this server. How can I increase the
max_data_area_mb and hw_max_sectors sizes in my cluster?
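
(On the initiator side, the block-layer limit the subject line refers to can
at least be inspected, and raised up to the hardware limit, through sysfs;
sdX is again a placeholder for the iSCSI LUN, and this does not change the
target-side tcmu settings asked about above.)

# Placeholder device name; run on the initiator against the iSCSI LUN.
cat /sys/block/sdX/queue/max_hw_sectors_kb   # hard limit reported by the device
cat /sys/block/sdX/queue/max_sectors_kb      # current per-request limit
# max_sectors_kb may be raised, but never above max_hw_sectors_kb:
echo 1024 > /sys/block/sdX/queue/max_sectors_kb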

Thank you in advance 
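
(The initiator-side settings that come up later in this thread, such as
cmds_max and the negotiation parameters Mike lists below, live in
/etc/iscsi/iscsid.conf; the values and the target IQN/portal here are
placeholders, not recommendations.)

# Placeholder values -- tune for your own setup and log the session in again
# afterwards.
# In /etc/iscsi/iscsid.conf (picked up when node records are created):
#   node.session.cmds_max = 64
#   node.session.queue_depth = 64
#   node.session.iscsi.ImmediateData = Yes
#   node.session.iscsi.FirstBurstLength = 262144
#   node.session.iscsi.MaxBurstLength = 16776192
#   node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
# An existing node record can be updated in place with iscsiadm:
iscsiadm -m node -T iqn.2003-01.com.example:target1 -p 192.168.0.10 \
	-o update -n node.session.cmds_max -v 64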



On Friday, September 21, 2018 at 7:16:58 AM UTC-7, 3kbo...@gmail.com wrote:
>
> Thank you for your opinion on max_sectors_kb, Ulrich.
> It gave me some inspiration for understanding the whole thing.
>
> Thank you for your advice, Mike.
> I will tune the parameters as you have suggested below.
>
> I will share my results if I make progress with the performance tuning.
>
>
> On Friday, September 21, 2018 at 1:43:14 AM UTC+8, Mike Christie wrote:
>>
>> On 09/12/2018 09:26 PM, 3kbo...@gmail.com wrote: 
>> > Thank you for your reply, Mike.
>> > Now my iSCSI disk performance is around 300 MB/s for 4M sequential writes
>> > (TCMU+LIO).
>> > It increased from 20 MB/s to 300 MB/s after I changed max_data_area_mb
>> > from 8 to 256 and hw_max_sectors from 128 to 8192.
>> > For my cluster, after a lot of tests I found that I should keep
>> > "max_data_area_mb > 128 && hw_max_sectors >= 4096" in order to get good
>> > performance.
>> > Can my settings cause any side effects?
>>
>> It depends on the kernel. For the RHEL/CentOS kernel you are using, the
>> kernel will preallocate max_data_area_mb of memory for each device.
>> Upstream, we no longer preallocate, but once the memory is allocated we do
>> not free it unless global_max_data_area_mb is hit or the device is removed.
>>
>> With a high hw_max_sectors, latency will increase due to sending really
>> large commands, so it depends on your workload and what you need.
>>
>> We used to set hw_max_sectors to the rbd object size (4MB by default), 
>> but in our testing we would see throughput go down around 512k - 1MB. 
>>
>> > Are there any other parameters that can improve performance noticeably?
>>
>> The normal networking ones, like using jumbo frames and tuning
>> net.core.*/net.ipv4.*, etc. Check your NIC's documentation for the best
>> settings.
>>
>> There are some iSCSI ones, like the cmdsn/cmds_max settings I mentioned,
>> and the segment-related ones like MaxRecvDataSegmentLength,
>> MaxXmitDataSegmentLength, MaxBurstLength and FirstBurstLength; also make
>> sure ImmediateData is on.
>>
>>
>> > Why are the default values of max_data_area_mb and hw_max_sectors so
>> > small, giving such bad performance?
>>
>> I do not know. It was just what the original author had used initially. 
>>
>> > Could you say something about this?
>> > At least with max_data_area_mb > 128 && hw_max_sectors >= 4096 I can get
>> > performance that seems acceptable.
>> > If my settings can help other users, I will be happy.
>> > 
>> > On Wednesday, September 12, 2018 at 12:39:16 AM UTC+8, Mike Christie wrote:
>> > 
>> > On 09/11/2018 11:30 AM, Mike Christie wrote: 
>> > > Hey, 
>> > > 
>> > > Cc mchr...@redhat.com, or I will not see these messages until I check
>> > > the list maybe once a week.
>> > > 
>> > > On 09/05/2018 10:36 PM, 3kbo...@gmail.com  wrote: 
>> > >> What lio fabric driver are you using? iSCSI? What kernel version and
>> > >> what version of tcmu-runner?
>> > >> 
>> > >> lio fabric driver: iscsi
>> > >> 
>> > >> iscsid version:  2.0-873 
>> > >> 
>> > >> OS version:  CentOS Linux release 7.5.1804 
>> > (Core) 
>> > >> 
>> > >> kernel version:  3.10.0-862.el7.x86_64 
>> > >> 
>> > >> tcmu-runner version:1.4.0-rc1 
>> > >> 
>> > >> 
>> > 
>> > There is also a perf bug in that initiator if the node.session.cmds_max
>> > is greater than the LIO default_cmdsn_depth and your IO test tries to
>> > send cmds > node.session.cmds_max.
>> > 
>> > I knew about that bug already, because I had googled a lot.
>> > It increased from 320 MB/s to 340 MB/s (4M sequential write), which seems
>> > like a stable improvement.
>> > 
>> > Settings Before: 320MB/s 
>> >  node.session.cmds_max = 2048 
>> >  default_cmdsn_depth = 64 
>> > 
>> > Settings After: 340MB/s 
>> >  node.session.cmds_max = 64 
>> >  default_cmdsn_depth = 64 
>> > 
>> > So set the node.session.cmds_max and default_cmdsn_depth to the same
>> > value. You can set the default_cmdsn_depth in saveconfig.json, and set
>> >