Re: [PATCH 1/1] iscsi: fix regression caused by session lock patch

2017-02-06 Thread Sagi Grimberg

Hey Chris and Guilherme,

I'm indeed not responsive under this email address.


Thanks for the testing, looks like you have the magic target to reproduce this.

I think this verifies what Mike's idea of what was going wrong, and we're way 
overdue to get this fixed upstream.  Thanks to IBM for pushing this, I don't 
think any major distro is shipping this patch and we don't want to keep having 
to back it out.

The options look like
1) back out the session lock changes that split it into two locks
2) add in the additional locking from this test patch
3) some other fix for the issue of targets that complete tasks oddly

I'm leaning to #1, as I don't want to keep adding more locks for this.


Thanks Chris! IIRC, the lock changes from Shlomo/Or are not on RHEL,
SLES and Ubuntu anymore, as you mentioned. We requested them to revert
the patch, and it was accepted.

On the other hand, your patch is great and a cool fix to this. If we
have any good numbers and/or reasons to keep their patch, guess the
alternative #2 is cool too. I can perform more testing if you plan to
send this (or similar) patch to iscsi list.



Sagi, Or, Shlomo?  You pushed to keep this from being backed out before.  
Here's your cause, any better ideas on fixing it?  I also tried to go back in 
the mailing list archives, but I don't see any real numbers for the performance 
gains.


I'll loop Sagi here based on the email I see he's using on NVMe list
currently - seems it's different from the one showed in the header of
this message.


IIRC, this was brought up more than two years ago? it's been
a while now.

The motivation for the fined grained locking from Shlomo was
designed to address the submission/completion inter-locking
scheme that was not needed for iser.

In iser, task completions are triggered from soft-irq only for
task responses, the data-transfer is driven in HW, so we don't need
the inter-locking between submissions and task management or error
handling.

My recollection is that this scheme solved a contention point we had
back then, if I'm not mistaken it was as much as 50% improvement in
IOPs scalability in some scenarios.

Now, this was all pre block-mq. So I think the correct solution for
iscsi (iser, tcp and offloads) is to use block-mq facilities for
task pre-allocations (scsi host tagset) and have iscsi tcp take care
of it's own locking instead of imposing it inherently in libiscsi.

We can have LOGIN, LOGOUT, NOOP_OUT, TEXT, TMR as reserved tags,
and queue_depth with max session cmds. I had a prototype for that
back when I experimented with scsi-mq conversion (way back...),
but kinda got stuck with trying to figure out how to convert the
offload drivers qla4xxx, bnx2i and cxgbi which seemed to rely heavily
on on the task pools.

If people are more interested in improving iscsi locking schemes we
can discuss on approaches for it.

--
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [PATCH 1/1] iscsi: fix regression caused by session lock patch

2017-02-06 Thread Guilherme G. Piccoli
On 06/02/2017 15:27, Chris Leech wrote:
> - Original Message -
>> On 09/11/2016 03:21, Chris Leech wrote:
>>> On Mon, Nov 07, 2016 at 04:23:10PM -0200, Guilherme G. Piccoli wrote:

 Sure! Count on us to test any patches. I guess the first step is to
 reproduce on upstream right? We haven't tested specifically this
 scenario for long time. Will try to reproduce on 4.9-rc4 and update here.
>>>
>>> Great, I'm looking forward to hearing the result.
>>>
>>> Assuming it reproduces, I don't think this level of fine grained locking
>>> is necessarily the best solution, but it may help confirm the problem.
>>> Especially if the WARN_ONCE I slipped in here triggers.
>>
>> Chris, sorry for my huge delay.
>> Finally I was able to perform tests and I have good news - seems your
>> patch below fixed the issue.
> 
> Thanks for the testing, looks like you have the magic target to reproduce 
> this.
> 
> I think this verifies what Mike's idea of what was going wrong, and we're way 
> overdue to get this fixed upstream.  Thanks to IBM for pushing this, I don't 
> think any major distro is shipping this patch and we don't want to keep 
> having to back it out.
> 
> The options look like
> 1) back out the session lock changes that split it into two locks
> 2) add in the additional locking from this test patch
> 3) some other fix for the issue of targets that complete tasks oddly
> 
> I'm leaning to #1, as I don't want to keep adding more locks for this.

Thanks Chris! IIRC, the lock changes from Shlomo/Or are not on RHEL,
SLES and Ubuntu anymore, as you mentioned. We requested them to revert
the patch, and it was accepted.

On the other hand, your patch is great and a cool fix to this. If we
have any good numbers and/or reasons to keep their patch, guess the
alternative #2 is cool too. I can perform more testing if you plan to
send this (or similar) patch to iscsi list.


> Sagi, Or, Shlomo?  You pushed to keep this from being backed out before.  
> Here's your cause, any better ideas on fixing it?  I also tried to go back in 
> the mailing list archives, but I don't see any real numbers for the 
> performance gains.

I'll loop Sagi here based on the email I see he's using on NVMe list
currently - seems it's different from the one showed in the header of
this message.


Thanks,



Guilherme

> 
> - Chris
> 
>> Firstly, I was able to reproduce with kernel 4.10-rc6. See the file
>> repro.out - it's a dump from xmon, the kernel debugger from PowerPC.
>> With this tool we can dump the exception details, registers, PACA
>> (Processor Address Communication Area, ppc specific structure) and
>> dmesg. It took me less than 15 minutes to reproduce.
>>
>> Then, I applied your patch on the top of this kernel and the benchmark
>> was able to successfully complete, in about 3 hours. We can see the
>> WARN() you added was reached, the attached file dmesg-cleech_patch shows
>> the kernel log with your patch.
>>
>> The workload was FIO benchmark doing both reads and writes to the remote
>> storage via iSCSI, connected over ethernet direct cabling (10Gb speed).
>> Distro was Ubuntu 16.04.1 .
>>
>> Any more tests or info you need, please let me know!
>> Cheers,
>>
>>
>> Guilherme
>>
>>
>>> - Chris
>>>
>>> ---
>>>
>>> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
>>> index f9b6fba..fbd18ab 100644
>>> --- a/drivers/scsi/libiscsi.c
>>> +++ b/drivers/scsi/libiscsi.c
>>> @@ -560,8 +560,12 @@ static void iscsi_complete_task(struct iscsi_task
>>> *task, int state)
>>> WARN_ON_ONCE(task->state == ISCSI_TASK_FREE);
>>> task->state = state;
>>>
>>> -   if (!list_empty(>running))
>>> +   spin_lock_bh(>taskqueuelock);
>>> +   if (!list_empty(>running)) {
>>> +   WARN_ONCE(1, "iscsi_complete_task while task on list");
>>> list_del_init(>running);
>>> +   }
>>> +   spin_unlock_bh(>taskqueuelock);
>>>
>>> if (conn->task == task)
>>> conn->task = NULL;
>>> @@ -783,7 +787,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct
>>> iscsi_hdr *hdr,
>>> if (session->tt->xmit_task(task))
>>> goto free_task;
>>> } else {
>>> +   spin_lock_bh(>taskqueuelock);
>>> list_add_tail(>running, >mgmtqueue);
>>> +   spin_unlock_bh(>taskqueuelock);
>>> iscsi_conn_queue_work(conn);
>>> }
>>>
>>> @@ -1474,8 +1480,10 @@ void iscsi_requeue_task(struct iscsi_task *task)
>>>  * this may be on the requeue list already if the xmit_task callout
>>>  * is handling the r2ts while we are adding new ones
>>>  */
>>> +   spin_lock_bh(>taskqueuelock);
>>> if (list_empty(>running))
>>> list_add_tail(>running, >requeue);
>>> +   spin_unlock_bh(>taskqueuelock);
>>> iscsi_conn_queue_work(conn);
>>>  }
>>>  EXPORT_SYMBOL_GPL(iscsi_requeue_task);
>>> @@ -1512,22 +1520,26 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
>>>  * only have one nop-out as a ping from us and targets should not

Re: [PATCH 1/1] iscsi: fix regression caused by session lock patch

2017-02-06 Thread Chris Leech
- Original Message -
> On 09/11/2016 03:21, Chris Leech wrote:
> > On Mon, Nov 07, 2016 at 04:23:10PM -0200, Guilherme G. Piccoli wrote:
> >>
> >> Sure! Count on us to test any patches. I guess the first step is to
> >> reproduce on upstream right? We haven't tested specifically this
> >> scenario for long time. Will try to reproduce on 4.9-rc4 and update here.
> > 
> > Great, I'm looking forward to hearing the result.
> > 
> > Assuming it reproduces, I don't think this level of fine grained locking
> > is necessarily the best solution, but it may help confirm the problem.
> > Especially if the WARN_ONCE I slipped in here triggers.
> 
> Chris, sorry for my huge delay.
> Finally I was able to perform tests and I have good news - seems your
> patch below fixed the issue.

Thanks for the testing, looks like you have the magic target to reproduce this.

I think this verifies what Mike's idea of what was going wrong, and we're way 
overdue to get this fixed upstream.  Thanks to IBM for pushing this, I don't 
think any major distro is shipping this patch and we don't want to keep having 
to back it out.

The options look like
1) back out the session lock changes that split it into two locks
2) add in the additional locking from this test patch
3) some other fix for the issue of targets that complete tasks oddly

I'm leaning to #1, as I don't want to keep adding more locks for this.

Sagi, Or, Shlomo?  You pushed to keep this from being backed out before.  
Here's your cause, any better ideas on fixing it?  I also tried to go back in 
the mailing list archives, but I don't see any real numbers for the performance 
gains.

- Chris

> Firstly, I was able to reproduce with kernel 4.10-rc6. See the file
> repro.out - it's a dump from xmon, the kernel debugger from PowerPC.
> With this tool we can dump the exception details, registers, PACA
> (Processor Address Communication Area, ppc specific structure) and
> dmesg. It took me less than 15 minutes to reproduce.
> 
> Then, I applied your patch on the top of this kernel and the benchmark
> was able to successfully complete, in about 3 hours. We can see the
> WARN() you added was reached, the attached file dmesg-cleech_patch shows
> the kernel log with your patch.
> 
> The workload was FIO benchmark doing both reads and writes to the remote
> storage via iSCSI, connected over ethernet direct cabling (10Gb speed).
> Distro was Ubuntu 16.04.1 .
> 
> Any more tests or info you need, please let me know!
> Cheers,
> 
> 
> Guilherme
> 
> 
> > - Chris
> > 
> > ---
> > 
> > diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> > index f9b6fba..fbd18ab 100644
> > --- a/drivers/scsi/libiscsi.c
> > +++ b/drivers/scsi/libiscsi.c
> > @@ -560,8 +560,12 @@ static void iscsi_complete_task(struct iscsi_task
> > *task, int state)
> > WARN_ON_ONCE(task->state == ISCSI_TASK_FREE);
> > task->state = state;
> > 
> > -   if (!list_empty(>running))
> > +   spin_lock_bh(>taskqueuelock);
> > +   if (!list_empty(>running)) {
> > +   WARN_ONCE(1, "iscsi_complete_task while task on list");
> > list_del_init(>running);
> > +   }
> > +   spin_unlock_bh(>taskqueuelock);
> > 
> > if (conn->task == task)
> > conn->task = NULL;
> > @@ -783,7 +787,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct
> > iscsi_hdr *hdr,
> > if (session->tt->xmit_task(task))
> > goto free_task;
> > } else {
> > +   spin_lock_bh(>taskqueuelock);
> > list_add_tail(>running, >mgmtqueue);
> > +   spin_unlock_bh(>taskqueuelock);
> > iscsi_conn_queue_work(conn);
> > }
> > 
> > @@ -1474,8 +1480,10 @@ void iscsi_requeue_task(struct iscsi_task *task)
> >  * this may be on the requeue list already if the xmit_task callout
> >  * is handling the r2ts while we are adding new ones
> >  */
> > +   spin_lock_bh(>taskqueuelock);
> > if (list_empty(>running))
> > list_add_tail(>running, >requeue);
> > +   spin_unlock_bh(>taskqueuelock);
> > iscsi_conn_queue_work(conn);
> >  }
> >  EXPORT_SYMBOL_GPL(iscsi_requeue_task);
> > @@ -1512,22 +1520,26 @@ static int iscsi_data_xmit(struct iscsi_conn *conn)
> >  * only have one nop-out as a ping from us and targets should not
> >  * overflow us with nop-ins
> >  */
> > +   spin_lock_bh(>taskqueuelock);
> >  check_mgmt:
> > while (!list_empty(>mgmtqueue)) {
> > conn->task = list_entry(conn->mgmtqueue.next,
> >  struct iscsi_task, running);
> > list_del_init(>task->running);
> > +   spin_unlock_bh(>taskqueuelock);
> > if (iscsi_prep_mgmt_task(conn, conn->task)) {
> > /* regular RX path uses back_lock */
> > spin_lock_bh(>session->back_lock);
> > __iscsi_put_task(conn->task);
> > spin_unlock_bh(>session->back_lock);
> > conn->task =