Re: General protection fault in iscsi_rx_thread_pre_handler

2015-02-16 Thread Gavin Guo
Hi Nicholas,

On Mon, Feb 16, 2015 at 6:52 PM, Gavin Guo gavin@canonical.com wrote:
 Hi Nicholas,

 On Thu, Feb 12, 2015 at 3:16 PM, Nicholas A. Bellinger
 n...@linux-iscsi.org wrote:
 Hi Gavin,

 On Tue, 2015-02-03 at 08:28 +0800, Gavin Guo wrote:
 Hi Nicholas,

 On Sun, Feb 1, 2015 at 11:47 AM, Gavin Guo gavin@canonical.com wrote:
  Hi Nicholas,
 
  On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger
  n...@linux-iscsi.org wrote:
  On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
  Hi Nicholas,
 
  On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger
  n...@linux-iscsi.org wrote:
   On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
   Hi Nicolas,
  
   On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger

 SNIP

At the time, a different set of iser-target related changes ended 
up
avoiding this issue on his particular setup, so we thought it was 
likely
a race triggered by login failures specific to iser-target code.
   
There was a untested patch (included inline below) to drop the 
legacy
active_ts_list usage all-together, but IIRC he was not able to 
reproduce
further so the patch didn't get picked up for mainline.
   
If your able to reliability reproduce, please try with the 
following
patch and let us know your progress.
  
   Thanks for your time reading the mail. I'll let you know the result.
  
   Just curious, are you able to reliability reproduce this bug in a 
   VM..?
 
  Thanks for your caring, the machine is on the customer side, I've
  asked and now waiting for their response.
 
  Hi Gavin,
 
  Just curious if there has been any update on this yet..?
 
  --nab
 
 
  Really thanks for your attention. I'm also currently waiting for the
  customer's reply and will send the email again to ask for the result.
  However, I think the symptom may be hard to replicate that's why the
  customer didn't reply me for a long time. Thanks for your time again.
 
  Thanks,
  Gavin

 Sorry for making you wait so long. I just got the response from the
 customer, they said the general protection fault happened just 2 times
 in the past and cannot be reliably reproduced. And I am now waiting
 for the verification test.


 Just a heads up that I'm planning to include this patch in the v3.20-rc1
 PULL request.

 Please let me know if you have any objections.

 Thank you,

 --nab


 The bug

Sorry, I mistakenly press the send button last time.

The bug doesn't appear after the customer upgraded the kernel with the
patch. Really thanks for your help. I'll keep you posted if the bug
appears again.

Thanks,
Gavin
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: General protection fault in iscsi_rx_thread_pre_handler

2015-02-11 Thread Nicholas A. Bellinger
Hi Gavin,

On Tue, 2015-02-03 at 08:28 +0800, Gavin Guo wrote:
 Hi Nicholas,
 
 On Sun, Feb 1, 2015 at 11:47 AM, Gavin Guo gavin@canonical.com wrote:
  Hi Nicholas,
 
  On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger
  n...@linux-iscsi.org wrote:
  On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
  Hi Nicholas,
 
  On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger
  n...@linux-iscsi.org wrote:
   On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
   Hi Nicolas,
  
   On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger

SNIP

At the time, a different set of iser-target related changes ended up
avoiding this issue on his particular setup, so we thought it was 
likely
a race triggered by login failures specific to iser-target code.
   
There was a untested patch (included inline below) to drop the legacy
active_ts_list usage all-together, but IIRC he was not able to 
reproduce
further so the patch didn't get picked up for mainline.
   
If your able to reliability reproduce, please try with the following
patch and let us know your progress.
  
   Thanks for your time reading the mail. I'll let you know the result.
  
   Just curious, are you able to reliability reproduce this bug in a VM..?
 
  Thanks for your caring, the machine is on the customer side, I've
  asked and now waiting for their response.
 
  Hi Gavin,
 
  Just curious if there has been any update on this yet..?
 
  --nab
 
 
  Really thanks for your attention. I'm also currently waiting for the
  customer's reply and will send the email again to ask for the result.
  However, I think the symptom may be hard to replicate that's why the
  customer didn't reply me for a long time. Thanks for your time again.
 
  Thanks,
  Gavin
 
 Sorry for making you wait so long. I just got the response from the
 customer, they said the general protection fault happened just 2 times
 in the past and cannot be reliably reproduced. And I am now waiting
 for the verification test.
 

Just a heads up that I'm planning to include this patch in the v3.20-rc1
PULL request.

Please let me know if you have any objections.

Thank you,

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: General protection fault in iscsi_rx_thread_pre_handler

2015-02-02 Thread Gavin Guo
Hi Nicholas,

On Sun, Feb 1, 2015 at 11:47 AM, Gavin Guo gavin@canonical.com wrote:
 Hi Nicholas,

 On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger
 n...@linux-iscsi.org wrote:
 On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
 Hi Nicholas,

 On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger
 n...@linux-iscsi.org wrote:
  On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
  Hi Nicolas,
 
  On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
  n...@linux-iscsi.org wrote:
   Hi Gavin,
  
   On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
   Hi all,
  
   The general protection fault screenshot is attached.
  
   Summary:
   The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
   the fault is in list_del of iscsi_del_ts_from_active_list. And it
   looks like deleting the iscsi_thread_set *ts two times. The point to
   delete including iscsi_get_ts_from_inactive_list, was also checked but
   still can't find the clue. Really appreciate if anyone can provide any
   idea on the bug.
  
 
  SNIP
 
  
   Thanks for your detailed analysis.
  
   A similar bug was reported off-list some months back by a person using
   iser-target + RoCE export on v3.12.y code.  Just to confirm, your
   environment is using traditional iscsi-target + TCP export, right..?
 
  I am sorry that I'm not an expert of the field and already google RoCE
  on the internet but still don't really know what RoCE is. However, I
  can provide the informations.  We used iscsiadm on the initiator side
  and lio_node and tcm_node commands to create the targets for
  connection. I think it should be normal  iscsi-target using TCP
  export.
 
 
  Yep, that would be traditional iscsi-target + TCP export.
 
  
   At the time, a different set of iser-target related changes ended up
   avoiding this issue on his particular setup, so we thought it was 
   likely
   a race triggered by login failures specific to iser-target code.
  
   There was a untested patch (included inline below) to drop the legacy
   active_ts_list usage all-together, but IIRC he was not able to 
   reproduce
   further so the patch didn't get picked up for mainline.
  
   If your able to reliability reproduce, please try with the following
   patch and let us know your progress.
 
  Thanks for your time reading the mail. I'll let you know the result.
 
  Just curious, are you able to reliability reproduce this bug in a VM..?

 Thanks for your caring, the machine is on the customer side, I've
 asked and now waiting for their response.

 Hi Gavin,

 Just curious if there has been any update on this yet..?

 --nab


 Really thanks for your attention. I'm also currently waiting for the
 customer's reply and will send the email again to ask for the result.
 However, I think the symptom may be hard to replicate that's why the
 customer didn't reply me for a long time. Thanks for your time again.

 Thanks,
 Gavin

Sorry for making you wait so long. I just got the response from the
customer, they said the general protection fault happened just 2 times
in the past and cannot be reliably reproduced. And I am now waiting
for the verification test.

Thanks,
Gavin
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: General protection fault in iscsi_rx_thread_pre_handler

2015-01-31 Thread Gavin Guo
Hi Nicholas,

On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger
n...@linux-iscsi.org wrote:
 On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
 Hi Nicholas,

 On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger
 n...@linux-iscsi.org wrote:
  On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
  Hi Nicolas,
 
  On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
  n...@linux-iscsi.org wrote:
   Hi Gavin,
  
   On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
   Hi all,
  
   The general protection fault screenshot is attached.
  
   Summary:
   The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
   the fault is in list_del of iscsi_del_ts_from_active_list. And it
   looks like deleting the iscsi_thread_set *ts two times. The point to
   delete including iscsi_get_ts_from_inactive_list, was also checked but
   still can't find the clue. Really appreciate if anyone can provide any
   idea on the bug.
  
 
  SNIP
 
  
   Thanks for your detailed analysis.
  
   A similar bug was reported off-list some months back by a person using
   iser-target + RoCE export on v3.12.y code.  Just to confirm, your
   environment is using traditional iscsi-target + TCP export, right..?
 
  I am sorry that I'm not an expert of the field and already google RoCE
  on the internet but still don't really know what RoCE is. However, I
  can provide the informations.  We used iscsiadm on the initiator side
  and lio_node and tcm_node commands to create the targets for
  connection. I think it should be normal  iscsi-target using TCP
  export.
 
 
  Yep, that would be traditional iscsi-target + TCP export.
 
  
   At the time, a different set of iser-target related changes ended up
   avoiding this issue on his particular setup, so we thought it was likely
   a race triggered by login failures specific to iser-target code.
  
   There was a untested patch (included inline below) to drop the legacy
   active_ts_list usage all-together, but IIRC he was not able to reproduce
   further so the patch didn't get picked up for mainline.
  
   If your able to reliability reproduce, please try with the following
   patch and let us know your progress.
 
  Thanks for your time reading the mail. I'll let you know the result.
 
  Just curious, are you able to reliability reproduce this bug in a VM..?

 Thanks for your caring, the machine is on the customer side, I've
 asked and now waiting for their response.

 Hi Gavin,

 Just curious if there has been any update on this yet..?

 --nab


Really thanks for your attention. I'm also currently waiting for the
customer's reply and will send the email again to ask for the result.
However, I think the symptom may be hard to replicate that's why the
customer didn't reply me for a long time. Thanks for your time again.

Thanks,
Gavin
--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: General protection fault in iscsi_rx_thread_pre_handler

2015-01-30 Thread Nicholas A. Bellinger
On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
 Hi Nicholas,
 
 On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger
 n...@linux-iscsi.org wrote:
  On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
  Hi Nicolas,
 
  On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
  n...@linux-iscsi.org wrote:
   Hi Gavin,
  
   On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
   Hi all,
  
   The general protection fault screenshot is attached.
  
   Summary:
   The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
   the fault is in list_del of iscsi_del_ts_from_active_list. And it
   looks like deleting the iscsi_thread_set *ts two times. The point to
   delete including iscsi_get_ts_from_inactive_list, was also checked but
   still can't find the clue. Really appreciate if anyone can provide any
   idea on the bug.
  
 
  SNIP
 
  
   Thanks for your detailed analysis.
  
   A similar bug was reported off-list some months back by a person using
   iser-target + RoCE export on v3.12.y code.  Just to confirm, your
   environment is using traditional iscsi-target + TCP export, right..?
 
  I am sorry that I'm not an expert of the field and already google RoCE
  on the internet but still don't really know what RoCE is. However, I
  can provide the informations.  We used iscsiadm on the initiator side
  and lio_node and tcm_node commands to create the targets for
  connection. I think it should be normal  iscsi-target using TCP
  export.
 
 
  Yep, that would be traditional iscsi-target + TCP export.
 
  
   At the time, a different set of iser-target related changes ended up
   avoiding this issue on his particular setup, so we thought it was likely
   a race triggered by login failures specific to iser-target code.
  
   There was a untested patch (included inline below) to drop the legacy
   active_ts_list usage all-together, but IIRC he was not able to reproduce
   further so the patch didn't get picked up for mainline.
  
   If your able to reliability reproduce, please try with the following
   patch and let us know your progress.
 
  Thanks for your time reading the mail. I'll let you know the result.
 
  Just curious, are you able to reliability reproduce this bug in a VM..?
 
 Thanks for your caring, the machine is on the customer side, I've
 asked and now waiting for their response.

Hi Gavin,

Just curious if there has been any update on this yet..?

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: General protection fault in iscsi_rx_thread_pre_handler

2015-01-22 Thread Nicholas A. Bellinger
Hi Gavin,

On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
 Hi all,
 
 The general protection fault screenshot is attached.
 
 Summary:
 The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
 the fault is in list_del of iscsi_del_ts_from_active_list. And it
 looks like deleting the iscsi_thread_set *ts two times. The point to
 delete including iscsi_get_ts_from_inactive_list, was also checked but
 still can't find the clue. Really appreciate if anyone can provide any
 idea on the bug.
 
 static void iscsi_del_ts_from_active_list(struct iscsi_thread_set *ts)
 {
 ...
 list_del(ts-ts_list);
 ...
 }
 static inline void list_del(struct list_head *entry)
 {
 __list_del(entry-prev, entry-next);
 entry-next = LIST_POISON1;
 entry-prev = LIST_POISON2;
 }
 
 
 static inline void __list_del(struct list_head * prev, struct list_head * 
 next)
 {
 next-prev = prev;
 prev-next = next;
 }
 
 According coredump is trace3.png. The %rdx is ts-ts_list-next
 (0xdead00100100, LIST_POISON1), %rax is ts-ts_list-prev
 (0xdead00200200, LIST_POISON2). When the “next-prev = prev;”
 executes, it’s the instruction:
 
 48 89 42 08 mov %rax,0x8(%rdx)
 
 The %rdx is the value (0xdead00100100, LIST_POISON1). So, general
 protection fault happened. List_del() is the one of the only three
 points to set LIST_POISON1/2. The other two are hlist_bl_del() and
 hlist_del(). The root cause has high possibility related to calling
 __list_del() twice for deleting the ts-ts_list.
 
 Detailed analysis:
 
 57a0 iscsi_del_ts_from_active_list:
 __list_del():
 /build/buildd/linux-3.13.0/drivers/target/iscsi/iscsi_target_tq.c:50
 57a0: e8 00 00 00 00 callq 57a5 iscsi_del_ts_from_active_list+0
 x5
 list_del():
 57a5: 55 push %rbp
 57a6: 48 89 e5 mov %rsp,%rbp
 57a9: 53 push %rbx
 57aa: 48 89 fb mov %rdi,%rbx --iscsi_thread_set *ts
 /build/buildd/linux-3.13.0/include/linux/spinlock.h:293
 57ad: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
 57b4: e8 00 00 00 00 callq 57b9 iscsi_del_ts_from_active_list+0
 x19
 
 __list_del(entry-prev, entry-next);
 
 /build/buildd/linux-3.13.0/include/linux/list.h:106
 57b9: 48 8b 83 c8 00 00 00 mov 0xc8(%rbx),%rax --ts-ts_list-prev
 57c0: 48 8b 93 c0 00 00 00 mov 0xc0(%rbx),%rdx --ts-ts_list-next
 iscsi_del_ts_from_active_list():
 /build/buildd/linux-3.13.0/include/linux/spinlock.h:333
 57c7: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
 /build/buildd/linux-3.13.0/include/linux/list.h:88
 57ce: 48 89 42 08 mov %rax,0x8(%rdx) ts-ts_list-next-prev = 
 ts-ts_list-prev
 spin_unlock():
 /build/buildd/linux-3.13.0/include/linux/list.h:89
 57d2: 48 89 10 mov %rdx,(%rax) ts-ts_list-prev-next = ts-ts_list-next
 
 entry-next = LIST_POISON1;
 
 /build/buildd/linux-3.13.0/include/linux/list.h:107
 57d5: 48 b8 00 01 10 00 00 movabs $0xdead00100100,%rax
 57dc: 00 ad de
 iscsi_del_ts_from_active_list():
 57df: 48 89 83 c0 00 00 00 mov %rax,0xc0(%rbx)
 
 entry-prev = LIST_POISON2;
 
 iscsi_deallocate_thread_one():
 /build/buildd/linux-3.13.0/include/linux/list.h:108
 57e6: 48 b8 00 02 20 00 00 movabs $0xdead00200200,%rax
 57ed: 00 ad de
 57f0: 48 89 83 c8 00 00 00 mov %rax,0xc8(%rbx)
 

Thanks for your detailed analysis.

A similar bug was reported off-list some months back by a person using
iser-target + RoCE export on v3.12.y code.  Just to confirm, your
environment is using traditional iscsi-target + TCP export, right..?

At the time, a different set of iser-target related changes ended up
avoiding this issue on his particular setup, so we thought it was likely
a race triggered by login failures specific to iser-target code.

There was a untested patch (included inline below) to drop the legacy
active_ts_list usage all-together, but IIRC he was not able to reproduce
further so the patch didn't get picked up for mainline.

If your able to reliability reproduce, please try with the following
patch and let us know your progress.

Thank you,

--nab

From 33f211fcf0f4149b13de826dcbe204241f71b2e8 Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger n...@linux-iscsi.org
Date: Thu, 22 Jan 2015 00:56:53 -0800
Subject: [PATCH] iscsi-target: Drop problematic active_ts_list usage

Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org
---
 drivers/target/iscsi/iscsi_target_tq.c | 28 +---
 1 file changed, 5 insertions(+), 23 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target_tq.c 
b/drivers/target/iscsi/iscsi_target_tq.c
index 601e9cc..bb2890e 100644
--- a/drivers/target/iscsi/iscsi_target_tq.c
+++ b/drivers/target/iscsi/iscsi_target_tq.c
@@ -24,36 +24,22 @@
 #include iscsi_target_tq.h
 #include iscsi_target.h
 
-static LIST_HEAD(active_ts_list);
 static LIST_HEAD(inactive_ts_list);
-static DEFINE_SPINLOCK(active_ts_lock);
 static DEFINE_SPINLOCK(inactive_ts_lock);
 static DEFINE_SPINLOCK(ts_bitmap_lock);
 
-static void iscsi_add_ts_to_active_list(struct iscsi_thread_set *ts)
-{
-   spin_lock(active_ts_lock);
-   list_add_tail(ts-ts_list, active_ts_list);
-   

Re: General protection fault in iscsi_rx_thread_pre_handler

2015-01-22 Thread Gavin Guo
Hi Nicolas,

On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
n...@linux-iscsi.org wrote:
 Hi Gavin,

 On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
 Hi all,

 The general protection fault screenshot is attached.

 Summary:
 The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
 the fault is in list_del of iscsi_del_ts_from_active_list. And it
 looks like deleting the iscsi_thread_set *ts two times. The point to
 delete including iscsi_get_ts_from_inactive_list, was also checked but
 still can't find the clue. Really appreciate if anyone can provide any
 idea on the bug.

 static void iscsi_del_ts_from_active_list(struct iscsi_thread_set *ts)
 {
 ...
 list_del(ts-ts_list);
 ...
 }
 static inline void list_del(struct list_head *entry)
 {
 __list_del(entry-prev, entry-next);
 entry-next = LIST_POISON1;
 entry-prev = LIST_POISON2;
 }


 static inline void __list_del(struct list_head * prev, struct list_head * 
 next)
 {
 next-prev = prev;
 prev-next = next;
 }

 According coredump is trace3.png. The %rdx is ts-ts_list-next
 (0xdead00100100, LIST_POISON1), %rax is ts-ts_list-prev
 (0xdead00200200, LIST_POISON2). When the “next-prev = prev;”
 executes, it’s the instruction:

 48 89 42 08 mov %rax,0x8(%rdx)

 The %rdx is the value (0xdead00100100, LIST_POISON1). So, general
 protection fault happened. List_del() is the one of the only three
 points to set LIST_POISON1/2. The other two are hlist_bl_del() and
 hlist_del(). The root cause has high possibility related to calling
 __list_del() twice for deleting the ts-ts_list.

 Detailed analysis:

 57a0 iscsi_del_ts_from_active_list:
 __list_del():
 /build/buildd/linux-3.13.0/drivers/target/iscsi/iscsi_target_tq.c:50
 57a0: e8 00 00 00 00 callq 57a5 iscsi_del_ts_from_active_list+0
 x5
 list_del():
 57a5: 55 push %rbp
 57a6: 48 89 e5 mov %rsp,%rbp
 57a9: 53 push %rbx
 57aa: 48 89 fb mov %rdi,%rbx --iscsi_thread_set *ts
 /build/buildd/linux-3.13.0/include/linux/spinlock.h:293
 57ad: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
 57b4: e8 00 00 00 00 callq 57b9 iscsi_del_ts_from_active_list+0
 x19

 __list_del(entry-prev, entry-next);

 /build/buildd/linux-3.13.0/include/linux/list.h:106
 57b9: 48 8b 83 c8 00 00 00 mov 0xc8(%rbx),%rax --ts-ts_list-prev
 57c0: 48 8b 93 c0 00 00 00 mov 0xc0(%rbx),%rdx --ts-ts_list-next
 iscsi_del_ts_from_active_list():
 /build/buildd/linux-3.13.0/include/linux/spinlock.h:333
 57c7: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
 /build/buildd/linux-3.13.0/include/linux/list.h:88
 57ce: 48 89 42 08 mov %rax,0x8(%rdx) ts-ts_list-next-prev = 
 ts-ts_list-prev
 spin_unlock():
 /build/buildd/linux-3.13.0/include/linux/list.h:89
 57d2: 48 89 10 mov %rdx,(%rax) ts-ts_list-prev-next = ts-ts_list-next

 entry-next = LIST_POISON1;

 /build/buildd/linux-3.13.0/include/linux/list.h:107
 57d5: 48 b8 00 01 10 00 00 movabs $0xdead00100100,%rax
 57dc: 00 ad de
 iscsi_del_ts_from_active_list():
 57df: 48 89 83 c0 00 00 00 mov %rax,0xc0(%rbx)

 entry-prev = LIST_POISON2;

 iscsi_deallocate_thread_one():
 /build/buildd/linux-3.13.0/include/linux/list.h:108
 57e6: 48 b8 00 02 20 00 00 movabs $0xdead00200200,%rax
 57ed: 00 ad de
 57f0: 48 89 83 c8 00 00 00 mov %rax,0xc8(%rbx)


 Thanks for your detailed analysis.

 A similar bug was reported off-list some months back by a person using
 iser-target + RoCE export on v3.12.y code.  Just to confirm, your
 environment is using traditional iscsi-target + TCP export, right..?

I am sorry that I'm not an expert of the field and already google RoCE
on the internet but still don't really know what RoCE is. However, I
can provide the informations.  We used iscsiadm on the initiator side
and lio_node and tcm_node commands to create the targets for
connection. I think it should be normal  iscsi-target using TCP
export.


 At the time, a different set of iser-target related changes ended up
 avoiding this issue on his particular setup, so we thought it was likely
 a race triggered by login failures specific to iser-target code.

 There was a untested patch (included inline below) to drop the legacy
 active_ts_list usage all-together, but IIRC he was not able to reproduce
 further so the patch didn't get picked up for mainline.

 If your able to reliability reproduce, please try with the following
 patch and let us know your progress.

Thanks for your time reading the mail. I'll let you know the result.


 Thank you,

 --nab

 From 33f211fcf0f4149b13de826dcbe204241f71b2e8 Mon Sep 17 00:00:00 2001
 From: Nicholas Bellinger n...@linux-iscsi.org
 Date: Thu, 22 Jan 2015 00:56:53 -0800
 Subject: [PATCH] iscsi-target: Drop problematic active_ts_list usage

 Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org
 ---
  drivers/target/iscsi/iscsi_target_tq.c | 28 +---
  1 file changed, 5 insertions(+), 23 deletions(-)

 diff --git a/drivers/target/iscsi/iscsi_target_tq.c 
 b/drivers/target/iscsi/iscsi_target_tq.c
 index 601e9cc..bb2890e 100644
 --- 

Re: General protection fault in iscsi_rx_thread_pre_handler

2015-01-22 Thread Nicholas A. Bellinger
On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
 Hi Nicolas,
 
 On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
 n...@linux-iscsi.org wrote:
  Hi Gavin,
 
  On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
  Hi all,
 
  The general protection fault screenshot is attached.
 
  Summary:
  The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
  the fault is in list_del of iscsi_del_ts_from_active_list. And it
  looks like deleting the iscsi_thread_set *ts two times. The point to
  delete including iscsi_get_ts_from_inactive_list, was also checked but
  still can't find the clue. Really appreciate if anyone can provide any
  idea on the bug.
 

SNIP

 
  Thanks for your detailed analysis.
 
  A similar bug was reported off-list some months back by a person using
  iser-target + RoCE export on v3.12.y code.  Just to confirm, your
  environment is using traditional iscsi-target + TCP export, right..?
 
 I am sorry that I'm not an expert of the field and already google RoCE
 on the internet but still don't really know what RoCE is. However, I
 can provide the informations.  We used iscsiadm on the initiator side
 and lio_node and tcm_node commands to create the targets for
 connection. I think it should be normal  iscsi-target using TCP
 export.
 

Yep, that would be traditional iscsi-target + TCP export.

 
  At the time, a different set of iser-target related changes ended up
  avoiding this issue on his particular setup, so we thought it was likely
  a race triggered by login failures specific to iser-target code.
 
  There was a untested patch (included inline below) to drop the legacy
  active_ts_list usage all-together, but IIRC he was not able to reproduce
  further so the patch didn't get picked up for mainline.
 
  If your able to reliability reproduce, please try with the following
  patch and let us know your progress.
 
 Thanks for your time reading the mail. I'll let you know the result.

Just curious, are you able to reliability reproduce this bug in a VM..?

--nab

--
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


General protection fault in iscsi_rx_thread_pre_handler

2015-01-21 Thread Gavin Guo
Hi all,

The general protection fault screenshot is attached.

Summary:
The kernel is Ubuntu-3.13.0-39.66. I've done basic analysis and found
the fault is in list_del of iscsi_del_ts_from_active_list. And it
looks like deleting the iscsi_thread_set *ts two times. The point to
delete including iscsi_get_ts_from_inactive_list, was also checked but
still can't find the clue. Really appreciate if anyone can provide any
idea on the bug.

static void iscsi_del_ts_from_active_list(struct iscsi_thread_set *ts)
{
...
list_del(ts-ts_list);
...
}
static inline void list_del(struct list_head *entry)
{
__list_del(entry-prev, entry-next);
entry-next = LIST_POISON1;
entry-prev = LIST_POISON2;
}


static inline void __list_del(struct list_head * prev, struct list_head * next)
{
next-prev = prev;
prev-next = next;
}

According coredump is trace3.png. The %rdx is ts-ts_list-next
(0xdead00100100, LIST_POISON1), %rax is ts-ts_list-prev
(0xdead00200200, LIST_POISON2). When the “next-prev = prev;”
executes, it’s the instruction:

48 89 42 08 mov %rax,0x8(%rdx)

The %rdx is the value (0xdead00100100, LIST_POISON1). So, general
protection fault happened. List_del() is the one of the only three
points to set LIST_POISON1/2. The other two are hlist_bl_del() and
hlist_del(). The root cause has high possibility related to calling
__list_del() twice for deleting the ts-ts_list.

Detailed analysis:

57a0 iscsi_del_ts_from_active_list:
__list_del():
/build/buildd/linux-3.13.0/drivers/target/iscsi/iscsi_target_tq.c:50
57a0: e8 00 00 00 00 callq 57a5 iscsi_del_ts_from_active_list+0
x5
list_del():
57a5: 55 push %rbp
57a6: 48 89 e5 mov %rsp,%rbp
57a9: 53 push %rbx
57aa: 48 89 fb mov %rdi,%rbx --iscsi_thread_set *ts
/build/buildd/linux-3.13.0/include/linux/spinlock.h:293
57ad: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
57b4: e8 00 00 00 00 callq 57b9 iscsi_del_ts_from_active_list+0
x19

__list_del(entry-prev, entry-next);

/build/buildd/linux-3.13.0/include/linux/list.h:106
57b9: 48 8b 83 c8 00 00 00 mov 0xc8(%rbx),%rax --ts-ts_list-prev
57c0: 48 8b 93 c0 00 00 00 mov 0xc0(%rbx),%rdx --ts-ts_list-next
iscsi_del_ts_from_active_list():
/build/buildd/linux-3.13.0/include/linux/spinlock.h:333
57c7: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
/build/buildd/linux-3.13.0/include/linux/list.h:88
57ce: 48 89 42 08 mov %rax,0x8(%rdx) ts-ts_list-next-prev = ts-ts_list-prev
spin_unlock():
/build/buildd/linux-3.13.0/include/linux/list.h:89
57d2: 48 89 10 mov %rdx,(%rax) ts-ts_list-prev-next = ts-ts_list-next

entry-next = LIST_POISON1;

/build/buildd/linux-3.13.0/include/linux/list.h:107
57d5: 48 b8 00 01 10 00 00 movabs $0xdead00100100,%rax
57dc: 00 ad de
iscsi_del_ts_from_active_list():
57df: 48 89 83 c0 00 00 00 mov %rax,0xc0(%rbx)

entry-prev = LIST_POISON2;

iscsi_deallocate_thread_one():
/build/buildd/linux-3.13.0/include/linux/list.h:108
57e6: 48 b8 00 02 20 00 00 movabs $0xdead00200200,%rax
57ed: 00 ad de
57f0: 48 89 83 c8 00 00 00 mov %rax,0xc8(%rbx)

Thanks,
Gavin Guo