Re: General protection fault in iscsi_rx_thread_pre_handler
Hi Nicholas,

On Mon, Feb 16, 2015 at 6:52 PM, Gavin Guo <gavin@canonical.com> wrote:
> Hi Nicholas,
>
> On Thu, Feb 12, 2015 at 3:16 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>> Hi Gavin,
>>
>> On Tue, 2015-02-03 at 08:28 +0800, Gavin Guo wrote:
>>> Hi Nicholas,
>>>
>>> On Sun, Feb 1, 2015 at 11:47 AM, Gavin Guo <gavin@canonical.com> wrote:
>>>> Hi Nicholas,
>>>>
>>>> On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>>>>> On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
>>>>>> Hi Nicholas,
>>>>>>
>>>>>> On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>>>>>>> On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
>>>>>>>> Hi Nicholas,
>>>>>>>>
>>>>>>>> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger
>>
>> SNIP
>>
>>>>>>>>> At the time, a different set of iser-target related changes ended up avoiding this issue on his particular setup, so we thought it was likely a race triggered by login failures specific to iser-target code. There was an untested patch (included inline below) to drop the legacy active_ts_list usage altogether, but IIRC he was not able to reproduce further so the patch didn't get picked up for mainline. If you're able to reliably reproduce, please try with the following patch and let us know your progress.
>>>>>>>>
>>>>>>>> Thanks for your time reading the mail. I'll let you know the result.
>>>>>>>
>>>>>>> Just curious, are you able to reliably reproduce this bug in a VM..?
>>>>>>
>>>>>> Thanks for your concern. The machine is on the customer side; I've asked them and am now waiting for their response.
>>>>>
>>>>> Hi Gavin,
>>>>>
>>>>> Just curious if there has been any update on this yet..?
>>>>>
>>>>> --nab
>>>>
>>>> Thanks very much for your attention. I'm still waiting for the customer's reply and will email them again to ask for the result. However, I think the symptom may be hard to replicate, which is why the customer hasn't replied for a long time. Thanks for your time again.
>>>>
>>>> Thanks,
>>>> Gavin
>>>
>>> Sorry for making you wait so long. I just got the response from the customer: they said the general protection fault has happened only twice in the past and cannot be reliably reproduced. I am now waiting for the verification test.
>>
>> Just a heads up that I'm planning to include this patch in the v3.20-rc1 PULL request. Please let me know if you have any objections.
>>
>> Thank you,
>>
>> --nab
>
> The bug

Sorry, I mistakenly pressed the send button last time.

The bug has not appeared since the customer upgraded to a kernel with the patch. Thanks very much for your help. I'll keep you posted if the bug appears again.

Thanks,
Gavin
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: General protection fault in iscsi_rx_thread_pre_handler
Hi Gavin,

On Tue, 2015-02-03 at 08:28 +0800, Gavin Guo wrote:
> Hi Nicholas,
>
> On Sun, Feb 1, 2015 at 11:47 AM, Gavin Guo <gavin@canonical.com> wrote:
>> Hi Nicholas,
>>
>> On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>>> On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
>>>> Hi Nicholas,
>>>>
>>>> On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>>>>> On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
>>>>>> Hi Nicholas,
>>>>>>
>>>>>> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger

SNIP

>>>>>>> At the time, a different set of iser-target related changes ended up avoiding this issue on his particular setup, so we thought it was likely a race triggered by login failures specific to iser-target code. There was an untested patch (included inline below) to drop the legacy active_ts_list usage altogether, but IIRC he was not able to reproduce further so the patch didn't get picked up for mainline. If you're able to reliably reproduce, please try with the following patch and let us know your progress.
>>>>>>
>>>>>> Thanks for your time reading the mail. I'll let you know the result.
>>>>>
>>>>> Just curious, are you able to reliably reproduce this bug in a VM..?
>>>>
>>>> Thanks for your concern. The machine is on the customer side; I've asked them and am now waiting for their response.
>>>
>>> Hi Gavin,
>>>
>>> Just curious if there has been any update on this yet..?
>>>
>>> --nab
>>
>> Thanks very much for your attention. I'm still waiting for the customer's reply and will email them again to ask for the result. However, I think the symptom may be hard to replicate, which is why the customer hasn't replied for a long time. Thanks for your time again.
>>
>> Thanks,
>> Gavin
>
> Sorry for making you wait so long. I just got the response from the customer: they said the general protection fault has happened only twice in the past and cannot be reliably reproduced. I am now waiting for the verification test.

Just a heads up that I'm planning to include this patch in the v3.20-rc1 PULL request. Please let me know if you have any objections.

Thank you,

--nab
Re: General protection fault in iscsi_rx_thread_pre_handler
Hi Nicholas,

On Sun, Feb 1, 2015 at 11:47 AM, Gavin Guo <gavin@canonical.com> wrote:
> Hi Nicholas,
>
> On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>> On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
>>> Hi Nicholas,
>>>
>>> On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>>>> On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
>>>>> Hi Nicholas,
>>>>>
>>>>> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>>>>>> Hi Gavin,
>>>>>>
>>>>>> On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> The general protection fault screenshot is attached.
>>>>>>>
>>>>>>> Summary:
>>>>>>> The kernel is Ubuntu-3.13.0-39.66. I've done a basic analysis and found that the fault is in list_del of iscsi_del_ts_from_active_list. It looks like the iscsi_thread_set *ts is being deleted twice. I also checked the deletion points, including iscsi_get_ts_from_inactive_list, but still can't find a clue. I'd really appreciate it if anyone could offer any ideas on the bug.
>>>>
>>>> SNIP
>>>>
>>>>>> Thanks for your detailed analysis. A similar bug was reported off-list some months back by a person using iser-target + RoCE export on v3.12.y code. Just to confirm, your environment is using traditional iscsi-target + TCP export, right..?
>>>>>
>>>>> I'm sorry, I'm not an expert in this field; I googled RoCE but still don't really know what it is. However, I can provide the following information. We used iscsiadm on the initiator side and the lio_node and tcm_node commands to create the targets for connection. I think it should be a normal iscsi-target using TCP export.
>>>>
>>>> Yep, that would be traditional iscsi-target + TCP export.
>>>>
>>>>>> At the time, a different set of iser-target related changes ended up avoiding this issue on his particular setup, so we thought it was likely a race triggered by login failures specific to iser-target code. There was an untested patch (included inline below) to drop the legacy active_ts_list usage altogether, but IIRC he was not able to reproduce further so the patch didn't get picked up for mainline. If you're able to reliably reproduce, please try with the following patch and let us know your progress.
>>>>>
>>>>> Thanks for your time reading the mail. I'll let you know the result.
>>>>
>>>> Just curious, are you able to reliably reproduce this bug in a VM..?
>>>
>>> Thanks for your concern. The machine is on the customer side; I've asked them and am now waiting for their response.
>>
>> Hi Gavin,
>>
>> Just curious if there has been any update on this yet..?
>>
>> --nab
>
> Thanks very much for your attention. I'm still waiting for the customer's reply and will email them again to ask for the result. However, I think the symptom may be hard to replicate, which is why the customer hasn't replied for a long time. Thanks for your time again.
>
> Thanks,
> Gavin

Sorry for making you wait so long. I just got the response from the customer: they said the general protection fault has happened only twice in the past and cannot be reliably reproduced. I am now waiting for the verification test.

Thanks,
Gavin
Re: General protection fault in iscsi_rx_thread_pre_handler
Hi Nicholas,

On Sat, Jan 31, 2015 at 6:53 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
>> Hi Nicholas,
>>
>> On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>>> On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
>>>> Hi Nicholas,
>>>>
>>>> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>>>>> Hi Gavin,
>>>>>
>>>>> On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> The general protection fault screenshot is attached.
>>>>>>
>>>>>> Summary:
>>>>>> The kernel is Ubuntu-3.13.0-39.66. I've done a basic analysis and found that the fault is in list_del of iscsi_del_ts_from_active_list. It looks like the iscsi_thread_set *ts is being deleted twice. I also checked the deletion points, including iscsi_get_ts_from_inactive_list, but still can't find a clue. I'd really appreciate it if anyone could offer any ideas on the bug.
>>>
>>> SNIP
>>>
>>>>> Thanks for your detailed analysis. A similar bug was reported off-list some months back by a person using iser-target + RoCE export on v3.12.y code. Just to confirm, your environment is using traditional iscsi-target + TCP export, right..?
>>>>
>>>> I'm sorry, I'm not an expert in this field; I googled RoCE but still don't really know what it is. However, I can provide the following information. We used iscsiadm on the initiator side and the lio_node and tcm_node commands to create the targets for connection. I think it should be a normal iscsi-target using TCP export.
>>>
>>> Yep, that would be traditional iscsi-target + TCP export.
>>>
>>>>> At the time, a different set of iser-target related changes ended up avoiding this issue on his particular setup, so we thought it was likely a race triggered by login failures specific to iser-target code. There was an untested patch (included inline below) to drop the legacy active_ts_list usage altogether, but IIRC he was not able to reproduce further so the patch didn't get picked up for mainline. If you're able to reliably reproduce, please try with the following patch and let us know your progress.
>>>>
>>>> Thanks for your time reading the mail. I'll let you know the result.
>>>
>>> Just curious, are you able to reliably reproduce this bug in a VM..?
>>
>> Thanks for your concern. The machine is on the customer side; I've asked them and am now waiting for their response.
>
> Hi Gavin,
>
> Just curious if there has been any update on this yet..?
>
> --nab

Thanks very much for your attention. I'm still waiting for the customer's reply and will email them again to ask for the result. However, I think the symptom may be hard to replicate, which is why the customer hasn't replied for a long time. Thanks for your time again.

Thanks,
Gavin
Re: General protection fault in iscsi_rx_thread_pre_handler
On Fri, 2015-01-23 at 09:30 +0800, Gavin Guo wrote:
> Hi Nicholas,
>
> On Fri, Jan 23, 2015 at 1:35 AM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>> On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
>>> Hi Nicholas,
>>>
>>> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>>>> Hi Gavin,
>>>>
>>>> On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
>>>>> Hi all,
>>>>>
>>>>> The general protection fault screenshot is attached.
>>>>>
>>>>> Summary:
>>>>> The kernel is Ubuntu-3.13.0-39.66. I've done a basic analysis and found that the fault is in list_del of iscsi_del_ts_from_active_list. It looks like the iscsi_thread_set *ts is being deleted twice. I also checked the deletion points, including iscsi_get_ts_from_inactive_list, but still can't find a clue. I'd really appreciate it if anyone could offer any ideas on the bug.
>>
>> SNIP
>>
>>>> Thanks for your detailed analysis. A similar bug was reported off-list some months back by a person using iser-target + RoCE export on v3.12.y code. Just to confirm, your environment is using traditional iscsi-target + TCP export, right..?
>>>
>>> I'm sorry, I'm not an expert in this field; I googled RoCE but still don't really know what it is. However, I can provide the following information. We used iscsiadm on the initiator side and the lio_node and tcm_node commands to create the targets for connection. I think it should be a normal iscsi-target using TCP export.
>>
>> Yep, that would be traditional iscsi-target + TCP export.
>>
>>>> At the time, a different set of iser-target related changes ended up avoiding this issue on his particular setup, so we thought it was likely a race triggered by login failures specific to iser-target code. There was an untested patch (included inline below) to drop the legacy active_ts_list usage altogether, but IIRC he was not able to reproduce further so the patch didn't get picked up for mainline. If you're able to reliably reproduce, please try with the following patch and let us know your progress.
>>>
>>> Thanks for your time reading the mail. I'll let you know the result.
>>
>> Just curious, are you able to reliably reproduce this bug in a VM..?
>
> Thanks for your concern. The machine is on the customer side; I've asked them and am now waiting for their response.

Hi Gavin,

Just curious if there has been any update on this yet..?

--nab
Re: General protection fault in iscsi_rx_thread_pre_handler
Hi Gavin,

On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
> Hi all,
>
> The general protection fault screenshot is attached.
>
> Summary:
> The kernel is Ubuntu-3.13.0-39.66. I've done a basic analysis and found that the fault is in list_del of iscsi_del_ts_from_active_list. It looks like the iscsi_thread_set *ts is being deleted twice. I also checked the deletion points, including iscsi_get_ts_from_inactive_list, but still can't find a clue. I'd really appreciate it if anyone could offer any ideas on the bug.
>
> static void iscsi_del_ts_from_active_list(struct iscsi_thread_set *ts)
> {
>         ...
>         list_del(&ts->ts_list);
>         ...
> }
>
> static inline void list_del(struct list_head *entry)
> {
>         __list_del(entry->prev, entry->next);
>         entry->next = LIST_POISON1;
>         entry->prev = LIST_POISON2;
> }
>
> static inline void __list_del(struct list_head * prev, struct list_head * next)
> {
>         next->prev = prev;
>         prev->next = next;
> }
>
> The corresponding coredump is trace3.png. %rdx holds ts->ts_list.next (0xdead000000100100, LIST_POISON1) and %rax holds ts->ts_list.prev (0xdead000000200200, LIST_POISON2). When "next->prev = prev;" executes, it is the instruction:
>
>     48 89 42 08             mov    %rax,0x8(%rdx)
>
> %rdx holds LIST_POISON1 (0xdead000000100100), which is not a canonical address, so the general protection fault happened.
>
> list_del() is one of only three places that set LIST_POISON1/2. The other two are hlist_bl_del() and hlist_del(). The root cause is very likely related to calling __list_del() twice to delete ts->ts_list.
>
> Detailed analysis:
>
> 57a0 <iscsi_del_ts_from_active_list>:
> __list_del():
> /build/buildd/linux-3.13.0/drivers/target/iscsi/iscsi_target_tq.c:50
>     57a0: e8 00 00 00 00          callq  57a5 <iscsi_del_ts_from_active_list+0x5>
> list_del():
>     57a5: 55                      push   %rbp
>     57a6: 48 89 e5                mov    %rsp,%rbp
>     57a9: 53                      push   %rbx
>     57aa: 48 89 fb                mov    %rdi,%rbx         <-- iscsi_thread_set *ts
> /build/buildd/linux-3.13.0/include/linux/spinlock.h:293
>     57ad: 48 c7 c7 00 00 00 00    mov    $0x0,%rdi
>     57b4: e8 00 00 00 00          callq  57b9 <iscsi_del_ts_from_active_list+0x19>
> __list_del(entry->prev, entry->next);
> /build/buildd/linux-3.13.0/include/linux/list.h:106
>     57b9: 48 8b 83 c8 00 00 00    mov    0xc8(%rbx),%rax   <-- ts->ts_list.prev
>     57c0: 48 8b 93 c0 00 00 00    mov    0xc0(%rbx),%rdx   <-- ts->ts_list.next
> iscsi_del_ts_from_active_list():
> /build/buildd/linux-3.13.0/include/linux/spinlock.h:333
>     57c7: 48 c7 c7 00 00 00 00    mov    $0x0,%rdi
> /build/buildd/linux-3.13.0/include/linux/list.h:88
>     57ce: 48 89 42 08             mov    %rax,0x8(%rdx)    <-- ts->ts_list.next->prev = ts->ts_list.prev
> spin_unlock():
> /build/buildd/linux-3.13.0/include/linux/list.h:89
>     57d2: 48 89 10                mov    %rdx,(%rax)       <-- ts->ts_list.prev->next = ts->ts_list.next
> entry->next = LIST_POISON1;
> /build/buildd/linux-3.13.0/include/linux/list.h:107
>     57d5: 48 b8 00 01 10 00 00    movabs $0xdead000000100100,%rax
>     57dc: 00 ad de
> iscsi_del_ts_from_active_list():
>     57df: 48 89 83 c0 00 00 00    mov    %rax,0xc0(%rbx)
> entry->prev = LIST_POISON2;
> iscsi_deallocate_thread_one():
> /build/buildd/linux-3.13.0/include/linux/list.h:108
>     57e6: 48 b8 00 02 20 00 00    movabs $0xdead000000200200,%rax
>     57ed: 00 ad de
>     57f0: 48 89 83 c8 00 00 00    mov    %rax,0xc8(%rbx)

Thanks for your detailed analysis.

A similar bug was reported off-list some months back by a person using iser-target + RoCE export on v3.12.y code. Just to confirm, your environment is using traditional iscsi-target + TCP export, right..?

At the time, a different set of iser-target related changes ended up avoiding this issue on his particular setup, so we thought it was likely a race triggered by login failures specific to iser-target code.

There was an untested patch (included inline below) to drop the legacy active_ts_list usage altogether, but IIRC he was not able to reproduce further so the patch didn't get picked up for mainline.

If you're able to reliably reproduce, please try with the following patch and let us know your progress.

Thank you,

--nab

From 33f211fcf0f4149b13de826dcbe204241f71b2e8 Mon Sep 17 00:00:00 2001
From: Nicholas Bellinger <n...@linux-iscsi.org>
Date: Thu, 22 Jan 2015 00:56:53 -0800
Subject: [PATCH] iscsi-target: Drop problematic active_ts_list usage

Signed-off-by: Nicholas Bellinger <n...@linux-iscsi.org>
---
 drivers/target/iscsi/iscsi_target_tq.c | 28 +---
 1 file changed, 5 insertions(+), 23 deletions(-)

diff --git a/drivers/target/iscsi/iscsi_target_tq.c b/drivers/target/iscsi/iscsi_target_tq.c
index 601e9cc..bb2890e 100644
--- a/drivers/target/iscsi/iscsi_target_tq.c
+++ b/drivers/target/iscsi/iscsi_target_tq.c
@@ -24,36 +24,22 @@
 #include "iscsi_target_tq.h"
 #include "iscsi_target.h"
 
-static LIST_HEAD(active_ts_list);
 static LIST_HEAD(inactive_ts_list);
-static DEFINE_SPINLOCK(active_ts_lock);
 static DEFINE_SPINLOCK(inactive_ts_lock);
 static DEFINE_SPINLOCK(ts_bitmap_lock);
 
-static void iscsi_add_ts_to_active_list(struct iscsi_thread_set *ts)
-{
-	spin_lock(&active_ts_lock);
-	list_add_tail(&ts->ts_list, &active_ts_list);
-
Re: General protection fault in iscsi_rx_thread_pre_handler
Hi Nicholas,

On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
> Hi Gavin,
>
> On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
>> Hi all,
>>
>> The general protection fault screenshot is attached.
>>
>> Summary:
>> The kernel is Ubuntu-3.13.0-39.66. I've done a basic analysis and found that the fault is in list_del of iscsi_del_ts_from_active_list. It looks like the iscsi_thread_set *ts is being deleted twice. I also checked the deletion points, including iscsi_get_ts_from_inactive_list, but still can't find a clue. I'd really appreciate it if anyone could offer any ideas on the bug.
>>
>> static void iscsi_del_ts_from_active_list(struct iscsi_thread_set *ts)
>> {
>>         ...
>>         list_del(&ts->ts_list);
>>         ...
>> }
>>
>> static inline void list_del(struct list_head *entry)
>> {
>>         __list_del(entry->prev, entry->next);
>>         entry->next = LIST_POISON1;
>>         entry->prev = LIST_POISON2;
>> }
>>
>> static inline void __list_del(struct list_head * prev, struct list_head * next)
>> {
>>         next->prev = prev;
>>         prev->next = next;
>> }
>>
>> The corresponding coredump is trace3.png. %rdx holds ts->ts_list.next (0xdead000000100100, LIST_POISON1) and %rax holds ts->ts_list.prev (0xdead000000200200, LIST_POISON2). When "next->prev = prev;" executes, it is the instruction:
>>
>>     48 89 42 08             mov    %rax,0x8(%rdx)
>>
>> %rdx holds LIST_POISON1 (0xdead000000100100), which is not a canonical address, so the general protection fault happened.
>>
>> list_del() is one of only three places that set LIST_POISON1/2. The other two are hlist_bl_del() and hlist_del(). The root cause is very likely related to calling __list_del() twice to delete ts->ts_list.
>>
>> Detailed analysis:
>>
>> 57a0 <iscsi_del_ts_from_active_list>:
>> __list_del():
>> /build/buildd/linux-3.13.0/drivers/target/iscsi/iscsi_target_tq.c:50
>>     57a0: e8 00 00 00 00          callq  57a5 <iscsi_del_ts_from_active_list+0x5>
>> list_del():
>>     57a5: 55                      push   %rbp
>>     57a6: 48 89 e5                mov    %rsp,%rbp
>>     57a9: 53                      push   %rbx
>>     57aa: 48 89 fb                mov    %rdi,%rbx         <-- iscsi_thread_set *ts
>> /build/buildd/linux-3.13.0/include/linux/spinlock.h:293
>>     57ad: 48 c7 c7 00 00 00 00    mov    $0x0,%rdi
>>     57b4: e8 00 00 00 00          callq  57b9 <iscsi_del_ts_from_active_list+0x19>
>> __list_del(entry->prev, entry->next);
>> /build/buildd/linux-3.13.0/include/linux/list.h:106
>>     57b9: 48 8b 83 c8 00 00 00    mov    0xc8(%rbx),%rax   <-- ts->ts_list.prev
>>     57c0: 48 8b 93 c0 00 00 00    mov    0xc0(%rbx),%rdx   <-- ts->ts_list.next
>> iscsi_del_ts_from_active_list():
>> /build/buildd/linux-3.13.0/include/linux/spinlock.h:333
>>     57c7: 48 c7 c7 00 00 00 00    mov    $0x0,%rdi
>> /build/buildd/linux-3.13.0/include/linux/list.h:88
>>     57ce: 48 89 42 08             mov    %rax,0x8(%rdx)    <-- ts->ts_list.next->prev = ts->ts_list.prev
>> spin_unlock():
>> /build/buildd/linux-3.13.0/include/linux/list.h:89
>>     57d2: 48 89 10                mov    %rdx,(%rax)       <-- ts->ts_list.prev->next = ts->ts_list.next
>> entry->next = LIST_POISON1;
>> /build/buildd/linux-3.13.0/include/linux/list.h:107
>>     57d5: 48 b8 00 01 10 00 00    movabs $0xdead000000100100,%rax
>>     57dc: 00 ad de
>> iscsi_del_ts_from_active_list():
>>     57df: 48 89 83 c0 00 00 00    mov    %rax,0xc0(%rbx)
>> entry->prev = LIST_POISON2;
>> iscsi_deallocate_thread_one():
>> /build/buildd/linux-3.13.0/include/linux/list.h:108
>>     57e6: 48 b8 00 02 20 00 00    movabs $0xdead000000200200,%rax
>>     57ed: 00 ad de
>>     57f0: 48 89 83 c8 00 00 00    mov    %rax,0xc8(%rbx)
>
> Thanks for your detailed analysis.
>
> A similar bug was reported off-list some months back by a person using iser-target + RoCE export on v3.12.y code. Just to confirm, your environment is using traditional iscsi-target + TCP export, right..?

I'm sorry, I'm not an expert in this field; I googled RoCE but still don't really know what it is. However, I can provide the following information. We used iscsiadm on the initiator side and the lio_node and tcm_node commands to create the targets for connection. I think it should be a normal iscsi-target using TCP export.

> At the time, a different set of iser-target related changes ended up avoiding this issue on his particular setup, so we thought it was likely a race triggered by login failures specific to iser-target code.
>
> There was an untested patch (included inline below) to drop the legacy active_ts_list usage altogether, but IIRC he was not able to reproduce further so the patch didn't get picked up for mainline.
>
> If you're able to reliably reproduce, please try with the following patch and let us know your progress.

Thanks for your time reading the mail. I'll let you know the result.

> Thank you,
>
> --nab
>
> From 33f211fcf0f4149b13de826dcbe204241f71b2e8 Mon Sep 17 00:00:00 2001
> From: Nicholas Bellinger <n...@linux-iscsi.org>
> Date: Thu, 22 Jan 2015 00:56:53 -0800
> Subject: [PATCH] iscsi-target: Drop problematic active_ts_list usage
>
> Signed-off-by: Nicholas Bellinger <n...@linux-iscsi.org>
> ---
>  drivers/target/iscsi/iscsi_target_tq.c | 28 +---
>  1 file changed, 5 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/target/iscsi/iscsi_target_tq.c b/drivers/target/iscsi/iscsi_target_tq.c
> index 601e9cc..bb2890e 100644
> ---
Re: General protection fault in iscsi_rx_thread_pre_handler
On Thu, 2015-01-22 at 23:56 +0800, Gavin Guo wrote:
> Hi Nicholas,
>
> On Thu, Jan 22, 2015 at 5:50 PM, Nicholas A. Bellinger <n...@linux-iscsi.org> wrote:
>> Hi Gavin,
>>
>> On Thu, 2015-01-22 at 06:38 +0800, Gavin Guo wrote:
>>> Hi all,
>>>
>>> The general protection fault screenshot is attached.
>>>
>>> Summary:
>>> The kernel is Ubuntu-3.13.0-39.66. I've done a basic analysis and found that the fault is in list_del of iscsi_del_ts_from_active_list. It looks like the iscsi_thread_set *ts is being deleted twice. I also checked the deletion points, including iscsi_get_ts_from_inactive_list, but still can't find a clue. I'd really appreciate it if anyone could offer any ideas on the bug.

SNIP

>> Thanks for your detailed analysis. A similar bug was reported off-list some months back by a person using iser-target + RoCE export on v3.12.y code. Just to confirm, your environment is using traditional iscsi-target + TCP export, right..?
>
> I'm sorry, I'm not an expert in this field; I googled RoCE but still don't really know what it is. However, I can provide the following information. We used iscsiadm on the initiator side and the lio_node and tcm_node commands to create the targets for connection. I think it should be a normal iscsi-target using TCP export.

Yep, that would be traditional iscsi-target + TCP export.

>> At the time, a different set of iser-target related changes ended up avoiding this issue on his particular setup, so we thought it was likely a race triggered by login failures specific to iser-target code. There was an untested patch (included inline below) to drop the legacy active_ts_list usage altogether, but IIRC he was not able to reproduce further so the patch didn't get picked up for mainline. If you're able to reliably reproduce, please try with the following patch and let us know your progress.
>
> Thanks for your time reading the mail. I'll let you know the result.

Just curious, are you able to reliably reproduce this bug in a VM..?

--nab
General protection fault in iscsi_rx_thread_pre_handler
Hi all,

The general protection fault screenshot is attached.

Summary:
The kernel is Ubuntu-3.13.0-39.66. I've done a basic analysis and found that the fault is in list_del of iscsi_del_ts_from_active_list. It looks like the iscsi_thread_set *ts is being deleted twice. I also checked the deletion points, including iscsi_get_ts_from_inactive_list, but still can't find a clue. I'd really appreciate it if anyone could offer any ideas on the bug.

static void iscsi_del_ts_from_active_list(struct iscsi_thread_set *ts)
{
        ...
        list_del(&ts->ts_list);
        ...
}

static inline void list_del(struct list_head *entry)
{
        __list_del(entry->prev, entry->next);
        entry->next = LIST_POISON1;
        entry->prev = LIST_POISON2;
}

static inline void __list_del(struct list_head * prev, struct list_head * next)
{
        next->prev = prev;
        prev->next = next;
}

The corresponding coredump is trace3.png. %rdx holds ts->ts_list.next (0xdead000000100100, LIST_POISON1) and %rax holds ts->ts_list.prev (0xdead000000200200, LIST_POISON2). When "next->prev = prev;" executes, it is the instruction:

    48 89 42 08             mov    %rax,0x8(%rdx)

%rdx holds LIST_POISON1 (0xdead000000100100), which is not a canonical address, so the general protection fault happened.

list_del() is one of only three places that set LIST_POISON1/2. The other two are hlist_bl_del() and hlist_del(). The root cause is very likely related to calling __list_del() twice to delete ts->ts_list.
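The double-delete hypothesis in the summary can be tried out in userspace. The following is a minimal sketch, not kernel code: struct list_head, list_add_tail(), __list_del() and list_del() are simplified stand-ins for the kernel versions, the poison constants assume the x86_64 values reported above, and list_entry_is_poisoned() and list_del_checked() are hypothetical helpers introduced only for illustration. The poison pointers are compared, never dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
    struct list_head *next, *prev;
};

/* Assumed x86_64 poison values from the report; only compared, never dereferenced. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0xdead000000100100ULL)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0xdead000000200200ULL)

/* An empty list head points at itself in both directions. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

static void __list_del(struct list_head *prev, struct list_head *next)
{
    next->prev = prev;   /* the store that faults when next is LIST_POISON1 */
    prev->next = next;
}

static void list_del(struct list_head *entry)
{
    __list_del(entry->prev, entry->next);
    entry->next = LIST_POISON1;
    entry->prev = LIST_POISON2;
}

/* Hypothetical helper: true if the entry was already removed by list_del(). */
static bool list_entry_is_poisoned(const struct list_head *entry)
{
    return entry->next == LIST_POISON1 || entry->prev == LIST_POISON2;
}

/* Hypothetical guarded delete: reports a second delete instead of faulting. */
static bool list_del_checked(struct list_head *entry)
{
    if (list_entry_is_poisoned(entry))
        return false;
    list_del(entry);
    return true;
}
```

Adding one entry to a list and calling list_del_checked() on it twice returns true the first time and false the second. An unguarded second list_del() would instead reach __list_del() with next equal to LIST_POISON1, which is the store that faults in the trace.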
Detailed analysis:

57a0 <iscsi_del_ts_from_active_list>:
__list_del():
/build/buildd/linux-3.13.0/drivers/target/iscsi/iscsi_target_tq.c:50
    57a0: e8 00 00 00 00          callq  57a5 <iscsi_del_ts_from_active_list+0x5>
list_del():
    57a5: 55                      push   %rbp
    57a6: 48 89 e5                mov    %rsp,%rbp
    57a9: 53                      push   %rbx
    57aa: 48 89 fb                mov    %rdi,%rbx         <-- iscsi_thread_set *ts
/build/buildd/linux-3.13.0/include/linux/spinlock.h:293
    57ad: 48 c7 c7 00 00 00 00    mov    $0x0,%rdi
    57b4: e8 00 00 00 00          callq  57b9 <iscsi_del_ts_from_active_list+0x19>
__list_del(entry->prev, entry->next);
/build/buildd/linux-3.13.0/include/linux/list.h:106
    57b9: 48 8b 83 c8 00 00 00    mov    0xc8(%rbx),%rax   <-- ts->ts_list.prev
    57c0: 48 8b 93 c0 00 00 00    mov    0xc0(%rbx),%rdx   <-- ts->ts_list.next
iscsi_del_ts_from_active_list():
/build/buildd/linux-3.13.0/include/linux/spinlock.h:333
    57c7: 48 c7 c7 00 00 00 00    mov    $0x0,%rdi
/build/buildd/linux-3.13.0/include/linux/list.h:88
    57ce: 48 89 42 08             mov    %rax,0x8(%rdx)    <-- ts->ts_list.next->prev = ts->ts_list.prev
spin_unlock():
/build/buildd/linux-3.13.0/include/linux/list.h:89
    57d2: 48 89 10                mov    %rdx,(%rax)       <-- ts->ts_list.prev->next = ts->ts_list.next
entry->next = LIST_POISON1;
/build/buildd/linux-3.13.0/include/linux/list.h:107
    57d5: 48 b8 00 01 10 00 00    movabs $0xdead000000100100,%rax
    57dc: 00 ad de
iscsi_del_ts_from_active_list():
    57df: 48 89 83 c0 00 00 00    mov    %rax,0xc0(%rbx)
entry->prev = LIST_POISON2;
iscsi_deallocate_thread_one():
/build/buildd/linux-3.13.0/include/linux/list.h:108
    57e6: 48 b8 00 02 20 00 00    movabs $0xdead000000200200,%rax
    57ed: 00 ad de
    57f0: 48 89 83 c8 00 00 00    mov    %rax,0xc8(%rbx)

Thanks,
Gavin Guo
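The arithmetic behind the listing can be checked with a small sketch. It decodes the eight little-endian immediate bytes that follow each "48 b8" movabs opcode, and tests whether the address written by "mov %rax,0x8(%rdx)" is canonical. The helper names decode_le64(), is_canonical_x86_64() and store_target() are hypothetical, and the canonical check assumes 48-bit virtual addresses (4-level paging), which the report does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same two-pointer layout as the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Assemble a 64-bit immediate from the 8 little-endian bytes that follow
 * the "48 b8" movabs opcode in the listing. */
static uint64_t decode_le64(const uint8_t bytes[8])
{
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--)
        value = (value << 8) | bytes[i];
    return value;
}

/* Assumption: 48-bit virtual addresses. An address is canonical when
 * bits 63..47 are all zeros or all ones. */
static bool is_canonical_x86_64(uint64_t addr)
{
    uint64_t top = addr >> 47;
    return top == 0 || top == 0x1ffff;
}

/* Address written by "next->prev = prev" when next holds the given value. */
static uint64_t store_target(uint64_t next)
{
    return next + offsetof(struct list_head, prev);
}
```

With the bytes 00 01 10 00 00 00 ad de, decode_le64() gives 0xdead000000100100. On a 64-bit build, store_target() adds 8, matching the 0x8(%rdx) operand, and the result is not canonical. That is consistent with the CPU raising a general protection fault rather than a page fault.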