Re: iSCSI session logout regression after fbce4d97fd ("scsi: fixup kernel warning during rmmod()")

2018-02-19 Thread Max Ivanov
> As already explained in the previous mail, there is a fixup for this in
> commit 81b6c9998979 ('scsi: core: check for device state in
> __scsi_remove_target()').
> Please check if this is applied, too.

I tested commit 81b6c9998979 cherry-picked on top of 4.14.20 and it
indeed solves the problem.

Can it be backported to 4.14 LTS, please?


Re: iSCSI session logout regression after fbce4d97fd ("scsi: fixup kernel warning during rmmod()")

2018-02-19 Thread Max Ivanov
It seems that commit 81b6c9998979 ('scsi: core: check for device state
in __scsi_remove_target()') didn't make it into the 4.14 branch:

$ git tag --contains 81b6c9998979
v4.15
v4.15-rc6
v4.15-rc7
v4.15-rc8
v4.15-rc9
v4.15.1
v4.15.2
v4.15.3
v4.15.4
v4.16-rc1
v4.16-rc2
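
The tag listing above can also be checked mechanically. A minimal sketch, assuming the output of `git tag --contains` is piped in on stdin; `tags_in_series` is a hypothetical helper name used for illustration, not a git command:

```shell
#!/bin/sh
# tags_in_series SERIES: read tag names on stdin (one per line) and print
# only those belonging to the given series. "v4.14" matches v4.14,
# v4.14-rc3 and v4.14.20, but not v4.140 or v4.15.
# Hypothetical helper for illustration; not part of git.
tags_in_series() {
    series=$1
    while IFS= read -r tag; do
        case $tag in
            "$series"|"$series".*|"$series"-*) printf '%s\n' "$tag" ;;
        esac
    done
}
```

Usage would look like `git tag --contains 81b6c9998979 | tags_in_series v4.14`; empty output means no v4.14 tag contains the commit.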

On 19 February 2018 at 06:56, Hannes Reinecke <h...@suse.de> wrote:
> On 02/18/2018 07:33 PM, Max Ivanov wrote:
>> Hi,
>>
>> on my system I can't log out of an iSCSI session when on 4.4.18, but
>> 4.3.19 works just fine. git bisect points to fbce4d97fd ("scsi: fixup
>> kernel warning during rmmod()").
>>
>> The bug manifests itself as follows:
>>   - iSCSI session logout hangs and never completes
>>   - one kworker per iSCSI session starts consuming 100% CPU
>>   - shortly afterwards, one of two errors shows up in dmesg (full listings are below):
>>     * kernel: list_del corruption, 88c1cd6bb810->next is LIST_POISON1
>>     * kernel BUG at mm/slub.c:295!
>>
>> Steps to trigger the bug:
>>   1. initiate iSCSI sessions to multiple portals
>>   2. let multipathd create multipath devices
>>   3. run 'iscsiadm -m node --logoutall=all'
>>
>> The bug is NOT triggered and iSCSI logout succeeds when either:
>>   - multipathd is masked and never started
>>   - I manually delete all SCSI devices via /sys/block/$d/device/delete
>>     before attempting the iSCSI logout
>>
>> list_del corruption:
>>
>> Feb 16 10:37:11 localhost.localdomain kernel: alua: device handler registered
>> Feb 16 10:37:11 localhost.localdomain kernel: emc: device handler registered
>> Feb 16 10:37:11 localhost.localdomain kernel: rdac: device handler registered
>> Feb 16 10:37:11 localhost.localdomain kernel: device-mapper: multipath
>> service-time: version 0.3.0 loaded
>> Feb 16 10:38:38 localhost.localdomain kernel: list_del corruption,
>> 88c1cd6bb810->next is LIST_POISON1 (dead0100)
>> Feb 16 10:38:38 localhost.localdomain kernel: ------------[ cut here ]------------
>> Feb 16 10:38:38 localhost.localdomain kernel: kernel BUG at 
>> lib/list_debug.c:47!
>> Feb 16 10:38:38 localhost.localdomain kernel: invalid opcode:  [#1] SMP 
>> PTI
>> Feb 16 10:38:38 localhost.localdomain kernel: Modules linked in:
>> dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua
>> binfmt_misc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
>> ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink
>> ebtable_nat ebtable_broute bridge stp llc ip6tabl
>> Feb 16 10:38:38 localhost.localdomain kernel:  pata_acpi
>> Feb 16 10:38:38 localhost.localdomain kernel: CPU: 2 PID: 5 Comm:
>> kworker/u24:0 Not tainted 4.14.18-300.fc27.x86_64 #1
>> Feb 16 10:38:38 localhost.localdomain kernel: Hardware name: VMware,
>> Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS
>> 6.00 09/21/2015
>> Feb 16 10:38:38 localhost.localdomain kernel: Workqueue: scsi_wq_5
>> __iscsi_unbind_session [scsi_transport_iscsi]
>> Feb 16 10:38:38 localhost.localdomain kernel: task: 88bdede83e80
>> task.stack: b15043158000
>> Feb 16 10:38:38 localhost.localdomain kernel: RIP:
>> 0010:__list_del_entry_valid+0x4e/0x90
>> Feb 16 10:38:38 localhost.localdomain kernel: RSP:
>> 0018:b1504315bd88 EFLAGS: 00010082
>> Feb 16 10:38:38 localhost.localdomain kernel: RAX: 004e
>> RBX: 88c1cd6bbf38 RCX: 
>> Feb 16 10:38:38 localhost.localdomain kernel: RDX: 
>> RSI: 88bdefc96a38 RDI: 88bdefc96a38
>> Feb 16 10:38:38 localhost.localdomain kernel: RBP: 0246
>> R08: 07be R09: 00aa
>> Feb 16 10:38:38 localhost.localdomain kernel: R10: b1504315bd58
>> R11:  R12: 88c1ebb659c0
>> Feb 16 10:38:38 localhost.localdomain kernel: R13: 88bdec827010
>> R14: 88c1cd6bb800 R15: 88c1cd6bb800
>> Feb 16 10:38:38 localhost.localdomain kernel: FS:
>> () GS:88bdefc8()
>> knlGS:
>> Feb 16 10:38:38 localhost.localdomain kernel: CS:  0010 DS:  ES:
>>  CR0: 80050033
>> Feb 16 10:38:38 localhost.localdomain kernel: CR2: 563d0c1ed280
>> CR3: 00057120a001 CR4: 001606e0
>> Feb 16 10:38:38 localhost.localdomain kernel: Call Trace:
>> Feb 16 10:38:38 localhost.localdomain kernel:
>> scsi_device_dev_release_usercontext+0x52/0x250
>> Feb 16 10:38:38 localhost.localdomain kernel:  ? __schedule+0x10f/0x880
>> Feb 16 10:38:38 localhost.localdomain kernel:
>> execute_in_process_context+0x21/0x60

Re: iSCSI session logout regression after fbce4d97fd ("scsi: fixup kernel warning during rmmod()")

2018-02-19 Thread Max Ivanov
Nor was it backported:

$ git log --grep 'commit 81b6c99' v4.14..v4.14.20

I'll try to apply it and see if it fixes my problem. If it does, what
would be the process of backporting this patch to 4.14?
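
The `git log --grep` check above relies on stable backports conventionally quoting the mainline hash in their commit message (a line such as `commit <sha> upstream.`), so empty output suggests no commit in the range references it. A minimal sketch of the same check as a reusable test, assuming commit messages are piped in on stdin; `mentions_upstream_commit` is a hypothetical helper name, not part of git:

```shell
#!/bin/sh
# mentions_upstream_commit SHA_PREFIX: read commit messages on stdin and
# succeed (exit status 0) if any line contains "commit <SHA_PREFIX>", the
# same fixed substring the --grep pattern above searches for.
# Hypothetical helper for illustration; not part of git.
mentions_upstream_commit() {
    grep -q -F -- "commit $1"
}
```

Usage would look like `git log v4.14..v4.14.20 | mentions_upstream_commit 81b6c99 && echo "backported" || echo "not backported"`. A plain substring match can also hit unrelated mentions of the hash, so a positive result still needs a look at the matching commit.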

On 19 February 2018 at 08:08, Max Ivanov <ivanov.ma...@gmail.com> wrote:
> It seems that commit 81b6c9998979 ('scsi: core: check for device state
> in __scsi_remove_target()') didn't make it into the 4.14 branch:
>
> $ git tag --contains 81b6c9998979
> v4.15
> v4.15-rc6
> v4.15-rc7
> v4.15-rc8
> v4.15-rc9
> v4.15.1
> v4.15.2
> v4.15.3
> v4.15.4
> v4.16-rc1
> v4.16-rc2
>
> On 19 February 2018 at 06:56, Hannes Reinecke <h...@suse.de> wrote:
>> On 02/18/2018 07:33 PM, Max Ivanov wrote:
>>> Hi,
>>>
>>> on my system I can't log out of an iSCSI session when on 4.4.18, but
>>> 4.3.19 works just fine. git bisect points to fbce4d97fd ("scsi: fixup
>>> kernel warning during rmmod()").
>>>
>>> The bug manifests itself as follows:
>>>   - iSCSI session logout hangs and never completes
>>>   - one kworker per iSCSI session starts consuming 100% CPU
>>>   - shortly afterwards, one of two errors shows up in dmesg (full listings are below):
>>>     * kernel: list_del corruption, 88c1cd6bb810->next is LIST_POISON1
>>>     * kernel BUG at mm/slub.c:295!
>>>
>>> Steps to trigger the bug:
>>>   1. initiate iSCSI sessions to multiple portals
>>>   2. let multipathd create multipath devices
>>>   3. run 'iscsiadm -m node --logoutall=all'
>>>
>>> The bug is NOT triggered and iSCSI logout succeeds when either:
>>>   - multipathd is masked and never started
>>>   - I manually delete all SCSI devices via /sys/block/$d/device/delete
>>>     before attempting the iSCSI logout
>>>
>>> list_del corruption:
>>>
>>> Feb 16 10:37:11 localhost.localdomain kernel: alua: device handler 
>>> registered
>>> Feb 16 10:37:11 localhost.localdomain kernel: emc: device handler registered
>>> Feb 16 10:37:11 localhost.localdomain kernel: rdac: device handler 
>>> registered
>>> Feb 16 10:37:11 localhost.localdomain kernel: device-mapper: multipath
>>> service-time: version 0.3.0 loaded
>>> Feb 16 10:38:38 localhost.localdomain kernel: list_del corruption,
>>> 88c1cd6bb810->next is LIST_POISON1 (dead0100)
>>> Feb 16 10:38:38 localhost.localdomain kernel: ------------[ cut here ]------------
>>> Feb 16 10:38:38 localhost.localdomain kernel: kernel BUG at 
>>> lib/list_debug.c:47!
>>> Feb 16 10:38:38 localhost.localdomain kernel: invalid opcode:  [#1] SMP 
>>> PTI
>>> Feb 16 10:38:38 localhost.localdomain kernel: Modules linked in:
>>> dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua
>>> binfmt_misc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
>>> ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink
>>> ebtable_nat ebtable_broute bridge stp llc ip6tabl
>>> Feb 16 10:38:38 localhost.localdomain kernel:  pata_acpi
>>> Feb 16 10:38:38 localhost.localdomain kernel: CPU: 2 PID: 5 Comm:
>>> kworker/u24:0 Not tainted 4.14.18-300.fc27.x86_64 #1
>>> Feb 16 10:38:38 localhost.localdomain kernel: Hardware name: VMware,
>>> Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS
>>> 6.00 09/21/2015
>>> Feb 16 10:38:38 localhost.localdomain kernel: Workqueue: scsi_wq_5
>>> __iscsi_unbind_session [scsi_transport_iscsi]
>>> Feb 16 10:38:38 localhost.localdomain kernel: task: 88bdede83e80
>>> task.stack: b15043158000
>>> Feb 16 10:38:38 localhost.localdomain kernel: RIP:
>>> 0010:__list_del_entry_valid+0x4e/0x90
>>> Feb 16 10:38:38 localhost.localdomain kernel: RSP:
>>> 0018:b1504315bd88 EFLAGS: 00010082
>>> Feb 16 10:38:38 localhost.localdomain kernel: RAX: 004e
>>> RBX: 88c1cd6bbf38 RCX: 
>>> Feb 16 10:38:38 localhost.localdomain kernel: RDX: 
>>> RSI: 88bdefc96a38 RDI: 88bdefc96a38
>>> Feb 16 10:38:38 localhost.localdomain kernel: RBP: 0246
>>> R08: 07be R09: 00aa
>>> Feb 16 10:38:38 localhost.localdomain kernel: R10: b1504315bd58
>>> R11:  R12: 88c1ebb659c0
>>> Feb 16 10:38:38 localhost.localdomain kernel: R13: 88bdec827010
>>> R14: 88c1cd6bb800 R15: 88c1cd6bb800
>>> Feb 16 10:38:38 localhost.localdomain kernel: FS:
>>> (

iSCSI session logout regression after fbce4d97fd ("scsi: fixup kernel warning during rmmod()")

2018-02-18 Thread Max Ivanov
Hi,

on my system I can't log out of an iSCSI session when on 4.4.18, but
4.3.19 works just fine. git bisect points to fbce4d97fd ("scsi: fixup
kernel warning during rmmod()").

The bug manifests itself as follows:
  - iSCSI session logout hangs and never completes
  - one kworker per iSCSI session starts consuming 100% CPU
  - shortly afterwards, one of two errors shows up in dmesg (full listings are below):
    * kernel: list_del corruption, 88c1cd6bb810->next is LIST_POISON1
    * kernel BUG at mm/slub.c:295!

Steps to trigger the bug:
  1. initiate iSCSI sessions to multiple portals
  2. let multipathd create multipath devices
  3. run 'iscsiadm -m node --logoutall=all'

The bug is NOT triggered and iSCSI logout succeeds when either:
  - multipathd is masked and never started
  - I manually delete all SCSI devices via /sys/block/$d/device/delete
    before attempting the iSCSI logout
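
The manual workaround from the second bullet can be sketched as a small shell function. This is an illustration under assumptions, not a recommended procedure: `delete_scsi_devices` is a hypothetical helper name, the sysfs root is passed as a parameter so the function can be tried against a scratch directory, and the device names in the usage note are placeholders.

```shell
#!/bin/sh
# delete_scsi_devices SYSFS_ROOT DEV...: for each named block device, write
# "1" to its SCSI "delete" attribute (SYSFS_ROOT/block/DEV/device/delete),
# which asks the kernel to remove that SCSI device. Devices without the
# attribute are skipped with a note on stderr.
# Hypothetical helper for illustration only.
delete_scsi_devices() {
    sysfs_root=$1
    shift
    for d in "$@"; do
        attr="$sysfs_root/block/$d/device/delete"
        if [ -e "$attr" ]; then
            echo 1 > "$attr"
        else
            echo "skipping $d: no $attr" >&2
        fi
    done
}
```

On a real system this would be run as root before the logout, for example `delete_scsi_devices /sys sdb sdc` followed by `iscsiadm -m node --logoutall=all`, where `sdb` and `sdc` stand in for whatever devices the iSCSI sessions created.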

list_del corruption:

Feb 16 10:37:11 localhost.localdomain kernel: alua: device handler registered
Feb 16 10:37:11 localhost.localdomain kernel: emc: device handler registered
Feb 16 10:37:11 localhost.localdomain kernel: rdac: device handler registered
Feb 16 10:37:11 localhost.localdomain kernel: device-mapper: multipath service-time: version 0.3.0 loaded
Feb 16 10:38:38 localhost.localdomain kernel: list_del corruption, 88c1cd6bb810->next is LIST_POISON1 (dead0100)
Feb 16 10:38:38 localhost.localdomain kernel: ------------[ cut here ]------------
Feb 16 10:38:38 localhost.localdomain kernel: kernel BUG at lib/list_debug.c:47!
Feb 16 10:38:38 localhost.localdomain kernel: invalid opcode:  [#1] SMP PTI
Feb 16 10:38:38 localhost.localdomain kernel: Modules linked in: dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua binfmt_misc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6tabl
Feb 16 10:38:38 localhost.localdomain kernel:  pata_acpi
Feb 16 10:38:38 localhost.localdomain kernel: CPU: 2 PID: 5 Comm: kworker/u24:0 Not tainted 4.14.18-300.fc27.x86_64 #1
Feb 16 10:38:38 localhost.localdomain kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
Feb 16 10:38:38 localhost.localdomain kernel: Workqueue: scsi_wq_5 __iscsi_unbind_session [scsi_transport_iscsi]
Feb 16 10:38:38 localhost.localdomain kernel: task: 88bdede83e80 task.stack: b15043158000
Feb 16 10:38:38 localhost.localdomain kernel: RIP: 0010:__list_del_entry_valid+0x4e/0x90
Feb 16 10:38:38 localhost.localdomain kernel: RSP: 0018:b1504315bd88 EFLAGS: 00010082
Feb 16 10:38:38 localhost.localdomain kernel: RAX: 004e
RBX: 88c1cd6bbf38 RCX: 
Feb 16 10:38:38 localhost.localdomain kernel: RDX: 
RSI: 88bdefc96a38 RDI: 88bdefc96a38
Feb 16 10:38:38 localhost.localdomain kernel: RBP: 0246
R08: 07be R09: 00aa
Feb 16 10:38:38 localhost.localdomain kernel: R10: b1504315bd58
R11:  R12: 88c1ebb659c0
Feb 16 10:38:38 localhost.localdomain kernel: R13: 88bdec827010
R14: 88c1cd6bb800 R15: 88c1cd6bb800
Feb 16 10:38:38 localhost.localdomain kernel: FS:
() GS:88bdefc8()
knlGS:
Feb 16 10:38:38 localhost.localdomain kernel: CS:  0010 DS:  ES:
 CR0: 80050033
Feb 16 10:38:38 localhost.localdomain kernel: CR2: 563d0c1ed280
CR3: 00057120a001 CR4: 001606e0
Feb 16 10:38:38 localhost.localdomain kernel: Call Trace:
Feb 16 10:38:38 localhost.localdomain kernel:  scsi_device_dev_release_usercontext+0x52/0x250
Feb 16 10:38:38 localhost.localdomain kernel:  ? __schedule+0x10f/0x880
Feb 16 10:38:38 localhost.localdomain kernel:  execute_in_process_context+0x21/0x60
Feb 16 10:38:38 localhost.localdomain kernel:  device_release+0x30/0x80
Feb 16 10:38:38 localhost.localdomain kernel:  kobject_put+0x80/0x1a0
Feb 16 10:38:38 localhost.localdomain kernel:  scsi_remove_target+0x16d/0x1b0
Feb 16 10:38:38 localhost.localdomain kernel:  __iscsi_unbind_session+0xad/0x150 [scsi_transport_iscsi]
Feb 16 10:38:38 localhost.localdomain kernel:  process_one_work+0x184/0x3a0
Feb 16 10:38:38 localhost.localdomain kernel:  worker_thread+0x2e/0x380
Feb 16 10:38:38 localhost.localdomain kernel:  ? process_one_work+0x3a0/0x3a0
Feb 16 10:38:38 localhost.localdomain kernel:  kthread+0x11a/0x130
Feb 16 10:38:38 localhost.localdomain kernel:  ? kthread_park+0x60/0x60
Feb 16 10:38:38 localhost.localdomain kernel:  ret_from_fork+0x35/0x40
Feb 16 10:38:38 localhost.localdomain kernel: Code: 74 2b 48 8b 32 48 39 fe 75 34 48 8b 50 08 48 39 f2 75 3f b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 90 c6 0c bf e8 0d 94 cc ff <0f> 0b 48 89 fe 48 c7 c7 c8 c6 0c bf e8 fc 93 cc ff 0f 0b 48 89
Feb 16 10:38:38 localhost.localdomain kernel: RIP: __list_del_entry_valid+0x4e/0x90 RSP: b1504315bd88
Feb 16 10:38:38 
