qla2xxx firmware crashes in target mode
Hi folks,

I'm in a bit of a strange situation: my *target* qla2xxx firmware appears to get stuck when the *initiator* runs kernel 4.1 or later.

The target is an Intel system with a QLE2464 running kernel 4.2.1 (from Debian) with fw=7.03.00. The initiator is another Intel system with a QLE2460, also with fw=7.03.00. The two are connected by a direct fibre link; no switches or fabric are involved.

Both the initiator and the target are stable when the initiator runs kernel 4.0 or earlier. When the initiator runs a 4.1 or 4.2 kernel, the *target* firmware becomes unstable, and the initiator times out I/Os and generally becomes very unhappy. After booting a 4.1+ kernel on the initiator, everything works well for a while (up to an hour or so) before the problem appears. At that point I see the "ISP System Error" message and I/O locks up. To get out of this state I have to reboot the initiator; the target appears to recover by itself.

Do you know about this issue? I can debug further (e.g. try to bisect it) if required, but there's no point if you already know about it.

dmesg from the target end (I haven't been able to capture the initiator end):

[484701.194971] qla2xxx [:05:00.0]-5003:9: ISP System Error - mbx1=c19h mbx2=10h mbx3=0h mbx7=0h.
[484701.222021] qla2xxx [:05:00.0]-d001:9: Firmware dump saved to temp buffer (9/c90002b84000), dump status flags (0x3f).
[484701.222082] qla2xxx [:05:00.0]-00af:9: Performing ISP error recovery - ha=8800ab7c4000.
[484702.063799] qla2xxx [:05:00.0]-500a:9: LOOP UP detected (4 Gbps).
[484702.112814] qla2xxx [:05:00.0]-0121:9: Failed to enable receiving of RSCN requests: 0x2.
[484702.743687] qla2xxx [:05:00.0]-5003:9: ISP System Error - mbx1=c19h mbx2=10h mbx3=0h mbx7=0h.
[484702.754050] qla2xxx [:05:00.0]-d007:9: Firmware has been previously dumped (c90002b84000) -- ignoring request.
[484703.619362] qla2xxx [:05:00.0]-00af:9: Performing ISP error recovery - ha=8800ab7c4000.
[484704.459181] qla2xxx [:05:00.0]-500a:9: LOOP UP detected (4 Gbps).
[484704.508170] qla2xxx [:05:00.0]-0121:9: Failed to enable receiving of RSCN requests: 0x2.
[484704.854664] qla2xxx [:05:00.0]-5003:9: ISP System Error - mbx1=c19h mbx2=10h mbx3=0h mbx7=0h.
[484704.865014] qla2xxx [:05:00.0]-d007:9: Firmware has been previously dumped (c90002b84000) -- ignoring request.
[484734.867554] qla2xxx [:05:00.0]-d007:9: Firmware has been previously dumped (c90002b84000) -- ignoring request.
[484764.883993] qla2xxx [:05:00.0]-d007:9: Firmware has been previously dumped (c90002b84000) -- ignoring request.
[484794.900464] qla2xxx [:05:00.0]-d007:9: Firmware has been previously dumped (c90002b84000) -- ignoring request.
[484824.916954] qla2xxx [:05:00.0]-d007:9: Firmware has been previously dumped (c90002b84000) -- ignoring request.
[484854.933415] qla2xxx [:05:00.0]-d007:9: Firmware has been previously dumped (c90002b84000) -- ignoring request.
[484884.953887] qla2xxx [:05:00.0]-d007:9: Firmware has been previously dumped (c90002b84000) -- ignoring request.
[484914.974377] qla2xxx [:05:00.0]-d007:9: Firmware has been previously dumped (c90002b84000) -- ignoring request.
[484918.761483] INFO: task kworker/2:17:36759 blocked for more than 120 seconds.
[484918.778839] Not tainted 4.2.0-0.bpo.1-amd64 #1
[484918.793941] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[484918.812578] kworker/2:17 D 88042e855840 0 36759 2 0x
[484918.812597] Workqueue: qla_tgt_wq qlt_create_sess_from_atio [qla2xxx]
[484918.812607] 880108076500 0046 88009e473d80 880107cef040
[484918.812613] 0286 88009e474000 880426a5f9a4 880108076500
[484918.812624] 880426a5f9a8 0296 8154f26f
[484918.812626] Call Trace:
[484918.812632] [] ? schedule+0x2f/0x70
[484918.812635] [] ? schedule_preempt_disabled+0xe/0x20
[484918.812643] [] ? __mutex_lock_slowpath+0x85/0x100
[484918.812649] [] ? mutex_lock+0x1b/0x30
[484918.812659] [] ? qlt_create_sess_from_atio+0x12a/0x1c0 [qla2xxx]
[484918.812668] [] ? process_one_work+0x14a/0x3d0
[484918.812671] [] ? worker_thread+0x65/0x470
[484918.812675] [] ? rescuer_thread+0x2f0/0x2f0
[484918.812677] [] ? kthread+0xd3/0xf0
[484918.812680] [] ? kthread_create_on_node+0x170/0x170
[484918.812684] [] ? ret_from_fork+0x3f/0x70
[484918.812687] [] ? kthread_create_on_node+0x170/0x170
[484944.994831] qla2xxx [:05:00.0]-d007:9: Firmware has been previously dumped (c90002b84000) -- ignoring request.
[484975.019311] qla2xxx [:05:00.0]-d007:9: Firmware has been previously dumped (c90002b84000) -- ignoring request.
[484975.559187] qla2xxx [:05:00.0]-00af:9: Performing ISP error recovery - ha=8800ab7c4000.
[484976.430963] qla2xxx [:05:00.0]-500a:9: LOOP UP detected (4 Gbps).
[484976.448002]
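To check whether every crash reports the same mailbox values and how quickly the ISP System Errors repeat, a saved log like the one above can be summarized with a short script. This is a hypothetical helper, not part of qla2xxx or any existing tool; it assumes one dmesg message per line in the format shown above and treats message ID 5003 as "ISP System Error" only because that is what this log shows.

```python
import re
from collections import Counter

# Hypothetical helper, not part of qla2xxx or any existing tool.
# Matches qla2xxx dmesg lines such as:
#   [484701.194971] qla2xxx [:05:00.0]-5003:9: ISP System Error - mbx1=c19h ...
# The PCI-address group accepts anything inside the brackets.
QLA_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+qla2xxx\s+\[(?P<pci>[^\]]*)\]"
    r"-(?P<msgid>[0-9a-f]{4}):(?P<host>\d+):\s+(?P<text>.*)"
)
# Mailbox registers as printed in "ISP System Error" messages, e.g. "mbx1=c19h".
MBX = re.compile(r"(mbx\d+)=([0-9a-f]+)h")

ISP_SYSTEM_ERROR = "5003"  # message ID as seen in the log above


def parse_qla_events(lines):
    """Return (timestamp, message_id, text) for every qla2xxx line; skip others."""
    events = []
    for line in lines:
        m = QLA_LINE.search(line)
        if m:
            events.append((float(m.group("ts")), m.group("msgid"), m.group("text")))
    return events


def summarize(events):
    """Count messages by ID and collect details of the ISP System Errors."""
    counts = Counter(msgid for _, msgid, _ in events)
    errors = [(ts, text) for ts, msgid, text in events if msgid == ISP_SYSTEM_ERROR]
    # Distinct sets of mailbox register values across all ISP System Errors.
    mbx_sets = {tuple(MBX.findall(text)) for _, text in errors}
    # Seconds between consecutive ISP System Errors, rounded to milliseconds.
    gaps = [round(b[0] - a[0], 3) for a, b in zip(errors, errors[1:])]
    return counts, mbx_sets, gaps
```

For example, `summarize(parse_qla_events(open("target-dmesg.txt")))` on a saved copy of the target's log (the file name is a placeholder). Non-qla2xxx lines such as the hung-task report are skipped, so the script can be fed the raw dmesg output.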