Thanks a lot for your explaination, Mike.

In my orginal problem, a logout was issued manually after login. At the time
of this command was issued, kernel oops happened.

For the new test, it failed int the same place (in spi_device_match). I
cannot track the time when logout was issued, but I suspect it was the same
time when kernel oops happened. Below is the kernel log in my new test.

-Kevin


Oct 24 14:00:48 qye-cms kernel: [184273.268108] scsi55415 : iSCSI Initiator
over TCP/IP
Oct 24 14:00:48 qye-cms kernel: [184273.523222] scsi 55415:0:0:91:
Direct-Access     IET      VIRTUAL-DISK     0    PQ: 0 ANSI: 4
Oct 24 14:00:48 qye-cms kernel: [184273.528678] sd 55415:0:0:91: [sdb]
1024000 512-byte hardware sectors (524 MB)
Oct 24 14:00:48 qye-cms kernel: [184273.529105] sd 55415:0:0:91: [sdb] Write
Protect is off
Oct 24 14:00:48 qye-cms kernel: [184273.529109] sd 55415:0:0:91: [sdb] Mode
Sense: 77 00 00 08
Oct 24 14:00:48 qye-cms kernel: [184273.529545] sd 55415:0:0:91: [sdb] Write
cache: disabled, read cache: enabled, doesn't support DPO or FUA
Oct 24 14:00:48 qye-cms kernel: [184273.530040] sd 55415:0:0:91: [sdb]
1024000 512-byte hardware sectors (524 MB)
Oct 24 14:00:48 qye-cms kernel: [184273.531361] sd 55415:0:0:91: [sdb] Write
Protect is off
Oct 24 14:00:48 qye-cms kernel: [184273.531367] sd 55415:0:0:91: [sdb] Mode
Sense: 77 00 00 08
Oct 24 14:00:48 qye-cms kernel: [184273.533646] sd 55415:0:0:91: [sdb] Write
cache: disabled, read cache: enabled, doesn't support DPO or FUA
Oct 24 14:01:08 qye-cms kernel: [184273.533654]  sdb:<3> connection55413:0:
ping timeout of 15 secs expired, last rx 46079163, last ping 46080413, now
46084163
Oct 24 14:01:08 qye-cms kernel: [184293.496601]  connection55413:0: detected
conn error (1011)
Oct 24 14:03:08 qye-cms kernel: [184413.530473]  session55413: session
recovery timed out after 120 secs
Oct 24 14:03:08 qye-cms kernel: [184413.534610] sd 55415:0:0:91: [sdb]
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:03:08 qye-cms kernel: [184413.534697] end_request: I/O error, dev
sdb, sector 8
Oct 24 14:03:08 qye-cms kernel: [184413.534804] Buffer I/O error on device
sdb, logical block 1
Oct 24 14:03:08 qye-cms kernel: [184413.535039] sd 55415:0:0:91: [sdb]
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:03:08 qye-cms kernel: [184413.535044] end_request: I/O error, dev
sdb, sector 8
Oct 24 14:03:08 qye-cms kernel: [184413.535053] Buffer I/O error on device
sdb, logical block 1
Oct 24 14:03:08 qye-cms kernel: [184413.535163] Dev sdb: unable to read RDB
block 8
Oct 24 14:03:08 qye-cms kernel: [184413.537038] sd 55415:0:0:91: [sdb]
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:03:08 qye-cms kernel: [184413.537048] end_request: I/O error, dev
sdb, sector 24
Oct 24 14:03:08 qye-cms kernel: [184413.537057] Buffer I/O error on device
sdb, logical block 3
Oct 24 14:03:08 qye-cms kernel: [184413.537185] sd 55415:0:0:91: [sdb]
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:03:08 qye-cms kernel: [184413.537189] end_request: I/O error, dev
sdb, sector 24
Oct 24 14:03:08 qye-cms kernel: [184413.537192] Buffer I/O error on device
sdb, logical block 3
Oct 24 14:03:08 qye-cms kernel: [184413.537326]  unable to read partition
table
Oct 24 14:03:08 qye-cms kernel: [184413.537398] sd 55415:0:0:91: [sdb]
Attached SCSI disk
Oct 24 14:03:08 qye-cms kernel: [184413.537478] sd 55415:0:0:91: Attached
scsi generic sg1 type 0
Oct 24 14:05:03 qye-cms kernel: [184413.597688] BUG: unable to handle kernel
NULL pointer dereference at virtual address 00000060
Oct 24 14:05:03 qye-cms kernel: [184413.684629] printing eip: e08a212a *pde
= 00000000
Oct 24 14:05:03 qye-cms kernel: [184413.684907] Oops: 0000 [#1] SMP
Oct 24 14:05:03 qye-cms kernel: [184413.685202] Modules linked in:
iscsi_trgt crc32c libcrc32c vmblock vmmemctl cpufreq_conservative
cpufreq_ondemand cpufreq_userspace cpufreq_stats freq_table
cpufreq_powersave sbs video output sbshc dock battery iptable_filter
ip_tables x_tables vmhgfs iscsi_tcp libiscsi scsi_transport_iscsi lp loop
ipv6 parport_pc parport evdev container serio_raw psmouse ac button
i2c_piix4 intel_agp i2c_core pcspkr agpgart shpchp pci_hotplug ext3 jbd
mbcache sg sd_mod floppy pcnet32 mptspi mptscsih mptbase mii pata_acpi
ata_generic scsi_transport_spi ata_piix libata scsi_mod raid10 raid456
async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod
dm_mirror dm_snapshot dm_mod thermal processor fan fbcon tileblit font
bitblit softcursor fuse vmxnet
Oct 24 14:05:03 qye-cms kernel: [184413.686686]
Oct 24 14:05:03 qye-cms kernel: [184413.686773] Pid: 3770, comm: iscsid Not
tainted (2.6.24-24-generic #1)
Oct 24 14:05:03 qye-cms kernel: [184413.686850] EIP: 0060:[<e08a212a>]
EFLAGS: 00010202 CPU: 0
Oct 24 14:05:03 qye-cms kernel: [184413.725583] EIP is at
spi_device_match+0x1a/0x60 [scsi_transport_spi]
Oct 24 14:05:03 qye-cms kernel: [184413.725680] EAX: 00000000 EBX: df1c0cb0
ECX: df1c0c00 EDX: df1c0cb0
Oct 24 14:05:03 qye-cms kernel: [184413.725766] ESI: df1c0cb0 EDI: dfbd8a00
EBP: c0286000 ESP: deaf3b8c
Oct 24 14:05:03 qye-cms kernel: [184413.727201]  DS: 007b ES: 007b FS: 00d8
GS: 0033 SS: 0068
Oct 24 14:05:03 qye-cms kernel: [184413.727287] Process iscsid (pid: 3770,
ti=deaf2000 task=dda1e5c0 task.ti=deaf2000)
Oct 24 14:05:03 qye-cms kernel: [184413.727370] Stack: e08a7c90 c0285c8f
e095e328 df1c0dd8 de1ee030 df1c0c00 df1c0cb0 dfbd8a00
Oct 24 14:05:03 qye-cms kernel: [184413.727532]        00000202 e09449cd
df1c0c00 de1ee030 e0944a2f df1c0c00 de1ee000 e0944acc
Oct 24 14:05:03 qye-cms kernel: [184413.727650]        dfbd8a14 de1ee8a4
e0944b50 00000000 e0944b64 00000000 c02805c2 de1ee8a4
Oct 24 14:05:03 qye-cms kernel: [184413.727770] Call Trace:
Oct 24 14:05:03 qye-cms kernel: [184413.727961]  [<c0285c8f>]
attribute_container_device_trigger+0x4f/0xb0
Oct 24 14:05:03 qye-cms kernel: [184413.728262]  [<e09449cd>]
__scsi_remove_device+0x3d/0x80 [scsi_mod]
Oct 24 14:05:03 qye-cms kernel: [184413.747281]  [<e0944a2f>]
scsi_remove_device+0x1f/0x30 [scsi_mod]
Oct 24 14:05:03 qye-cms kernel: [184413.747364]  [<e0944acc>]
__scsi_remove_target+0x8c/0xc0 [scsi_mod]
Oct 24 14:05:03 qye-cms kernel: [184413.747443]  [<e0944b50>]
__remove_child+0x0/0x20 [scsi_mod]
Oct 24 14:05:03 qye-cms kernel: [184413.747523]  [<e0944b64>]
__remove_child+0x14/0x20 [scsi_mod]
Oct 24 14:05:03 qye-cms kernel: [184413.747599]  [<c02805c2>]
device_for_each_child+0x22/0x40
Oct 24 14:05:03 qye-cms kernel: [184413.747684]  [<e0944b3e>]
scsi_remove_target+0x3e/0x50 [scsi_mod]
Oct 24 14:05:03 qye-cms kernel: [184413.747766]  [<e0a9185c>]
__iscsi_unbind_session+0x6c/0xa0 [scsi_transport_iscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.747910]  [<e0a91948>]
iscsi_remove_session+0xb8/0x120 [scsi_transport_iscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.748020]  [<e0a9bac6>]
iscsi_session_teardown+0x26/0x80 [libiscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.804030]  [<e0a9baa0>]
iscsi_session_teardown+0x0/0x80 [libiscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.804104]  [<e0a919b8>]
iscsi_destroy_session+0x8/0x20 [scsi_transport_iscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.804217]  [<e0a9bb05>]
iscsi_session_teardown+0x65/0x80 [libiscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.804291]  [<e0a9baa0>]
iscsi_session_teardown+0x0/0x80 [libiscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.804362]  [<e0a90030>]
iscsi_iter_session_fn+0x0/0x30 [scsi_transport_iscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.804484]  [<e0a9004f>]
iscsi_iter_session_fn+0x1f/0x30 [scsi_transport_iscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.804596]  [<c02805c2>]
device_for_each_child+0x22/0x40
Oct 24 14:05:03 qye-cms kernel: [184413.804667]  [<e0a906f0>]
iscsi_if_rx+0x0/0x860 [scsi_transport_iscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.804740]  [<e0a9bb2d>]
iscsi_host_remove+0xd/0x20 [libiscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.804813]  [<e0aa6324>]
iscsi_tcp_session_destroy+0x34/0x50 [iscsi_tcp]
Oct 24 14:05:03 qye-cms kernel: [184413.804935]  [<e0a90ec4>]
iscsi_if_rx+0x7d4/0x860 [scsi_transport_iscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.805011]  [<c018d13d>]
__slab_free+0x16d/0x2a0
Oct 24 14:05:03 qye-cms kernel: [184413.805133]  [<e0a90714>]
iscsi_if_rx+0x24/0x860 [scsi_transport_iscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.805204]  [<c02a3a98>]
__kfree_skb+0x8/0x80
Oct 24 14:05:03 qye-cms kernel: [184413.805299]  [<e0a906f0>]
iscsi_if_rx+0x0/0x860 [scsi_transport_iscsi]
Oct 24 14:05:03 qye-cms kernel: [184413.805380]  [<c02c282d>]
netlink_unicast+0x1dd/0x210
Oct 24 14:05:03 qye-cms kernel: [184413.805457]  [<c021aeee>]
copy_from_user+0x2e/0x70
Oct 24 14:05:03 qye-cms kernel: [184413.805574]  [<c02c30a6>]
netlink_sendmsg+0x226/0x2f0
Oct 24 14:05:03 qye-cms kernel: [184413.805644]  [<c029dc51>]
sock_sendmsg+0x111/0x130
Oct 24 14:05:03 qye-cms kernel: [184413.805709]  [<c0140c20>]
autoremove_wake_function+0x0/0x40
Oct 24 14:05:03 qye-cms kernel: [184413.805807]  [<c0140000>]
sys_timer_create+0x2c0/0x330
Oct 24 14:05:03 qye-cms kernel: [184413.805885]  [<c0199457>]
do_lookup+0x67/0x1c0
Oct 24 14:05:03 qye-cms kernel: [184413.805967]  [<c019b87b>]
__link_path_walk+0xaab/0xe10
Oct 24 14:05:03 qye-cms kernel: [184413.806034]  [<c021aeee>]
copy_from_user+0x2e/0x70
Oct 24 14:05:03 qye-cms kernel: [184413.806097]  [<c021aeee>]
copy_from_user+0x2e/0x70
Oct 24 14:05:03 qye-cms kernel: [184413.806160]  [<c029ddc9>]
sys_sendmsg+0x159/0x270
Oct 24 14:05:03 qye-cms kernel: [184413.806225]  [<c029ec4d>]
sys_recvmsg+0x17d/0x230
Oct 24 14:05:03 qye-cms kernel: [184413.806289]  [<c01a2ffd>]
d_kill+0x3d/0x60
Oct 24 14:05:03 qye-cms kernel: [184413.806378]  [<c01a2ffd>]
d_kill+0x3d/0x60
Oct 24 14:05:03 qye-cms kernel: [184413.806440]  [<c019abca>]
getname+0xaa/0xe0
Oct 24 14:05:03 qye-cms kernel: [184413.806498]  [<c021b170>]
copy_to_user+0x30/0x60
Oct 24 14:05:03 qye-cms kernel: [184413.806559]  [<c0194f59>]
cp_new_stat64+0xf9/0x110
Oct 24 14:05:03 qye-cms kernel: [184413.806628]  [<c029f334>]
sys_socketcall+0xb4/0x2b0
Oct 24 14:05:03 qye-cms kernel: [184413.806695]  [<c01043b2>]
sysenter_past_esp+0x6b/0xa9
Oct 24 14:05:03 qye-cms kernel: [184413.806840]  =======================
Oct 24 14:05:03 qye-cms kernel: [184413.806908] Code: 08 88 50 07 b8 08 00
00 00 c3 8d b4 26 00 00 00 00 53 89 d0 89 d3 e8 b6 1b 0a 00 85 c0 74 1c 8b
83 50 ff ff ff 8d 8b 50 ff ff ff <8b> 40 60 85 c0 74 09 81 78 1c c0 7c 8a e0
74 06 31 c0 5b c3 66
Oct 24 14:05:03 qye-cms kernel: [184413.807225] EIP: [<e08a212a>]
spi_device_match+0x1a/0x60 [scsi_transport_spi] SS:ESP 0068:deaf3b8c
Oct 24 14:05:03 qye-cms kernel: [184413.807764] ---[ end trace
756dbf4d8889f5b8 ]---
Oct 24 14:05:03 qye-cms kernel: [184413.851184] sd 55415:0:0:91: [sdb]
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:05:03 qye-cms kernel: [184413.851329] end_request: I/O error, dev
sdb, sector 0
Oct 24 14:05:03 qye-cms kernel: [184413.851395] Buffer I/O error on device
sdb, logical block 0
Oct 24 14:05:03 qye-cms kernel: [184413.851465] Buffer I/O error on device
sdb, logical block 1
Oct 24 14:05:03 qye-cms kernel: [184413.851531] Buffer I/O error on device
sdb, logical block 2
Oct 24 14:05:03 qye-cms kernel: [184413.851597] Buffer I/O error on device
sdb, logical block 3
Oct 24 14:05:03 qye-cms kernel: [184413.851718] sd 55415:0:0:91: [sdb]
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:05:03 qye-cms kernel: [184413.851845] end_request: I/O error, dev
sdb, sector 0
Oct 24 14:05:03 qye-cms kernel: [184413.851909] Buffer I/O error on device
sdb, logical block 0


On Tue, Oct 27, 2009 at 4:12 PM, Mike Christie <micha...@cs.wisc.edu> wrote:

>
> Kevin Ye wrote:
> > Hi Mike,
> >
> > I reproduced this problem with two simple scripts, one is to continuous
> > login and then logout a target, the other script randomly fails the
> network
> > and resume the network after 30 or 200 seconds.
> >
> > script1:
> > while [ 1 ]
> > do
> >   date
> >   iscsiadm -m node -l -T [target_name]-p [target_ip]
> >   date
> >   iscsiadm -m node -u -T [target_name] -p [target_ip]
> > done
> >
> > script 2:
> > while [ 1 ]
> > do
> >   echo "failing the wan"
> >   ./disconnectip.sh 192.168.1.160
> >   sleep 30
> >   echo "unfailing the wan"
> >   ./reconnectip.sh 192.168.1.160
> >   sleep 300
> >   echo "failing the wan"
> >   ./disconnectip.sh 192.168.1.160
> >   sleep 200
> >   echo "unfailing the wan"
> >   ./reconnectip.sh 192.168.1.160
> > done
> >
> > I hit the oops after 1 days of test. In this test, I didn't hit target
> NULL
> > problem during logout. I think that the target NULL problem I mentioned
> > before is caused by the killing of login process in my script due to
> > timeout.
> >
> > I analyzed all the kernel oops I hit so far, it seems that if the network
> is
> > failed just before the login process finish, then after 15 seconds of
> > network down (less than 15 seconds after we see the kernel messate
> "Attached
> > SCSI disk"), it complains "connectionx:0: ping timeout of 15 secs
> expired,
> > last rx x, last ping x, now x".
> >
> > Any idea what's the problem? Thanks.
> >
>
> Your original problem came when the network was not accessible. The
> iscsi initiator sends a iscsi ping every noop_timout seconds. When you
> saw the ping/nop timeout message it means that a iscsi ping timedout.
> The iscsi initiator will then, drop the connection and try to relogin
> every X seconds.
>
> In the oops you sent we saw that something was forcing a logout and
> shutdown of the session at this time, but what I cannot figure out is
> why the oops is failing in code not related to iscsi. For some reason we
> are in the SPI code (so something related to mptspi maybe, which uses
> scsi_transport_spi which where spi_device_match is.
>
>  > > >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.052073] EIP is at
>  > > >>> spi_device_match+0x1a/0x60 [scsi_transport_spi]
>  > > >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.052178] EAX: 00000000
> EBX:
>  > >> c27ff0b0
>  > > >>> ECX: c27ff000 EDX: c27ff0b0
>  > > >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.052274] ESI: c27ff0b0
> EDI:
>  > >> d0c31800
>
> I think I saw this with older kernels, which is why I asked for you to
> try a newer one before. However, I was looking through the git commits
> and did not see any related fixes so I am not sure.
>
> In the original problem, where you running a logout command manually or
> did you run somehting like /etc/init.d/iscsi restart (script name may be
> different)
>
> In the oopses in your new tests, are you failing in the same place (in
> spi_device_match)?
>
> >
>

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---

Reply via email to