Hi Mike,

I reproduced this problem with two simple scripts: one continuously logs in to
and out of a target, and the other repeatedly takes the network down for 30 or
200 seconds at a time.

script 1:
while [ 1 ]
do
  date
  iscsiadm -m node -l -T [target_name] -p [target_ip]
  date
  iscsiadm -m node -u -T [target_name] -p [target_ip]
done

script 2:
while [ 1 ]
do
  echo "failing the wan"
  ./disconnectip.sh 192.168.1.160
  sleep 30
  echo "unfailing the wan"
  ./reconnectip.sh 192.168.1.160
  sleep 300
  echo "failing the wan"
  ./disconnectip.sh 192.168.1.160
  sleep 200
  echo "unfailing the wan"
  ./reconnectip.sh 192.168.1.160
done
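disconnectip.sh and reconnectip.sh are not included in this mail. As a purely hypothetical stand-in (not the scripts used in this test), an outage to a single peer can be simulated with iptables DROP rules. The function names disconnect_ip and reconnect_ip are illustrative, and IPTABLES can be set to echo for a dry run:

```shell
#!/bin/sh
# Hypothetical stand-ins for disconnectip.sh / reconnectip.sh (NOT the
# originals): simulate a WAN outage to one peer with iptables DROP rules.
# IPTABLES can be overridden (e.g. IPTABLES=echo) to print instead of run.
IPTABLES=${IPTABLES:-iptables}

disconnect_ip() {
    # Insert DROP rules at the top of INPUT and OUTPUT for the peer address.
    "$IPTABLES" -I INPUT -s "$1" -j DROP &&
    "$IPTABLES" -I OUTPUT -d "$1" -j DROP
}

reconnect_ip() {
    # Delete exactly the two rules that disconnect_ip inserted.
    "$IPTABLES" -D INPUT -s "$1" -j DROP &&
    "$IPTABLES" -D OUTPUT -d "$1" -j DROP
}
```

Usage (as root): `disconnect_ip 192.168.1.160`, then `reconnect_ip 192.168.1.160`. Dropping packets rather than downing the interface keeps the initiator's link up, which is closer to a WAN failure than pulling the local cable.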

I hit the oops after one day of testing. In this test, I didn't hit the target
NULL problem during logout. I think the target NULL problem I mentioned
before was caused by my script killing the login process on timeout.

I analyzed all the kernel oopses I have hit so far. It seems that if the
network fails just before the login process finishes, then after 15 seconds of
network downtime (less than 15 seconds after we see the kernel message
"Attached SCSI disk"), the kernel complains "connectionx:0: ping timeout of
15 secs expired, last rx x, last ping x, now x".

Any idea what the problem might be? Thanks.

Regards,
Kevin


On Wed, Oct 21, 2009 at 12:46 PM, Mike Christie <micha...@cs.wisc.edu> wrote:

>
> Kevin Ye wrote:
> > Thanks Mike.
> >
> > I did the tests you mentioned a couple of times, and neither caused a
> > kernel oops.
> >
> > The kernel oops I hit does not happen often. I have hit it twice in the
> > last 4 weeks.
> >
> > A kernel patch is welcome and I will give it a try. Thanks.
> >
>
> Shoot, let me do some digging. I was hoping one of those manual
> commands would trigger the problem. The one where you pull the cable
> yourself should have run over the same code and caused it.
>
> Are you using multipath? If not, for now you can just disable nops/pings.
> Set the noop timeout and noop interval to 0 for every target you have
> set up, and also set them in iscsid.conf (or set them in iscsid.conf and
> then rediscover the targets so the new values get picked up).
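For the per-target part, a minimal sketch, assuming the node records already exist and that `iscsiadm -m node -o update` without `-T`/`-p` updates every record (the function name disable_nops is hypothetical). Records created by a later discovery instead take their values from iscsid.conf, i.e. `node.conn[0].timeo.noop_out_interval = 0` and `node.conn[0].timeo.noop_out_timeout = 0`. Updated records are expected to take effect only the next time the session logs in:

```shell
#!/bin/sh
# Sketch: turn off iSCSI NOP-Out pings in every existing node record.
# ISCSIADM is overridable only so the commands can be dry-run (ISCSIADM=echo).
ISCSIADM=${ISCSIADM:-iscsiadm}

disable_nops() {
    # Quote the names so the shell never treats "[0]" as a glob pattern.
    for param in 'node.conn[0].timeo.noop_out_interval' \
                 'node.conn[0].timeo.noop_out_timeout'; do
        "$ISCSIADM" -m node -o update -n "$param" -v 0 || return 1
    done
}
```

Run `disable_nops` as root, then log the sessions out and back in so the new values are used.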
>
>
>
> > Kevin
> >
> > On Thu, Oct 15, 2009 at 12:06 PM, Mike Christie <micha...@cs.wisc.edu> wrote:
> >
> >> On 10/14/2009 05:11 PM, Kevin Ye wrote:
> >>> Hi All,
> >>>
> >>> We hit the kernel oops again on our setup. Any suggestion to fix that?
> >> If you just log in and then log out manually
> >>
> >> iscsiadm -m session -u
> >>
> >> Does that cause an oops?
> >>
> >>
> >> If you log back in, then pull the network cable, wait for the ping
> >> timeout messages, and then manually log out
> >>
> >> iscsiadm -m session -u
> >>
> >> Does that cause an oops?
> >>
> >>
> >> Can you rebuild your kernel, if I send you a patch?
> >>
> >>
> >>
> >>> Thanks.
> >>>
> >>> Our setup is:
> >>> kernel: 2.6.24-24
> >>> open-iscsi: 2.0-870.3
> >>>
> >>> kernel logs:
> >>> Oct  9 21:15:50 ian_ser_2 kernel: [28466.697051] scsi841 : iSCSI Initiator over TCP/IP
> >>> Oct  9 21:15:50 ian_ser_2 kernel: [28466.962031] scsi 841:0:0:201: Direct-Access     IET      VIRTUAL-DISK     0    PQ: 0 ANSI: 4
> >>> Oct  9 21:15:50 ian_ser_2 kernel: [28466.969109] sd 841:0:0:201: [sdd] 4505472 512-byte hardware sectors (2307 MB)
> >>> Oct  9 21:15:50 ian_ser_2 kernel: [28466.973314] sd 841:0:0:201: [sdd] Write Protect is off
> >>> Oct  9 21:15:50 ian_ser_2 kernel: [28466.973320] sd 841:0:0:201: [sdd] Mode Sense: 77 00 00 08
> >>> Oct  9 21:15:50 ian_ser_2 kernel: [28466.975420] sd 841:0:0:201: [sdd] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
> >>> Oct  9 21:15:50 ian_ser_2 kernel: [28466.977468] sd 841:0:0:201: [sdd] 4505472 512-byte hardware sectors (2307 MB)
> >>> Oct  9 21:15:50 ian_ser_2 kernel: [28466.977938] sd 841:0:0:201: [sdd] Write Protect is off
> >>> Oct  9 21:15:50 ian_ser_2 kernel: [28466.977944] sd 841:0:0:201: [sdd] Mode Sense: 77 00 00 08
> >>> Oct  9 21:15:50 ian_ser_2 kernel: [28466.981749] sd 841:0:0:201: [sdd] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
> >>> Oct  9 21:15:50 ian_ser_2 kernel: [28466.981761]  sdd: sdd1
> >>> Oct  9 21:15:50 ian_ser_2 kernel: [28467.027801] sd 841:0:0:201: [sdd] Attached SCSI disk
> >>> Oct  9 21:15:50 ian_ser_2 kernel: [28467.027886] sd 841:0:0:201: Attached scsi generic sg4 type 0
> >>> Oct  9 21:16:01 ian_ser_2 kernel: [28477.713280]  connection626:0: ping timeout of 15 secs expired, last rx 7049831, last ping 7052331, now 7056081
> >>> Oct  9 21:16:01 ian_ser_2 kernel: [28477.713467]  connection626:0: detected conn error (1011)
> >>> Oct  9 21:16:01 ian_ser_2 kernel: [28477.717268]  connection627:0: ping timeout of 15 secs expired, last rx 7049832, last ping 7052332, now 7056082
> >>> Oct  9 21:16:01 ian_ser_2 kernel: [28477.717458]  connection627:0: detected conn error (1011)
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.049414] BUG: unable to handle kernel NULL pointer dereference at virtual address 00000060
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.049639] printing eip: e08a212a *pde = 00000000
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.049924] Oops: 0000 [#1] SMP
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.050100] Modules linked in: iscsi_tcp libiscsi scsi_transport_iscsi iscsi_trgt crc32c libcrc32c nls_iso8859_1 nls_cp437 vfat fat vmmemctl cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_stats freq_table cpufreq_powersave sbs video output sbshc dock battery iptable_filter ip_tables x_tables vmhgfs lp loop ipv6 container serio_raw ac button evdev parport_pc parport i2c_piix4 i2c_core intel_agp agpgart shpchp pci_hotplug psmouse pcspkr ext3 jbd mbcache sd_mod sg sr_mod cdrom pata_acpi ata_generic floppy pcnet32 mii mptspi mptscsih mptbase scsi_transport_spi ata_piix libata scsi_mod raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse vmxnet
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.051174]
> >>>
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.051174]
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.051286] Pid: 16444, comm: iscsi_scan_839 Not tainted (2.6.24-24-generic #1)
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.051433] EIP: 0060:[<e08a212a>] EFLAGS: 00010202 CPU: 0
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.052073] EIP is at spi_device_match+0x1a/0x60 [scsi_transport_spi]
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.052178] EAX: 00000000 EBX: c27ff0b0 ECX: c27ff000 EDX: c27ff0b0
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.052274] ESI: c27ff0b0 EDI: d0c31800 EBP: c0286000 ESP: dc1dfef0
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.052367]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.052473] Process iscsi_scan_839 (pid: 16444, ti=dc1de000 task=d7823140 task.ti=dc1de000)
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.052586] Stack: e08a7c90 c0285c8f e095e328 c27ff1d8 cc1c3430 c27ff000 c27ff0b0 d0c31800
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.052792]        00000202 e09449cd c27ff000 cc1c3430 e0944a2f c27ff000 cc1c3400 e0944acc
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.052944]        d0c31814 cc1c30a4 e0944b50 00000000 e0944b64 00000000 c02805c2 cc1c30a4
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.053101] Call Trace:
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.053340]  [<c0285c8f>] attribute_container_device_trigger+0x4f/0xb0
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.053963]  [<e09449cd>] __scsi_remove_device+0x3d/0x80 [scsi_mod]
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.054083]  [<e0944a2f>] scsi_remove_device+0x1f/0x30 [scsi_mod]
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.054185]  [<e0944acc>] __scsi_remove_target+0x8c/0xc0 [scsi_mod]
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.054284]  [<e0944b50>] __remove_child+0x0/0x20 [scsi_mod]
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.054380]  [<e0944b64>] __remove_child+0x14/0x20 [scsi_mod]
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.054474]  [<c02805c2>] device_for_each_child+0x22/0x40
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.054561]  [<e09ae7f0>] __iscsi_unbind_session+0x0/0xa0 [scsi_transport_iscsi]
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.054709]  [<e0944b3e>] scsi_remove_target+0x3e/0x50 [scsi_mod]
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.054807]  [<e09ae85c>] __iscsi_unbind_session+0x6c/0xa0 [scsi_transport_iscsi]
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.054952]  [<c013ce6f>] run_workqueue+0xbf/0x160
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.055040]  [<c013d910>] worker_thread+0x0/0xe0
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.055116]  [<c013d994>] worker_thread+0x84/0xe0
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.055238]  [<c0140c20>] autoremove_wake_function+0x0/0x40
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.055323]  [<c013d910>] worker_thread+0x0/0xe0
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.055398]  [<c0140962>] kthread+0x42/0x70
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.055469]  [<c0140920>] kthread+0x0/0x70
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.055541]  [<c0105667>] kernel_thread_helper+0x7/0x10
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.055742]  =======================
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.055828] Code: 08 88 50 07 b8 08 00 00 00 c3 8d b4 26 00 00 00 00 53 89 d0 89 d3 e8 b6 1b 0a 00 85 c0 74 1c 8b 83 50 ff ff ff 8d 8b 50 ff ff ff <8b> 40 60 85 c0 74 09 81 78 1c c0 7c 8a e0 74 06 31 c0 5b c3 66
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.056243] EIP: [<e08a212a>] spi_device_match+0x1a/0x60 [scsi_transport_spi] SS:ESP 0068:dc1dfef0
> >>> Oct  9 21:16:10 ian_ser_2 kernel: [28486.057126] ---[ end trace dda5576840946286 ]---

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---
