Re: kernel oops error
On 09/26/2011 12:28 AM, Vivek S wrote:
> On Mon, Sep 26, 2011 at 5:51 AM, Mike Christie micha...@cs.wisc.edu wrote:
>> On 09/25/2011 01:57 AM, Vivek S wrote:
>>>> It's not working because the locking changed in that upstream kernel.
>>>
>>> Hmm, okay. I will go back to a kernel that is supported by upstream open-iscsi.
>>
>> I think you misunderstood me. With newer kernels you should just use the kernel modules in the kernel. You do not need the open-iscsi/kernel modules in those kernels. The upstream ones are good enough.
>
> I was trying to dig into open-iscsi code and do some experiments with session recovery. So I was using open-iscsi code from github.
>
>> In the case of the stable kernels they are probably best to use since fixes that get sent upstream are sent to the stable kernels.
>
> That means I have to use stable kernels for open-iscsi development?

It depends on what you are working on.

The userspace tools work with newer kernels (2.6.26 and newer) just fine. If you are just doing userspace work then you can use the open-iscsi tools from the github git tree (and kernel.org when it is back up) or the open-iscsi.org tarballs with any of those kernels.

If you are doing kernel work then it is best to use the scsi maintainer's scsi-misc or scsi-rc-fixes tree or my linux-2.6.iscsi tree (I have not put up a new kernel tree because it is basically just scsi-misc right now and I have been hoping kernel.org would be back up soon).

--
You received this message because you are subscribed to the Google Groups open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.
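As an illustration of the 2.6.26 cutoff mentioned above for the userspace tools, a minimal shell sketch could compare the running kernel release against that minimum. The function name version_ge is made up for this example and is not part of open-iscsi; it only looks at the leading numeric fields of the release string and ignores suffixes such as "-generic".

```shell
#!/bin/sh
# Sketch only: version_ge is a hypothetical helper, not an open-iscsi tool.

# version_ge A B
# Succeeds when dotted version A is greater than or equal to B.
# Anything after the first character that is not a digit or a dot is dropped,
# so "2.6.38-11-generic" is compared as "2.6.38".
version_ge() {
    a=$(printf '%s' "$1" | sed 's/[^0-9.].*$//')
    b=$(printf '%s' "$2" | sed 's/[^0-9.].*$//')
    # Sort both versions numerically field by field; the smaller comes first.
    first=$(printf '%s\n%s\n' "$a" "$b" |
        sort -t. -k1,1n -k2,2n -k3,3n -k4,4n | head -n 1)
    [ "$first" = "$b" ]
}

# 2.6.26 is the minimum quoted in the thread for the userspace tools.
if version_ge "$(uname -r)" 2.6.26; then
    echo "kernel $(uname -r): at or above 2.6.26"
else
    echo "kernel $(uname -r): older than 2.6.26"
fi
```

This says nothing about which kernel modules are loaded; it only checks the version number.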
Re: kernel oops error
On Sun, Sep 25, 2011 at 6:27 AM, Mike Christie micha...@cs.wisc.edu wrote:
> On 09/24/2011 02:40 PM, Vivek S wrote:
>> Hi, I am on a Dell laptop running Ubuntu 11.04, kernel 2.6.38.11. I modified the kernel Makefile to include the line
>> linux_2_6_38:
>
> You should just use the kernel modules that come with that kernel.
>
>> $(unpatch_code)
>> to help me build open-iscsi. After a successful build I loaded all the open-iscsi kernel modules along with the IET kernel module. I have set up the IET target in the /etc/iet/ietd.conf file. When I execute
>> iscsiadm -m discoverydb -t st -p 192.168.1.2:3260 -Dl
>
> What are you trying to do? What is the l at the end for? Are you trying to do discovery and login to the portals found?

Yeah, I am trying to discover and log in to the portals found. It's a lowercase L at the end to specify login.

> It's not working because the locking changed in that upstream kernel.

Hmm, okay. I will go back to a kernel that is supported by upstream open-iscsi.
Re: kernel oops error
On Sat, Sep 24, 2011 at 9:40 PM, Vivek S vivek...@gmail.com wrote:
> I am on a Dell laptop running Ubuntu 11.04, kernel 2.6.38.11.

I'm curious to know who generated kernel 2.6.38.11? As far as I know the latest official kernel in the 2.6.38 series is 2.6.38.8 (https://lkml.org/lkml/2011/6/2/417). A quote from the 2.6.38.8 announcement:

"This is the LAST .38 stable kernel release, please move to the .39-stable tree at this point in time."

Bart.
Re: kernel oops error
The Ubuntu distribution I am using has 2.6.38.8 and also 2.6.38.11. The latter got installed through Ubuntu OS updates.

On Sun, Sep 25, 2011 at 12:55 PM, Bart Van Assche bvanass...@acm.org wrote:
> On Sat, Sep 24, 2011 at 9:40 PM, Vivek S vivek...@gmail.com wrote:
>> I am on a Dell laptop running Ubuntu 11.04, kernel 2.6.38.11.
>
> I'm curious to know who generated kernel 2.6.38.11? As far as I know the latest official kernel in the 2.6.38 series is 2.6.38.8 (https://lkml.org/lkml/2011/6/2/417). A quote from the 2.6.38.8 announcement:
>
> "This is the LAST .38 stable kernel release, please move to the .39-stable tree at this point in time."
>
> Bart.
Re: kernel oops error
On 09/25/2011 01:57 AM, Vivek S wrote:
>> It's not working because the locking changed in that upstream kernel.
>
> Hmm, okay. I will go back to a kernel that is supported by upstream open-iscsi.

I think you misunderstood me. With newer kernels you should just use the kernel modules in the kernel. You do not need the open-iscsi/kernel modules in those kernels. The upstream ones are good enough.

In the case of the stable kernels they are probably best to use since fixes that get sent upstream are sent to the stable kernels.
Re: kernel oops error
On Mon, Sep 26, 2011 at 5:51 AM, Mike Christie micha...@cs.wisc.edu wrote:
> On 09/25/2011 01:57 AM, Vivek S wrote:
>>> It's not working because the locking changed in that upstream kernel.
>>
>> Hmm, okay. I will go back to a kernel that is supported by upstream open-iscsi.
>
> I think you misunderstood me. With newer kernels you should just use the kernel modules in the kernel. You do not need the open-iscsi/kernel modules in those kernels. The upstream ones are good enough.

I was trying to dig into open-iscsi code and do some experiments with session recovery. So I was using open-iscsi code from github.

> In the case of the stable kernels they are probably best to use since fixes that get sent upstream are sent to the stable kernels.

That means I have to use stable kernels for open-iscsi development?
Re: kernel oops error
On 09/24/2011 02:40 PM, Vivek S wrote:
> Hi, I am on a Dell laptop running Ubuntu 11.04, kernel 2.6.38.11. I modified the kernel Makefile to include the line
> linux_2_6_38:

You should just use the kernel modules that come with that kernel.

> $(unpatch_code)
> to help me build open-iscsi. After a successful build I loaded all the open-iscsi kernel modules along with the IET kernel module. I have set up the IET target in the /etc/iet/ietd.conf file. When I execute
> iscsiadm -m discoverydb -t st -p 192.168.1.2:3260 -Dl

What are you trying to do? What is the l at the end for? Are you trying to do discovery and login to the portals found?

It's not working because the locking changed in that upstream kernel.
Re: Kernel oops on login
On 03/02/2010 02:35 AM, Ulrich Windl wrote:
> On 2 Mar 2010 at 1:21, Mike Christie wrote:
>> Make sure everything is unmounted when you delete/logout, and make sure something like dm/md raid/multipath is not using the device.
>
> Mike, would it make any sense to wait until the devices are no longer in use? It might be preferable to a kernel error message. You could still add a force option if someone insists on doing a logout on busy devices.

Anyone with root privs can use one of the scsi interfaces (like /sys/block/sdb/device/delete) and remove the device from under other users and cause this problem.

The scsi and iscsi layers also support hot unplugging of devices, so in many cases you can remove an HBA or a disk and the disk should just get cleaned up (like when you just rip out a usb device). We have no control over the user doing this, but we are supposed to handle it.

And the FC layer and other drivers like the SAS ones (iscsi is supposed to do it too, but I have not got around to changing the code) will begin the removal process if a port has been disconnected long enough (this is why it is common to hit the problem with dm-multipath and FC).

So for the above we do not have any control over it, and the problem is just buggy code which Hannes from SUSE has been working on fixing. If that is fixed then the problem where someone or the iscsi scripts run iscsiadm ... --logout would also get fixed with it.

If you wanted to check if a device was in use to prevent a user mistake like logging out before they unmounted a FS, then yeah, I think it could be useful. I am just not sure how to do it nicely. I think we would have to add an API to the SCSI layer to peek at the device's refcount to check and see if there are any external users.
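Until something like the refcount API discussed above exists, a script can approximate the check from userspace before it calls logout. The sketch below is only a heuristic and not the SCSI-layer check described in the message: it treats a disk as busy if it or one of its partitions appears in /proc/mounts, or if a sysfs holders directory for it is non-empty (which is how dm and md claims usually show up). The function name device_in_use and its optional path arguments are hypothetical and exist only so the example can be exercised against test directories.

```shell
#!/bin/sh
# Sketch only: device_in_use is a hypothetical helper, not part of open-iscsi.

# device_in_use DEV [MOUNTS_FILE] [SYS_BLOCK_DIR]
# DEV is a bare disk name such as "sdb". Succeeds when the disk looks busy.
# It does not catch mounts made under other names (for example
# /dev/disk/by-id links) or processes that hold the raw device open.
device_in_use() {
    dev=$1
    mounts=${2:-/proc/mounts}
    sysblock=${3:-/sys/block}

    # Mounted directly or through a partition (sdb, sdb1, sdb2, ...).
    if grep -Eq "^/dev/${dev}[0-9]* " "$mounts" 2>/dev/null; then
        return 0
    fi

    # Claimed by another block driver: the holders directory of the disk
    # or of one of its partitions has at least one entry.
    for h in "$sysblock/$dev"/holders/* "$sysblock/$dev"/"$dev"*/holders/*; do
        if [ -e "$h" ]; then
            return 0
        fi
    done
    return 1
}

# Possible use before a logout (left commented out; the target and portal
# values are placeholders):
#   if device_in_use sdb; then
#       echo "sdb is still in use, not logging out" >&2
#   else
#       iscsiadm -m node -T "$target" -p "$portal" -u
#   fi
```

A check like this can only prevent the user mistake case; it does nothing for hot unplug or transport-initiated removal, which the kernel has to handle on its own.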
Re: Kernel oops on login
Ok so I guess working with old versions of open-iscsi is not accepted here :). So I upgraded to the latest and greatest semi-stable release, 871. I no longer see the "is not queued" messages and my login and logout work fine. However, this is only the case if I don't have my flash device mounted on /dev/sdc. If the flash is mounted I get this kernel oops:

-T iqn.1999-02.com.nexsan:p0:sataboy:01731a5a --login
Logging in to [iface: default, target: iqn.1999-02.com.nexsan:p0:sataboy:01731a5a, portal: 172.19.151.169,3260]
kobject_add failed for sdc with -EEXIST, don't try to register things with the same name in the same directory.
BUG: unable to handle kernel NULL pointer dereference at virtual address 0008
printing eip: *pde =
Oops: [#1]
Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
CPU:0
EIP:0060:[c0189241]Not tainted VLI
EFLAGS: 00210292 (2.6.22.10-vs2.2.0.5-cisco-nmx #1)
EIP is at create_dir+0x21/0x190
eax: c2280584 ebx: f7191bac ecx: c2280588 edx:
esi: c2280584 edi: c2280584 ebp: esp: f7191b7c
ds: 007b es: 007b fs: gs: 0033 ss: 0068
Process iscsid (pid: 5038, ti=f719 task=f77c75b0 task.ti=f719)
Stack: c0378110 c037566c c2280588 c2280584 c037566c c0103c41 c2280584 c2280584
c2280584 c01893d4 f7191bac c0203a1f f7b35ac0
f7d690c0 c0186e80 f7b35b10 f7b35ac0 c22804c0 c2280584 f7d690c0
Call Trace:
[c0103c41] dump_stack+0x11/0x20
[c01893d4] sysfs_create_dir+0x24/0x70
[c0203a1f] kobject_shadow_add+0x7f/0x1a0
[c0186e80] register_disk+0x50/0x1f0
[c01f4b72] blk_register_queue+0x52/0x90
[c026c6c8] sd_probe+0x278/0x3f0
[c0189e47] sysfs_create_link+0x57/0x150
[c0230c57] driver_probe_device+0x87/0x190
[c0331fc1] klist_next+0x51/0xb0
[c022ff24] bus_for_each_drv+0x44/0x70
[c0230e19] device_attach+0x79/0x80
[c0230d60] __device_attach+0x0/0x10
[c022fe95] bus_attach_device+0x45/0x90
[c022eb93] device_add+0x493/0x560
[c0268502] scsi_sysfs_add_sdev+0x32/0x230
[c02665bd] scsi_probe_and_add_lun+0x95d/0x980
[c0266e91] __scsi_scan_target+0x491/0x5f0
[c0166dcb] mntput_no_expire+0x1b/0x70
[c015bac3] link_path_walk+0x63/0xc0
[c02676a6] scsi_scan_target+0xb6/0xe0
[f8a008fa] iscsi_user_scan_session+0x9a/0xb0 [scsi_transport_iscsi]
[f8a00820] iscsi_user_scan+0x0/0x30 [scsi_transport_iscsi]
[f8a00860] iscsi_user_scan_session+0x0/0xb0 [scsi_transport_iscsi]
[c022dd42] device_for_each_child+0x22/0x40
[f8a00820] iscsi_user_scan+0x0/0x30 [scsi_transport_iscsi]
[f8a00843] iscsi_user_scan+0x23/0x30 [scsi_transport_iscsi]
[c026818b] store_scan+0xbb/0xf0
[c013b614] __alloc_pages+0x64/0x2f0
[c02680d0] store_scan+0x0/0xf0
[c0232196] class_device_attr_store+0x26/0x40
[c0188151] sysfs_write_file+0xb1/0x110
[c01880a0] sysfs_write_file+0x0/0x110
[c0153820] vfs_write+0xa0/0x140
[c0153da1] sys_write+0x41/0x70
[c010280e] sysenter_past_esp+0x5f/0x85
===
Code: 74 26 00 8d bc 27 00 00 00 00 83 ec 28 89 5c 24 18 8b 5c 24 2c 89 74 24 1c 89 7c 24 20 89 6c 24 24 89 d5 89 4c 24 08 89 44 24 0c 8b 42 08 83 c0 68 e8 94 a2 1a 0031 c0 b9 ff ff ff ff 8b 7c 24
EIP: [c0189241] create_dir+0x21/0x190 SS:ESP 0068:f7191b7c

Mar 1 21:09:07 localhost kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 0008
Mar 1 21:09:07 localhost kClocksource tsc unstable (delta = 2941960870 ns) ernel: printingTime: pit clocksource has been installed. eip:
Mar 1 21:09:07 localhost kernel: *pde =
Mar 1 21:09:07 localhost kernel: Oops: [#1]
Mar 1 21:09:07 localhost kernel: CPU:0
Mar 1 21:09:07 localhost kernel: EIP:0060:[c0189241]Not tainted VLI
Mar 1 21:09:07 localhost kernel: EFLAGS: 00210292 (2.6.22.10-vs2.2.0.5-cisco-nmx #1)
Mar 1 21:09:07 localhost kernel: EIP is at create_dir+0x21/0x190
Mar 1 21:09:07 localhost kernel: eax: c2280584 ebx: f7191bac ecx: c2280588 edx:
Mar 1 21:09:07 localhost kernel: esi: c2280584 edi: c2280584 ebp: esp: f7191b7c
Mar 1 21:09:07 localhost kernel: ds: 007b es: 007b fs: gs: 0033 ss: 0068
Mar 1 21:09:07 localhost kernel: Process iscsid (pid: 5038, ti=f719 task=f77c75b0 task.ti=f719)
Mar 1 21:09:07 localhost kernel: Stack: c0378110 c037566c c2280588 c2280584 c037566c c0103c41 c2280584 c2280584
Mar 1 21:09:07 localhost kernel: c2280584 c01893d4 f7191bac c0203a1f f7b35ac0
Mar 1 21:09:07 localhost kernel: f7d690c0 c0186e80 f7b35b10 f7b35ac0 c22804c0 c2280584 f7d690c0
Mar 1 21:09:07 localhost kernel: Call Trace:
Mar 1 21:09:07 localhost kernel: [c0103c41] dump_stack+0x11/0x20
Mar 1 21:09:07 localhost kernel: [c01893d4] sysfs_create_dir+0x24/0x70
Mar 1 21:09:07 localhost kernel: [c0203a1f] kobject_shadow_add+0x7f/0x1a0
Mar 1 21:09:07 localhost kernel: [c0186e80] register_disk+0x50/0x1f0
Mar 1 21:09:07 localhost kernel: [c01f4b72] blk_register_queue+0x52/0x90
Re: kernel oops in resched_task
On Wed, Jan 06, 2010 at 04:28:32PM +0200, Erez Zilber wrote:
> Hi, I got this oops while running open-iscsi on a CentOS 5.3 machine (don't know how to recreate it). It crashes while trying to wake up the work queue after queuecommand was called. Has anyone seen something similar?

I'd suggest you upgrade to 5.4 and see if it still happens.

-- Pasi
Re: kernel oops in resched_task
On 01/06/2010 08:28 AM, Erez Zilber wrote:
> Hi, I got this oops while running open-iscsi on a CentOS 5.3 machine (don't know how to recreate it). It crashes while trying to wake up the work queue after queuecommand was called. Has anyone seen something similar?

I have not seen it. Are you using the kernel modules that come with 5.3 or the open-iscsi ported ones?
Re: Kernel Oops
Kevin Ye wrote:
> Hi Mike,
>
> I reproduced this problem with two simple scripts. One continuously logs in to and then logs out of a target; the other randomly fails the network and resumes it after 30 or 200 seconds.
>
> script1:
> while [ 1 ]
> do
>     date
>     iscsiadm -m node -l -T [target_name] -p [target_ip]
>     date
>     iscsiadm -m node -u -T [target_name] -p [target_ip]
> done
>
> script 2:
> while [ 1 ]
> do
>     echo failing the wan
>     ./disconnectip.sh 192.168.1.160
>     sleep 30
>     echo unfailing the wan
>     ./reconnectip.sh 192.168.1.160
>     sleep 300
>     echo failing the wan
>     ./disconnectip.sh 192.168.1.160
>     sleep 200
>     echo unfailing the wan
>     ./reconnectip.sh 192.168.1.160
> done
>
> I hit the oops after 1 day of testing. In this test, I didn't hit the target NULL problem during logout. I think that the target NULL problem I mentioned before is caused by my script killing the login process on timeout.
>
> I analyzed all the kernel oopses I have hit so far. It seems that if the network fails just before the login process finishes, then after 15 seconds of network down (less than 15 seconds after we see the kernel message Attached SCSI disk), it complains
> connectionx:0: ping timeout of 15 secs expired, last rx x, last ping x, now x.
>
> Any idea what's the problem? Thanks.

Your original problem came when the network was not accessible. The iscsi initiator sends an iscsi ping every noop_timeout seconds. When you saw the ping/nop timeout message it means that an iscsi ping timed out. The iscsi initiator will then drop the connection and try to relogin every X seconds.

In the oops you sent we saw that something was forcing a logout and shutdown of the session at this time, but what I cannot figure out is why the oops is failing in code not related to iscsi. For some reason we are in the SPI code (so something related to mptspi maybe, which uses scsi_transport_spi, which is where spi_device_match is).

Oct 9 21:16:10 ian_ser_2 kernel: [28486.052073] EIP is at spi_device_match+0x1a/0x60 [scsi_transport_spi]
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052178] EAX: EBX: c27ff0b0 ECX: c27ff000 EDX: c27ff0b0
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052274] ESI: c27ff0b0 EDI: d0c31800

I think I saw this with older kernels, which is why I asked you to try a newer one before. However, I was looking through the git commits and did not see any related fixes, so I am not sure.

In the original problem, were you running a logout command manually or did you run something like /etc/init.d/iscsi restart (script name may be different)?

In the oopses in your new tests, are you failing in the same place (in spi_device_match)?
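The ping timeout line discussed above carries three jiffies counters, and the arithmetic on them can be checked by hand. A minimal sketch, using the connection626 line from the kernel logs posted in this thread: it extracts the fields and prints how many jiffies passed since the last ping and since the last receive. Dividing the first figure by the timeout gives a rough tick rate, under the assumption (not stated in the thread) that the timer fired right at expiry. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical names for this example.

# ping_timeout_fields LINE
# Prints "timeout_secs last_rx last_ping now", or nothing if LINE does not
# contain a ping timeout message.
ping_timeout_fields() {
    printf '%s\n' "$1" | sed -n \
        's/.*ping timeout of \([0-9][0-9]*\) secs expired, last rx \([0-9][0-9]*\), last ping \([0-9][0-9]*\), now \([0-9][0-9]*\).*/\1 \2 \3 \4/p'
}

# ping_timeout_summary LINE
# Prints the jiffies elapsed since the last ping and the last receive, plus
# an estimated tick rate (jiffies since last ping divided by the timeout).
ping_timeout_summary() {
    set -- $(ping_timeout_fields "$1")
    [ $# -eq 4 ] || return 1
    [ "$1" -gt 0 ] || return 1
    since_ping=$(( $4 - $3 ))
    since_rx=$(( $4 - $2 ))
    echo "timeout=${1}s jiffies_since_ping=$since_ping jiffies_since_rx=$since_rx est_hz=$(( since_ping / $1 ))"
}

# Log line copied from the thread.
ping_timeout_summary 'Oct 9 21:16:01 ian_ser_2 kernel: [28477.713280] connection626:0: ping timeout of 15 secs expired, last rx 7049831, last ping 7052331, now 7056081'
# prints: timeout=15s jiffies_since_ping=3750 jiffies_since_rx=6250 est_hz=250
```

With that line, 3750 jiffies since the last ping over a 15 second timeout works out to about 250 ticks per second, so the last receive was roughly 25 seconds before the message was logged.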
Re: Kernel Oops
Thanks a lot for your explanation, Mike.

In my original problem, a logout was issued manually after login, and the kernel oops happened at the time this command was issued.

The new test failed in the same place (in spi_device_match). I cannot tell exactly when the logout was issued, but I suspect it was at the same time as the kernel oops. Below is the kernel log from my new test.

-Kevin

Oct 24 14:00:48 qye-cms kernel: [184273.268108] scsi55415 : iSCSI Initiator over TCP/IP
Oct 24 14:00:48 qye-cms kernel: [184273.523222] scsi 55415:0:0:91: Direct-Access IET VIRTUAL-DISK 0PQ: 0 ANSI: 4
Oct 24 14:00:48 qye-cms kernel: [184273.528678] sd 55415:0:0:91: [sdb] 1024000 512-byte hardware sectors (524 MB)
Oct 24 14:00:48 qye-cms kernel: [184273.529105] sd 55415:0:0:91: [sdb] Write Protect is off
Oct 24 14:00:48 qye-cms kernel: [184273.529109] sd 55415:0:0:91: [sdb] Mode Sense: 77 00 00 08
Oct 24 14:00:48 qye-cms kernel: [184273.529545] sd 55415:0:0:91: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Oct 24 14:00:48 qye-cms kernel: [184273.530040] sd 55415:0:0:91: [sdb] 1024000 512-byte hardware sectors (524 MB)
Oct 24 14:00:48 qye-cms kernel: [184273.531361] sd 55415:0:0:91: [sdb] Write Protect is off
Oct 24 14:00:48 qye-cms kernel: [184273.531367] sd 55415:0:0:91: [sdb] Mode Sense: 77 00 00 08
Oct 24 14:00:48 qye-cms kernel: [184273.533646] sd 55415:0:0:91: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Oct 24 14:01:08 qye-cms kernel: [184273.533654] sdb:3 connection55413:0: ping timeout of 15 secs expired, last rx 46079163, last ping 46080413, now 46084163
Oct 24 14:01:08 qye-cms kernel: [184293.496601] connection55413:0: detected conn error (1011)
Oct 24 14:03:08 qye-cms kernel: [184413.530473] session55413: session recovery timed out after 120 secs
Oct 24 14:03:08 qye-cms kernel: [184413.534610] sd 55415:0:0:91: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:03:08 qye-cms kernel: [184413.534697] end_request: I/O error, dev sdb, sector 8
Oct 24 14:03:08 qye-cms kernel: [184413.534804] Buffer I/O error on device sdb, logical block 1
Oct 24 14:03:08 qye-cms kernel: [184413.535039] sd 55415:0:0:91: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:03:08 qye-cms kernel: [184413.535044] end_request: I/O error, dev sdb, sector 8
Oct 24 14:03:08 qye-cms kernel: [184413.535053] Buffer I/O error on device sdb, logical block 1
Oct 24 14:03:08 qye-cms kernel: [184413.535163] Dev sdb: unable to read RDB block 8
Oct 24 14:03:08 qye-cms kernel: [184413.537038] sd 55415:0:0:91: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:03:08 qye-cms kernel: [184413.537048] end_request: I/O error, dev sdb, sector 24
Oct 24 14:03:08 qye-cms kernel: [184413.537057] Buffer I/O error on device sdb, logical block 3
Oct 24 14:03:08 qye-cms kernel: [184413.537185] sd 55415:0:0:91: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:03:08 qye-cms kernel: [184413.537189] end_request: I/O error, dev sdb, sector 24
Oct 24 14:03:08 qye-cms kernel: [184413.537192] Buffer I/O error on device sdb, logical block 3
Oct 24 14:03:08 qye-cms kernel: [184413.537326] unable to read partition table
Oct 24 14:03:08 qye-cms kernel: [184413.537398] sd 55415:0:0:91: [sdb] Attached SCSI disk
Oct 24 14:03:08 qye-cms kernel: [184413.537478] sd 55415:0:0:91: Attached scsi generic sg1 type 0
Oct 24 14:05:03 qye-cms kernel: [184413.597688] BUG: unable to handle kernel NULL pointer dereference at virtual address 0060
Oct 24 14:05:03 qye-cms kernel: [184413.684629] printing eip: e08a212a *pde =
Oct 24 14:05:03 qye-cms kernel: [184413.684907] Oops: [#1] SMP
Oct 24 14:05:03 qye-cms kernel: [184413.685202] Modules linked in: iscsi_trgt crc32c libcrc32c vmblock vmmemctl cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_stats freq_table cpufreq_powersave sbs video output sbshc dock battery iptable_filter ip_tables x_tables vmhgfs iscsi_tcp libiscsi scsi_transport_iscsi lp loop ipv6 parport_pc parport evdev container serio_raw psmouse ac button i2c_piix4 intel_agp i2c_core pcspkr agpgart shpchp pci_hotplug ext3 jbd mbcache sg sd_mod floppy pcnet32 mptspi mptscsih mptbase mii pata_acpi ata_generic scsi_transport_spi ata_piix libata scsi_mod raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse vmxnet
Oct 24 14:05:03 qye-cms kernel: [184413.686686]
Oct 24 14:05:03 qye-cms kernel: [184413.686773] Pid: 3770, comm: iscsid Not tainted (2.6.24-24-generic #1)
Oct 24 14:05:03 qye-cms kernel: [184413.686850] EIP: 0060:[e08a212a] EFLAGS: 00010202 CPU: 0
Oct 24 14:05:03 qye-cms kernel: [184413.725583] EIP is at spi_device_match+0x1a/0x60 [scsi_transport_spi]
Oct 24 14:05:03 qye-cms kernel: [184413.725680] EAX: EBX:
Re: Kernel Oops
Hi Mike,

I reproduced this problem with two simple scripts. One continuously logs in to and then logs out of a target; the other randomly fails the network and resumes it after 30 or 200 seconds.

script1:
while [ 1 ]
do
    date
    iscsiadm -m node -l -T [target_name] -p [target_ip]
    date
    iscsiadm -m node -u -T [target_name] -p [target_ip]
done

script 2:
while [ 1 ]
do
    echo failing the wan
    ./disconnectip.sh 192.168.1.160
    sleep 30
    echo unfailing the wan
    ./reconnectip.sh 192.168.1.160
    sleep 300
    echo failing the wan
    ./disconnectip.sh 192.168.1.160
    sleep 200
    echo unfailing the wan
    ./reconnectip.sh 192.168.1.160
done

I hit the oops after 1 day of testing. In this test, I didn't hit the target NULL problem during logout. I think that the target NULL problem I mentioned before is caused by my script killing the login process on timeout.

I analyzed all the kernel oopses I have hit so far. It seems that if the network fails just before the login process finishes, then after 15 seconds of network down (less than 15 seconds after we see the kernel message Attached SCSI disk), it complains
connectionx:0: ping timeout of 15 secs expired, last rx x, last ping x, now x.

Any idea what's the problem? Thanks.

Regards,
Kevin

On Wed, Oct 21, 2009 at 12:46 PM, Mike Christie micha...@cs.wisc.edu wrote:
> Kevin Ye wrote:
>> Thanks Mike. I did the tests you mentioned a couple of times, and it didn't cause kernel oops. The kernel Oops I hit does not happen often. I hit twice in last 4 weeks. kernel patch is welcome and I will give it a try. Thanks.
>
> Shoot, let me do some digging. I was hoping one of those manual commands would fire the problem. The one where you pull the cable yourself should have run over the same code and caused it.
>
> Are you using multipath? If not, for now you can just disable nops/pings. Set the noop timeout and noop interval to 0 for every target you have setup, and set this in the iscsid.conf (you could also set it in iscsid.conf then rediscover the targets so it will get picked up).
>
>> Kevin
>>
>> On Thu, Oct 15, 2009 at 12:06 PM, Mike Christie micha...@cs.wisc.edu wrote:
>>> On 10/14/2009 05:11 PM, Kevin Ye wrote:
>>>> Hi All, We hit the kernel oops again on our setup. Any suggestion to fix that?
>>>
>>> If you just login then logout manually
>>> iscsiadm -m session -u
>>> Does that cause an oops?
>>>
>>> If you log back in, then pull the network cable, wait to see the ping timeout messages then manually logout
>>> iscsiadm -m session -u
>>> Does that cause an oops?
>>>
>>> Can you rebuild your kernel, if I send you a patch?
>>>
>>>> Thanks. Our set up is:
>>>> kernel: 2.6.24-24
>>>> open-iscsi: 2.0-870.3
>>>> kernel logs:
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.697051] scsi841 : iSCSI Initiator over TCP/IP
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.962031] scsi 841:0:0:201: Direct-Access IET VIRTUAL-DISK 0PQ: 0 ANSI: 4
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.969109] sd 841:0:0:201: [sdd] 4505472 512-byte hardware sectors (2307 MB)
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.973314] sd 841:0:0:201: [sdd] Write Protect is off
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.973320] sd 841:0:0:201: [sdd] Mode Sense: 77 00 00 08
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.975420] sd 841:0:0:201: [sdd] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.977468] sd 841:0:0:201: [sdd] 4505472 512-byte hardware sectors (2307 MB)
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.977938] sd 841:0:0:201: [sdd] Write Protect is off
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.977944] sd 841:0:0:201: [sdd] Mode Sense: 77 00 00 08
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.981749] sd 841:0:0:201: [sdd] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.981761] sdd: sdd1
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28467.027801] sd 841:0:0:201: [sdd] Attached SCSI disk
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28467.027886] sd 841:0:0:201: Attached scsi generic sg4 type 0
>>>> Oct 9 21:16:01 ian_ser_2 kernel: [28477.713280] connection626:0: ping timeout of 15 secs expired, last rx 7049831, last ping 7052331, now 7056081
>>>> Oct 9 21:16:01 ian_ser_2 kernel: [28477.713467] connection626:0: detected conn error (1011)
>>>> Oct 9 21:16:01 ian_ser_2 kernel: [28477.717268] connection627:0: ping timeout of 15 secs expired, last rx 7049832, last ping 7052332, now 7056082
>>>> Oct 9 21:16:01 ian_ser_2 kernel: [28477.717458] connection627:0: detected conn error (1011)
>>>> Oct 9 21:16:10 ian_ser_2 kernel: [28486.049414] BUG: unable to handle kernel NULL pointer dereference at virtual address 0060
>>>> Oct 9 21:16:10 ian_ser_2 kernel: [28486.049639] printing eip: e08a212a *pde =
>>>> Oct 9 21:16:10 ian_ser_2 kernel: [28486.049924] Oops: [#1] SMP
>>>> Oct 9 21:16:10 ian_ser_2 kernel:
Re: Kernel Oops
Thanks Mike.

I suspect that this problem is related to a half-finished login when the network goes down. That is, the network goes down before the login process finishes, and after 15 seconds of network down the initiator tries to fail the connection. Because the connection is only half set up, the device or some other property is NULL, which causes the kernel oops.

I tried to reproduce this but ran into a different problem; I may not have failed the network at the right time. In this other problem the login fails but leaves the target name in the kernel NULL, which makes logout fail. The only way to resolve this is to restart the machine. The error output of logout is:

iscsiadm --mode node --logout all
iscsiadm: could not read session targetname: 61
iscsiadm: could not find session info for session127
iscsiadm: initiator reported error (15 - already exists)

So my conclusion is that a network failure in the middle of login is not handled nicely. I am still trying to reproduce the kernel oops rather than the second error. If you have any suggestion for what we should do, please let me know. Thanks.

Kevin

On Wed, Oct 21, 2009 at 12:46 PM, Mike Christie micha...@cs.wisc.edu wrote:
> Shoot, let me do some digging. I was hoping one of those manual commands would fire the problem. The one where you pull the cable yourself should have run over the same code and caused it.
>
> Are you using multipath? If not, for now you can just disable nops/pings. Set the noop timeout and noop interval to 0 for every target you have setup, and set this in the iscsid.conf (you could also set it in iscsid.conf then rediscover the targets so it will get picked up).
>
>> Kevin
>>
>> On Thu, Oct 15, 2009 at 12:06 PM, Mike Christie micha...@cs.wisc.edu wrote:
>>> On 10/14/2009 05:11 PM, Kevin Ye wrote:
>>>> Hi All, We hit the kernel oops again on our setup. Any suggestion to fix that?
>>>
>>> If you just login then logout manually
>>> iscsiadm -m session -u
>>> Does that cause an oops?
>>>
>>> If you log back in, then pull the network cable, wait to see the ping timeout messages then manually logout
>>> iscsiadm -m session -u
>>> Does that cause an oops?
>>>
>>> Can you rebuild your kernel, if I send you a patch?
>>>
>>>> Thanks. Our set up is:
>>>> kernel: 2.6.24-24
>>>> open-iscsi: 2.0-870.3
>>>> kernel logs:
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.697051] scsi841 : iSCSI Initiator over TCP/IP
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.962031] scsi 841:0:0:201: Direct-Access IET VIRTUAL-DISK 0PQ: 0 ANSI: 4
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.969109] sd 841:0:0:201: [sdd] 4505472 512-byte hardware sectors (2307 MB)
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.973314] sd 841:0:0:201: [sdd] Write Protect is off
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.973320] sd 841:0:0:201: [sdd] Mode Sense: 77 00 00 08
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.975420] sd 841:0:0:201: [sdd] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.977468] sd 841:0:0:201: [sdd] 4505472 512-byte hardware sectors (2307 MB)
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.977938] sd 841:0:0:201: [sdd] Write Protect is off
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.977944] sd 841:0:0:201: [sdd] Mode Sense: 77 00 00 08
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.981749] sd 841:0:0:201: [sdd] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28466.981761] sdd: sdd1
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28467.027801] sd 841:0:0:201: [sdd] Attached SCSI disk
>>>> Oct 9 21:15:50 ian_ser_2 kernel: [28467.027886] sd 841:0:0:201: Attached scsi generic sg4 type 0
>>>> Oct 9 21:16:01 ian_ser_2 kernel: [28477.713280] connection626:0: ping timeout of 15 secs expired, last rx 7049831, last ping 7052331, now 7056081
>>>> Oct 9 21:16:01 ian_ser_2 kernel: [28477.713467] connection626:0: detected conn error (1011)
>>>> Oct 9 21:16:01 ian_ser_2 kernel: [28477.717268] connection627:0: ping timeout of 15 secs expired, last rx 7049832, last ping 7052332, now 7056082
>>>> Oct 9 21:16:01 ian_ser_2 kernel: [28477.717458] connection627:0: detected conn error (1011)
>>>> Oct 9 21:16:10 ian_ser_2 kernel: [28486.049414] BUG: unable to handle kernel NULL pointer dereference at virtual address 0060
>>>> Oct 9 21:16:10 ian_ser_2 kernel: [28486.049639] printing eip: e08a212a *pde =
>>>> Oct 9 21:16:10 ian_ser_2 kernel: [28486.049924] Oops: [#1] SMP
>>>> Oct 9 21:16:10 ian_ser_2 kernel: [28486.050100] Modules linked in: iscsi_tcp libiscsi scsi_transport_iscsi iscsi_trgt crc32c libcrc32c nls_iso8859_1 nls_cp437 vfat fat vmmemctl cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_stats freq_table cpufreq_powersave sbs video output sbshc dock battery iptable_filter ip_tables x_tables vmhgfs lp loop ipv6 container serio_raw ac button evdev parport_pc parport i2c_piix4 i2c_core intel_agp agpgart shpchp pci_hotplug
Re: Kernel Oops
Kevin Ye wrote:
Thanks Mike. I did the tests you mentioned a couple of times, and they didn't cause a kernel oops. The kernel oops I hit does not happen often; I have hit it twice in the last 4 weeks. A kernel patch is welcome and I will give it a try. Thanks.

Shoot, let me do some digging. I was hoping one of those manual commands would trigger the problem. The one where you pull the cable yourself should have run over the same code and caused it.

Are you using multipath? If not, for now you can just disable nops/pings. Set the noop timeout and noop interval to 0 for every target you have set up, and set this in iscsid.conf (you could also set it in iscsid.conf and then rediscover the targets so it gets picked up).

Kevin

On Thu, Oct 15, 2009 at 12:06 PM, Mike Christie micha...@cs.wisc.edu wrote:
On 10/14/2009 05:11 PM, Kevin Ye wrote:
Hi All, We hit the kernel oops again on our setup. Any suggestion to fix that?
If you just login then logout manually
iscsiadm -m session -u
Does that cause an oops?
If you log back in, then pull the network cable, wait to see the ping timeout messages then manually logout
iscsiadm -m session -u
Does that cause an oops?
Can you rebuild your kernel, if I send you a patch?
Thanks.
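The workaround suggested in this message (setting the noop interval and noop timeout to 0) could look roughly like the sketch below. In iscsid.conf the two settings are node.conn[0].timeo.noop_out_interval and node.conn[0].timeo.noop_out_timeout; for node records that already exist, iscsiadm's update operation can change the same two values. The function name, the target IQN and the portal address are made up for illustration, and the function only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print (do not run) the iscsiadm commands that set both NOP-Out timers
# to 0 for one existing node record. Pipe the output to `sh` to apply.
noop_disable_cmds() {
    target=$1   # hypothetical target IQN
    portal=$2   # hypothetical portal, ip:port
    for name in noop_out_interval noop_out_timeout; do
        printf 'iscsiadm -m node -T %s -p %s -o update -n "node.conn[0].timeo.%s" -v 0\n' \
            "$target" "$portal" "$name"
    done
}

# Example with placeholder values:
noop_disable_cmds iqn.2009-10.example:disk1 192.168.1.2:3260
```

Sessions that are already logged in will probably keep their old timer values until they are logged out and back in, so treat this as something to apply before the next login.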
Re: Kernel Oops
On 10/14/2009 05:11 PM, Kevin Ye wrote:
Hi All, We hit the kernel oops again on our setup. Any suggestion to fix that?

If you just login then logout manually
iscsiadm -m session -u
Does that cause an oops?

If you log back in, then pull the network cable, wait to see the ping timeout messages then manually logout
iscsiadm -m session -u
Does that cause an oops?

Can you rebuild your kernel, if I send you a patch?

Thanks. Our set up is:
kernel: 2.6.24-24
open-iscsi: 2.0-870.3
kernel logs:
Oct 9 21:15:50 ian_ser_2 kernel: [28466.697051] scsi841 : iSCSI Initiator over TCP/IP
Oct 9 21:15:50 ian_ser_2 kernel: [28466.962031] scsi 841:0:0:201: Direct-Access IET VIRTUAL-DISK 0PQ: 0 ANSI: 4
Oct 9 21:15:50 ian_ser_2 kernel: [28466.969109] sd 841:0:0:201: [sdd] 4505472 512-byte hardware sectors (2307 MB)
Oct 9 21:15:50 ian_ser_2 kernel: [28466.973314] sd 841:0:0:201: [sdd] Write Protect is off
Oct 9 21:15:50 ian_ser_2 kernel: [28466.973320] sd 841:0:0:201: [sdd] Mode Sense: 77 00 00 08
Oct 9 21:15:50 ian_ser_2 kernel: [28466.975420] sd 841:0:0:201: [sdd] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
Oct 9 21:15:50 ian_ser_2 kernel: [28466.977468] sd 841:0:0:201: [sdd] 4505472 512-byte hardware sectors (2307 MB)
Oct 9 21:15:50 ian_ser_2 kernel: [28466.977938] sd 841:0:0:201: [sdd] Write Protect is off
Oct 9 21:15:50 ian_ser_2 kernel: [28466.977944] sd 841:0:0:201: [sdd] Mode Sense: 77 00 00 08
Oct 9 21:15:50 ian_ser_2 kernel: [28466.981749] sd 841:0:0:201: [sdd] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
Oct 9 21:15:50 ian_ser_2 kernel: [28466.981761] sdd: sdd1
Oct 9 21:15:50 ian_ser_2 kernel: [28467.027801] sd 841:0:0:201: [sdd] Attached SCSI disk
Oct 9 21:15:50 ian_ser_2 kernel: [28467.027886] sd 841:0:0:201: Attached scsi generic sg4 type 0
Oct 9 21:16:01 ian_ser_2 kernel: [28477.713280] connection626:0: ping timeout of 15 secs expired, last rx 7049831, last ping 7052331, now 7056081
Oct 9 21:16:01 ian_ser_2 kernel: [28477.713467] connection626:0: detected conn error (1011)
Oct 9 21:16:01 ian_ser_2 kernel: [28477.717268] connection627:0: ping timeout of 15 secs expired, last rx 7049832, last ping 7052332, now 7056082
Oct 9 21:16:01 ian_ser_2 kernel: [28477.717458] connection627:0: detected conn error (1011)
Oct 9 21:16:10 ian_ser_2 kernel: [28486.049414] BUG: unable to handle kernel NULL pointer dereference at virtual address 0060
Oct 9 21:16:10 ian_ser_2 kernel: [28486.049639] printing eip: e08a212a *pde =
Oct 9 21:16:10 ian_ser_2 kernel: [28486.049924] Oops: [#1] SMP
Oct 9 21:16:10 ian_ser_2 kernel: [28486.050100] Modules linked in: iscsi_tcp libiscsi scsi_transport_iscsi iscsi_trgt crc32c libcrc32c nls_iso8859_1 nls_cp437 vfat fat vmmemctl cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_stats freq_table cpufreq_powersave sbs video output sbshc dock battery iptable_filter ip_tables x_tables vmhgfs lp loop ipv6 container serio_raw ac button evdev parport_pc parport i2c_piix4 i2c_core intel_agp agpgart shpchp pci_hotplug psmouse pcspkr ext3 jbd mbcache sd_mod sg sr_mod cdrom pata_acpi ata_generic floppy pcnet32 mii mptspi mptscsih mptbase scsi_transport_spi ata_piix libata scsi_mod raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse vmxnet
Oct 9 21:16:10 ian_ser_2 kernel: [28486.051174]
Oct 9 21:16:10 ian_ser_2 kernel: [28486.051174]
Oct 9 21:16:10 ian_ser_2 kernel: [28486.051286] Pid: 16444, comm: iscsi_scan_839 Not tainted (2.6.24-24-generic #1)
Oct 9 21:16:10 ian_ser_2 kernel: [28486.051433] EIP: 0060:[e08a212a] EFLAGS: 00010202 CPU: 0
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052073] EIP is at spi_device_match+0x1a/0x60 [scsi_transport_spi]
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052178] EAX: EBX: c27ff0b0 ECX: c27ff000 EDX: c27ff0b0
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052274] ESI: c27ff0b0 EDI: d0c31800 EBP: c0286000 ESP: dc1dfef0
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052367] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052473] Process iscsi_scan_839 (pid: 16444, ti=dc1de000 task=d7823140 task.ti=dc1de000)
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052586] Stack: e08a7c90 c0285c8f e095e328 c27ff1d8 cc1c3430 c27ff000 c27ff0b0 d0c31800
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052792] 0202 e09449cd c27ff000 cc1c3430 e0944a2f c27ff000 cc1c3400 e0944acc
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052944] d0c31814 cc1c30a4 e0944b50 e0944b64 c02805c2 cc1c30a4
Oct 9 21:16:10 ian_ser_2 kernel: [28486.053101] Call Trace:
Oct 9 21:16:10 ian_ser_2 kernel: [28486.053340] [c0285c8f] attribute_container_device_trigger+0x4f/0xb0
Oct 9 21:16:10 ian_ser_2 kernel: [28486.053963]
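The second test asked for in this message (log back in, pull the cable, wait for the ping timeout messages, then log out) is easy to mistime by hand, so a small helper may be useful. This is only a sketch: both function names are made up, it assumes dmesg is readable by the user running it, and it assumes the kernel ring buffer was cleared beforehand (for example with dmesg -c as root) so that an old ping timeout message is not matched. The pattern is taken from the "ping timeout of 15 secs expired" lines in the log above.

```shell
#!/bin/sh
# Succeeds if the kernel log text on stdin contains an iSCSI ping timeout.
saw_ping_timeout() {
    grep -q 'ping timeout of [0-9][0-9]* secs expired'
}

# Call this after logging back in and pulling the cable. It polls dmesg
# once a second and logs out of all sessions when the timeout shows up.
logout_after_ping_timeout() {
    until dmesg | saw_ping_timeout; do
        sleep 1
    done
    iscsiadm -m session -u
}
```

Only the definitions are shown; logout_after_ping_timeout blocks until a timeout is logged, so it should be started just before the cable is pulled.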
Re: Kernel Oops
Thanks Mike. I did the tests you mentioned a couple of times, and they didn't cause a kernel oops. The kernel oops I hit does not happen often; I have hit it twice in the last 4 weeks. A kernel patch is welcome and I will give it a try.

Thanks.
Kevin

On Thu, Oct 15, 2009 at 12:06 PM, Mike Christie micha...@cs.wisc.edu wrote:
On 10/14/2009 05:11 PM, Kevin Ye wrote:
Hi All, We hit the kernel oops again on our setup. Any suggestion to fix that?
If you just login then logout manually
iscsiadm -m session -u
Does that cause an oops?
If you log back in, then pull the network cable, wait to see the ping timeout messages then manually logout
iscsiadm -m session -u
Does that cause an oops?
Can you rebuild your kernel, if I send you a patch?

Thanks. Our set up is:
kernel: 2.6.24-24
open-iscsi: 2.0-870.3
kernel logs:
Oct 9 21:15:50 ian_ser_2 kernel: [28466.697051] scsi841 : iSCSI Initiator over TCP/IP
Oct 9 21:15:50 ian_ser_2 kernel: [28466.962031] scsi 841:0:0:201: Direct-Access IET VIRTUAL-DISK 0PQ: 0 ANSI: 4
Oct 9 21:15:50 ian_ser_2 kernel: [28466.969109] sd 841:0:0:201: [sdd] 4505472 512-byte hardware sectors (2307 MB)
Oct 9 21:15:50 ian_ser_2 kernel: [28466.973314] sd 841:0:0:201: [sdd] Write Protect is off
Oct 9 21:15:50 ian_ser_2 kernel: [28466.973320] sd 841:0:0:201: [sdd] Mode Sense: 77 00 00 08
Oct 9 21:15:50 ian_ser_2 kernel: [28466.975420] sd 841:0:0:201: [sdd] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
Oct 9 21:15:50 ian_ser_2 kernel: [28466.977468] sd 841:0:0:201: [sdd] 4505472 512-byte hardware sectors (2307 MB)
Oct 9 21:15:50 ian_ser_2 kernel: [28466.977938] sd 841:0:0:201: [sdd] Write Protect is off
Oct 9 21:15:50 ian_ser_2 kernel: [28466.977944] sd 841:0:0:201: [sdd] Mode Sense: 77 00 00 08
Oct 9 21:15:50 ian_ser_2 kernel: [28466.981749] sd 841:0:0:201: [sdd] Write cache: disabled, read cache: disabled, doesn't support DPO or FUA
Oct 9 21:15:50 ian_ser_2 kernel: [28466.981761] sdd: sdd1
Oct 9 21:15:50 ian_ser_2 kernel: [28467.027801] sd 841:0:0:201: [sdd] Attached SCSI disk
Oct 9 21:15:50 ian_ser_2 kernel: [28467.027886] sd 841:0:0:201: Attached scsi generic sg4 type 0
Oct 9 21:16:01 ian_ser_2 kernel: [28477.713280] connection626:0: ping timeout of 15 secs expired, last rx 7049831, last ping 7052331, now 7056081
Oct 9 21:16:01 ian_ser_2 kernel: [28477.713467] connection626:0: detected conn error (1011)
Oct 9 21:16:01 ian_ser_2 kernel: [28477.717268] connection627:0: ping timeout of 15 secs expired, last rx 7049832, last ping 7052332, now 7056082
Oct 9 21:16:01 ian_ser_2 kernel: [28477.717458] connection627:0: detected conn error (1011)
Oct 9 21:16:10 ian_ser_2 kernel: [28486.049414] BUG: unable to handle kernel NULL pointer dereference at virtual address 0060
Oct 9 21:16:10 ian_ser_2 kernel: [28486.049639] printing eip: e08a212a *pde =
Oct 9 21:16:10 ian_ser_2 kernel: [28486.049924] Oops: [#1] SMP
Oct 9 21:16:10 ian_ser_2 kernel: [28486.050100] Modules linked in: iscsi_tcp libiscsi scsi_transport_iscsi iscsi_trgt crc32c libcrc32c nls_iso8859_1 nls_cp437 vfat fat vmmemctl cpufreq_conservative cpufreq_ondemand cpufreq_userspace cpufreq_stats freq_table cpufreq_powersave sbs video output sbshc dock battery iptable_filter ip_tables x_tables vmhgfs lp loop ipv6 container serio_raw ac button evdev parport_pc parport i2c_piix4 i2c_core intel_agp agpgart shpchp pci_hotplug psmouse pcspkr ext3 jbd mbcache sd_mod sg sr_mod cdrom pata_acpi ata_generic floppy pcnet32 mii mptspi mptscsih mptbase scsi_transport_spi ata_piix libata scsi_mod raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod dm_mirror dm_snapshot dm_mod thermal processor fan fbcon tileblit font bitblit softcursor fuse vmxnet
Oct 9 21:16:10 ian_ser_2 kernel: [28486.051174]
Oct 9 21:16:10 ian_ser_2 kernel: [28486.051174]
Oct 9 21:16:10 ian_ser_2 kernel: [28486.051286] Pid: 16444, comm: iscsi_scan_839 Not tainted (2.6.24-24-generic #1)
Oct 9 21:16:10 ian_ser_2 kernel: [28486.051433] EIP: 0060:[e08a212a] EFLAGS: 00010202 CPU: 0
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052073] EIP is at spi_device_match+0x1a/0x60 [scsi_transport_spi]
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052178] EAX: EBX: c27ff0b0 ECX: c27ff000 EDX: c27ff0b0
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052274] ESI: c27ff0b0 EDI: d0c31800 EBP: c0286000 ESP: dc1dfef0
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052367] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052473] Process iscsi_scan_839 (pid: 16444, ti=dc1de000 task=d7823140 task.ti=dc1de000)
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052586] Stack: e08a7c90 c0285c8f e095e328 c27ff1d8 cc1c3430 c27ff000 c27ff0b0 d0c31800
Oct 9 21:16:10 ian_ser_2 kernel: [28486.052792]
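Since the oops has only been hit twice in four weeks, it may help to pull the faulting location out of each saved log so that occurrences can be compared quickly. In the log above the "EIP is at" line names spi_device_match in the scsi_transport_spi module, while the process is iscsi_scan_839. The sketch below extracts the symbol and module from such lines; the function name is made up, and the pattern assumes the 32-bit "EIP is at symbol+0xoff/0xlen [module]" format shown in this thread.

```shell
#!/bin/sh
# Print "<symbol> <module>" for every "EIP is at" line read from stdin.
oops_location() {
    sed -n 's|.*EIP is at \([A-Za-z0-9_.]*\)+0x[0-9a-f]*/0x[0-9a-f]* \[\([A-Za-z0-9_]*\)].*|\1 \2|p'
}

# Example with the line from the log in this thread:
echo 'kernel: [28486.052073] EIP is at spi_device_match+0x1a/0x60 [scsi_transport_spi]' | oops_location
# prints: spi_device_match scsi_transport_spi
```

Lines for functions built into the kernel carry no [module] suffix and are skipped by this pattern.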
Re: kernel oops in iscsi_tcp_recv
Erez Zilber wrote:
Mike, I got a kernel oops while logging in from v870-1 to an iSCSI-SCST target. It happens before 'iscsiadm -m node -L all' returns. Is this a known bug? Here's the log:

I think this is a new bug. I will give scst a try and try to replicate it. Are you running v870-1 on a normal machine (no VM)? What NIC are you using? What kernel are you running against? If you are using an Intel 10 gig NIC, it might be a known bug with the skb_seq_read function's handling of LRO, but I think the oops looks a little different.

Dec 17 19:56:21 172.16.4.12 Unable to handle kernel paging request
Dec 17 19:56:26 172.16.4.12 at 1fd8 RIP:
Dec 17 19:56:31 172.16.4.12 [883d759b] :iscsi_tcp:iscsi_tcp_recv+0xb9/0x498
Dec 17 19:56:36 172.16.4.12 PGD 406b54067
Dec 17 19:56:41 172.16.4.12 PUD 409354067
Dec 17 19:56:46 172.16.4.12 PMD 0
Dec 17 19:56:51 172.16.4.12
Dec 17 19:56:56 172.16.4.12 Oops: [1]
Dec 17 19:57:01 172.16.4.12 SMP
Dec 17 19:57:06 172.16.4.12
Dec 17 19:57:11 172.16.4.12 last sysfs file: /block/sdc/removable
Dec 17 19:57:16 172.16.4.12 CPU 0

I will try to add more debug prints tomorrow, and see if I can give more details.

Erez
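The questions in the reply above (which kernel, which NIC, and whether LRO might be involved) can usually be answered with uname and ethtool. The sketch below just prints the commands for one interface so they can be reviewed first; the function name and the interface name eth0 are placeholders, and it assumes an ethtool version that reports a large-receive-offload line in its -k output.

```shell
#!/bin/sh
# Print the commands that report kernel version, NIC driver, and LRO
# state for one network interface. Pipe the output to `sh` to run them.
nic_report_cmds() {
    iface=$1    # hypothetical interface name, e.g. eth0
    printf '%s\n' \
        'uname -r' \
        "ethtool -i $iface" \
        "ethtool -k $iface | grep large-receive-offload"
}

nic_report_cmds eth0
```

ethtool -i reports the driver and its version, which is typically enough to tell whether the interface is an Intel 10 gig NIC.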