Re: kernel oops error

2011-09-26 Thread Mike Christie
On 09/26/2011 12:28 AM, Vivek S wrote:
 On Mon, Sep 26, 2011 at 5:51 AM, Mike Christie micha...@cs.wisc.edu wrote:
 
 On 09/25/2011 01:57 AM, Vivek S wrote:

 It's not working because the locking changed in that upstream kernel.


 Hmm, okay. I will go back to a kernel that is supported by upstream
 open-iscsi.

 I think you misunderstood me. With newer kernels you should just use the
 kernel modules in the kernel. You do not need the open-iscsi/kernel
 modules in those kernels. The upstream ones are good enough.
 
 
 I was trying to dig into open-iscsi code and do some experiments with
 session recovery. So I was using open-iscsi code from github.
 
 
 In the case of the stable kernels they are probably best to use since fixes
 that get sent upstream are sent to the stable kernels.

 
 That means I have to use stable kernels for open-iscsi development ?
 

It depends on what you are working on. The userspace tools work with
newer kernels (from 2.6.26 and newer) just fine. If you are just doing
userspace work then you can use the open-iscsi tools from the github git
tree (and kernel.org when it is back up) or open-iscsi.org tarballs with
any of those kernels.

If you are doing kernel work then it is best to be using the scsi
maintainer's scsi-misc or scsi-rc-fixes tree or my linux-2.6.iscsi tree
(I have not put up a new kernel tree because it is basically just
scsi-misc right now and I have been hoping kernel.org would be back up
soon).
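
A minimal sketch of how one might check which side of the 2.6.26 line a machine is on before deciding whether the userspace tools can be paired with the in-kernel modules. The helper name version_at_least is made up for illustration, and the use of "sort -V" assumes GNU coreutils; nothing here comes from the open-iscsi tree itself.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is at least version $2.
# Assumes "sort -V" (GNU coreutils version sort) is available.
version_at_least() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Strip any distro suffix such as "-11-generic" before comparing.
running=$(uname -r | cut -d- -f1)

if version_at_least "$running" 2.6.26; then
    echo "kernel $running: the userspace tools should work with the in-kernel modules"
    # modinfo prints the path of the module that modprobe would load.
    modinfo -F filename iscsi_tcp 2>/dev/null || echo "iscsi_tcp module not found"
else
    echo "kernel $running is older than 2.6.26"
fi
```

If the path printed by modinfo points into an "extra" or "updates" directory rather than the kernel's own drivers/scsi directory, an out-of-tree build is probably shadowing the in-kernel module.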

-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.



Re: kernel oops error

2011-09-25 Thread Vivek S
On Sun, Sep 25, 2011 at 6:27 AM, Mike Christie micha...@cs.wisc.edu wrote:

 On 09/24/2011 02:40 PM, Vivek S wrote:
  Hi,
 
  I am on Dell laptop running Ubuntu 11.04, kernel 2.6.38.11.
 
  I modified the kernel Makefile to include the line linux_2_6_38:

 You should just use the kernel modules that come with that kernel.

  $(unpatch_code) to help me build open-iscsi.
 
  After a successful build i loaded all the open-iscsi kernel modules along
  with IET kernel module.
 
  I have setup IET target in /etc/iet/ietd.conf file.
 
  When I execute iscsiadm -m discoverydb -t st -p 192.168.1.2:3260 -Dl

 What are you trying to do? What is the l at the end for? Are you
 trying to do discovery and login to the portals found?


Yeah, I am trying to discover and log in to the portals found. It's a lowercase
L at the end to specify login.
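
For readers puzzled by the trailing "-Dl", here is the same command spelled with long options, as a sketch. The function name discover_and_login is hypothetical, and the function only prints the command so that it is safe to try on a machine without a target.

```shell
#!/bin/sh
# Long-option spelling of the command quoted in this thread.
# "-t st" abbreviates sendtargets; -D is --discover and -l is --login.
PORTAL="192.168.1.2:3260"   # portal address taken from the thread

# Print the command rather than run it; remove the "echo" to execute it
# for real (this needs root and a reachable target).
discover_and_login() {
    echo iscsiadm -m discoverydb -t sendtargets -p "$1" --discover --login
}

discover_and_login "$PORTAL"
```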


 It's not working because the locking changed in that upstream kernel.


Hmm, okay. I will go back to a kernel that is supported by upstream
open-iscsi.




Re: kernel oops error

2011-09-25 Thread Bart Van Assche
On Sat, Sep 24, 2011 at 9:40 PM, Vivek S vivek...@gmail.com wrote:
 I am on Dell laptop running Ubuntu 11.04, kernel 2.6.38.11.

I'm curious to know who generated kernel 2.6.38.11 ? As far as I know
the latest official kernel in the 2.6.38 series is 2.6.38.8
(https://lkml.org/lkml/2011/6/2/417). A quote from the 2.6.38.8
announcement:

This is the LAST .38 stable kernel release, please move to the
.39-stable tree at this point in time.

Bart.




Re: kernel oops error

2011-09-25 Thread Vivek S
The Ubuntu distribution I am using has 2.6.38.8 and also 2.6.38.11. The
latter got installed through Ubuntu OS updates.

On Sun, Sep 25, 2011 at 12:55 PM, Bart Van Assche bvanass...@acm.org wrote:

 On Sat, Sep 24, 2011 at 9:40 PM, Vivek S vivek...@gmail.com wrote:
  I am on Dell laptop running Ubuntu 11.04, kernel 2.6.38.11.

 I'm curious to know who generated kernel 2.6.38.11 ? As far as I know
 the latest official kernel in the 2.6.38 series is 2.6.38.8
 (https://lkml.org/lkml/2011/6/2/417). A quote from the 2.6.38.8
 announcement:

 This is the LAST .38 stable kernel release, please move to the
 .39-stable tree at this point in time.

 Bart.




Re: kernel oops error

2011-09-25 Thread Mike Christie
On 09/25/2011 01:57 AM, Vivek S wrote:
 
 It's not working because the locking changed in that upstream kernel.
 
 
 Hmm, okay. I will go back to a kernel that is supported by upstream
 open-iscsi.

I think you misunderstood me. With newer kernels you should just use the
kernel modules in the kernel. You do not need the open-iscsi/kernel
modules in those kernels. The upstream ones are good enough. In the case
of the stable kernels they are probably best to use since fixes that get
sent upstream are sent to the stable kernels.




Re: kernel oops error

2011-09-25 Thread Vivek S
On Mon, Sep 26, 2011 at 5:51 AM, Mike Christie micha...@cs.wisc.edu wrote:

 On 09/25/2011 01:57 AM, Vivek S wrote:
 
  It's not working because the locking changed in that upstream kernel.
 
 
  Hmm, okay. I will go back to a kernel that is supported by upstream
  open-iscsi.

 I think you misunderstood me. With newer kernels you should just use the
 kernel modules in the kernel. You do not need the open-iscsi/kernel
 modules in those kernels. The upstream ones are good enough.


I was trying to dig into open-iscsi code and do some experiments with
session recovery. So I was using open-iscsi code from github.


 In the case of the stable kernels they are probably best to use since fixes
 that get sent upstream are sent to the stable kernels.


That means I have to use stable kernels for open-iscsi development ?




Re: kernel oops error

2011-09-24 Thread Mike Christie
On 09/24/2011 02:40 PM, Vivek S wrote:
 Hi,
 
 I am on Dell laptop running Ubuntu 11.04, kernel 2.6.38.11.
 
 I modified the kernel Makefile to include the line linux_2_6_38:

You should just use the kernel modules that come with that kernel.

 $(unpatch_code) to help me build open-iscsi.
 
 After a successful build i loaded all the open-iscsi kernel modules along
 with IET kernel module.
 
 I have setup IET target in /etc/iet/ietd.conf file.
 
 When I execute iscsiadm -m discoverydb -t st -p 192.168.1.2:3260 -Dl

What are you trying to do? What is the l at the end for? Are you
trying to do discovery and login to the portals found?

It's not working because the locking changed in that upstream kernel.




Re: Kernel oops on login

2010-03-02 Thread Mike Christie

On 03/02/2010 02:35 AM, Ulrich Windl wrote:

On 2 Mar 2010 at 1:21, Mike Christie wrote:


Make sure everything is unmounted when you delete/logout, and make sure
something like dm/md raid/multipath is not using the device.



Mike,

would it make any sense to wait until the devices are no longer in use?
It might be preferable to a kernel error message. You could still add
a force option if someone insists on doing a logout on busy devices.




Anyone with root privs can use one of the scsi interfaces (like 
/sys/block/sdb/device/delete) and remove the device from under other 
users and cause this problem.


The scsi and iscsi layer also supports hot unplugging of devices, so in 
many cases you can remove a HBA or a disk and the disk should just get 
cleaned up (like when you just rip out a usb device) and we have no 
control over the user doing this, but we are supposed to handle it.


And the FC layer and other drivers like SAS ones (iscsi is supposed to 
do it too, but I have not got around to changing the code), will begin 
the removal process if a port has been disconnected long enough (this is 
why it is common to hit the problem with dm-multipath and FC).


So for the above we do not have any control over it, and the problem is 
just buggy code which Hannes from SUSE has been working on fixing. If 
that is fixed then the problem where someone or the iscsi scripts run 
iscsiadm ...--logout would also get fixed with it.


If you wanted to check if a device was in use to prevent a user mistake 
like logging out before they unmounted a FS, then yeah, I think it could 
be useful. I am just not sure how to do it nicely. I think we would have 
to add an API to the SCSI layer to peek at the device's refcount to check 
and see if there are any external users.
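
Until such an API exists, a script can approximate the check from userspace. The sketch below is not the refcount check described above: it only looks at mounted file systems and at sysfs "holders" entries (where dm and md stacking shows up), so it can miss other users of the device. The function name device_in_use and its optional arguments are made up for illustration.

```shell
#!/bin/sh
# Hypothetical pre-logout check; a userspace approximation only.
# $1 = disk name such as sdc
# $2 = mounts file (defaults to /proc/mounts)
# $3 = sysfs block directory (defaults to /sys/block)
device_in_use() {
    dev=$1
    mounts=${2:-/proc/mounts}
    sysblock=${3:-/sys/block}

    # Mounted file systems on the disk or on one of its partitions.
    if grep -q "^/dev/$dev[0-9]* " "$mounts" 2>/dev/null; then
        return 0
    fi

    # dm-multipath, md raid and similar stacked users appear as holders.
    for holder in "$sysblock/$dev"/holders/* "$sysblock/$dev"/"$dev"*/holders/*; do
        if [ -e "$holder" ]; then
            return 0
        fi
    done
    return 1
}

if device_in_use sdc; then
    echo "sdc is busy: unmount it or stop dm/md before logging out"
else
    echo "sdc looks idle"
fi
```

A logout wrapper could call this for each disk that belongs to the session and refuse to continue unless a force flag is given.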





Re: Kernel oops on login

2010-03-01 Thread An Oneironaut
Ok so I guess working with old versions of open-iscsi is not accepted
here :).  So I upgraded to the latest and greatest semi-stable
release, 871.  I no longer see the "is not queued" messages and my
login and logout work fine.  However, this is only the case if I
don't have my flash device mounted on /dev/sdc.  If the flash is
mounted I get this kernel oops:

 -T iqn.1999-02.com.nexsan:p0:sataboy:01731a5a --login
Logging in to [iface: default, target: iqn.1999-02.com.nexsan:p0:sataboy:01731a5a, portal: 172.19.151.169,3260]
kobject_add failed for sdc with -EEXIST, don't try to register things
with the same name in the same directory.
BUG: unable to handle kernel NULL pointer dereference at virtual
address 0008
 printing eip:
*pde = 
Oops:  [#1]
Modules linked in: iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi
CPU:0
EIP:0060:[c0189241]Not tainted VLI
EFLAGS: 00210292   (2.6.22.10-vs2.2.0.5-cisco-nmx #1)
EIP is at create_dir+0x21/0x190
eax: c2280584   ebx: f7191bac   ecx: c2280588   edx: 
esi: c2280584   edi: c2280584   ebp:    esp: f7191b7c
ds: 007b   es: 007b   fs:   gs: 0033  ss: 0068
Process iscsid (pid: 5038, ti=f719 task=f77c75b0 task.ti=f719)
Stack: c0378110 c037566c c2280588 c2280584 c037566c c0103c41 c2280584
c2280584
   c2280584  c01893d4 f7191bac   c0203a1f
f7b35ac0
   f7d690c0 c0186e80  f7b35b10 f7b35ac0 c22804c0 c2280584
f7d690c0
Call Trace:
 [c0103c41] dump_stack+0x11/0x20
 [c01893d4] sysfs_create_dir+0x24/0x70
 [c0203a1f] kobject_shadow_add+0x7f/0x1a0
 [c0186e80] register_disk+0x50/0x1f0
 [c01f4b72] blk_register_queue+0x52/0x90
 [c026c6c8] sd_probe+0x278/0x3f0
 [c0189e47] sysfs_create_link+0x57/0x150
 [c0230c57] driver_probe_device+0x87/0x190
 [c0331fc1] klist_next+0x51/0xb0
 [c022ff24] bus_for_each_drv+0x44/0x70
 [c0230e19] device_attach+0x79/0x80
 [c0230d60] __device_attach+0x0/0x10
 [c022fe95] bus_attach_device+0x45/0x90
 [c022eb93] device_add+0x493/0x560
 [c0268502] scsi_sysfs_add_sdev+0x32/0x230
 [c02665bd] scsi_probe_and_add_lun+0x95d/0x980
 [c0266e91] __scsi_scan_target+0x491/0x5f0
 [c0166dcb] mntput_no_expire+0x1b/0x70
 [c015bac3] link_path_walk+0x63/0xc0
 [c02676a6] scsi_scan_target+0xb6/0xe0
 [f8a008fa] iscsi_user_scan_session+0x9a/0xb0 [scsi_transport_iscsi]
 [f8a00820] iscsi_user_scan+0x0/0x30 [scsi_transport_iscsi]
 [f8a00860] iscsi_user_scan_session+0x0/0xb0 [scsi_transport_iscsi]
 [c022dd42] device_for_each_child+0x22/0x40
 [f8a00820] iscsi_user_scan+0x0/0x30 [scsi_transport_iscsi]
 [f8a00843] iscsi_user_scan+0x23/0x30 [scsi_transport_iscsi]
 [c026818b] store_scan+0xbb/0xf0
 [c013b614] __alloc_pages+0x64/0x2f0
 [c02680d0] store_scan+0x0/0xf0
 [c0232196] class_device_attr_store+0x26/0x40
 [c0188151] sysfs_write_file+0xb1/0x110
 [c01880a0] sysfs_write_file+0x0/0x110
 [c0153820] vfs_write+0xa0/0x140
 [c0153da1] sys_write+0x41/0x70
 [c010280e] sysenter_past_esp+0x5f/0x85
 ===
Code: 74 26 00 8d bc 27 00 00 00 00 83 ec 28 89 5c 24 18 8b 5c 24 2c
89 74 24 1c 89 7c 24 20 89 6c 24 24 89 d5 89 4c 24 08 89 44 24 0c 8b
42 08 83 c0 68 e8 94 a2 1a 0031 c0 b9 ff ff ff ff 8b 7c 24
EIP: [c0189241] create_dir+0x21/0x190 SS:ESP 0068:f7191b7c
Mar  1 21:09:07 localhost kernel: BUG: unable to handle kernel NULL
pointer dereference at virtual address 0008
Mar  1 21:09:07 localhost kClocksource tsc unstable (delta =
2941960870 ns)
ernel:  printingTime: pit clocksource has been installed.
 eip:
Mar  1 21:09:07 localhost kernel: *pde = 
Mar  1 21:09:07 localhost kernel: Oops:  [#1]
Mar  1 21:09:07 localhost kernel: CPU:0
Mar  1 21:09:07 localhost kernel: EIP:0060:[c0189241]Not
tainted VLI
Mar  1 21:09:07 localhost kernel: EFLAGS: 00210292   (2.6.22.10-
vs2.2.0.5-cisco-nmx #1)
Mar  1 21:09:07 localhost kernel: EIP is at create_dir+0x21/0x190
Mar  1 21:09:07 localhost kernel: eax: c2280584   ebx: f7191bac   ecx:
c2280588   edx: 
Mar  1 21:09:07 localhost kernel: esi: c2280584   edi: c2280584   ebp:
   esp: f7191b7c
Mar  1 21:09:07 localhost kernel: ds: 007b   es: 007b   fs:   gs:
0033  ss: 0068
Mar  1 21:09:07 localhost kernel: Process iscsid (pid: 5038,
ti=f719 task=f77c75b0 task.ti=f719)
Mar  1 21:09:07 localhost kernel: Stack: c0378110 c037566c c2280588
c2280584 c037566c c0103c41 c2280584 c2280584
Mar  1 21:09:07 localhost kernel:c2280584  c01893d4
f7191bac   c0203a1f f7b35ac0
Mar  1 21:09:07 localhost kernel:f7d690c0 c0186e80 
f7b35b10 f7b35ac0 c22804c0 c2280584 f7d690c0
Mar  1 21:09:07 localhost kernel: Call Trace:
Mar  1 21:09:07 localhost kernel:  [c0103c41] dump_stack+0x11/0x20
Mar  1 21:09:07 localhost kernel:  [c01893d4] sysfs_create_dir
+0x24/0x70
Mar  1 21:09:07 localhost kernel:  [c0203a1f] kobject_shadow_add
+0x7f/0x1a0
Mar  1 21:09:07 localhost kernel:  [c0186e80] register_disk
+0x50/0x1f0
Mar  1 21:09:07 localhost kernel:  [c01f4b72] blk_register_queue
+0x52/0x90

Re: kernel oops in resched_task

2010-01-06 Thread Pasi Kärkkäinen
On Wed, Jan 06, 2010 at 04:28:32PM +0200, Erez Zilber wrote:
 Hi,
 
 I got this oops while running open-iscsi on a CentOS 5.3 machine
 (don't know how to recreate it). it crashes while trying to wake up
 the work queue after queuecommand was called. Has anyone seen
 something similar?


I'd suggest upgrading to 5.4 to see if it still happens.

-- Pasi





Re: kernel oops in resched_task

2010-01-06 Thread Mike Christie

On 01/06/2010 08:28 AM, Erez Zilber wrote:

Hi,

I got this oops while running open-iscsi on a CentOS 5.3 machine
(don't know how to recreate it). it crashes while trying to wake up
the work queue after queuecommand was called. Has anyone seen
something similar?



I have not seen it.

Are you using the kernel modules that come with 5.3 or the open-iscsi 
ported ones?
-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To post to this group, send email to open-is...@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.




Re: Kernel Oops

2009-10-27 Thread Mike Christie

Kevin Ye wrote:
 Hi Mike,
 
 I reproduced this problem with two simple scripts: one continuously logs
 in to and then out of a target; the other repeatedly fails the network
 and restores it after 30 or 200 seconds.
 
 script 1:
 while true
 do
   date
   iscsiadm -m node -l -T [target_name] -p [target_ip]
   date
   iscsiadm -m node -u -T [target_name] -p [target_ip]
 done
 
 script 2:
 while true
 do
   echo "failing the wan"
   ./disconnectip.sh 192.168.1.160
   sleep 30
   echo "unfailing the wan"
   ./reconnectip.sh 192.168.1.160
   sleep 300
   echo "failing the wan"
   ./disconnectip.sh 192.168.1.160
   sleep 200
   echo "unfailing the wan"
   ./reconnectip.sh 192.168.1.160
 done
 
 I hit the oops after 1 day of testing. In this test, I didn't hit the target NULL
 problem during logout. I think that the target NULL problem I mentioned
 before is caused by my script killing the login process due to a
 timeout.
 
 I analyzed all the kernel oopses I have hit so far. It seems that if the network
 fails just before the login process finishes, then after 15 seconds of
 network down (less than 15 seconds after we see the kernel message "Attached
 SCSI disk"), it complains "connectionx:0: ping timeout of 15 secs expired,
 last rx x, last ping x, now x".
 
 Any idea what's the problem? Thanks.
 

Your original problem came when the network was not accessible. The 
iscsi initiator sends an iscsi ping (nop) every noop_out_interval seconds. 
When you saw the ping/nop timeout message it means that an iscsi ping 
timed out, that is, no reply arrived within noop_out_timeout seconds. 
The iscsi initiator will then drop the connection and try to relogin 
every X seconds.

In the oops you sent we saw that something was forcing a logout and 
shutdown of the session at this time, but what I cannot figure out is 
why the oops is failing in code not related to iscsi. For some reason we 
are in the SPI code (so maybe something related to mptspi, which uses 
scsi_transport_spi, which is where spi_device_match is).
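
As a sketch of where those two timers live: to my understanding the node-record settings are named node.conn[0].timeo.noop_out_interval and node.conn[0].timeo.noop_out_timeout, as in the iscsid.conf shipped with open-iscsi. The helper below only prints the iscsiadm update commands for one node record; the function name and the TARGET_NAME and TARGET_IP placeholders are made up. The value 0 follows the suggestion elsewhere in this thread to disable nops/pings when multipath is not in use.

```shell
#!/bin/sh
# Setting names as found in the iscsid.conf shipped with open-iscsi
# (an assumption to verify against the installed version).
NOOP_INTERVAL="node.conn[0].timeo.noop_out_interval"
NOOP_TIMEOUT="node.conn[0].timeo.noop_out_timeout"

# Print (not run) the commands that would change both settings on one
# node record. $1 = target name, $2 = portal, $3 = value in seconds.
noop_update_cmds() {
    for name in "$NOOP_INTERVAL" "$NOOP_TIMEOUT"; do
        echo "iscsiadm -m node -T $1 -p $2 -o update -n $name -v $3"
    done
}

# Placeholders: substitute a real target name and portal.
noop_update_cmds "TARGET_NAME" "TARGET_IP" 0
```

Updated node records normally take effect at the next login, so an existing session would need a logout and login to pick up the new values.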

Oct  9 21:16:10 ian_ser_2 kernel: [28486.052073] EIP is at
spi_device_match+0x1a/0x60 [scsi_transport_spi]
Oct  9 21:16:10 ian_ser_2 kernel: [28486.052178] EAX:  EBX:
   c27ff0b0
ECX: c27ff000 EDX: c27ff0b0
Oct  9 21:16:10 ian_ser_2 kernel: [28486.052274] ESI: c27ff0b0 EDI:
   d0c31800

I think I saw this with older kernels, which is why I asked for you to 
try a newer one before. However, I was looking through the git commits 
and did not see any related fixes so I am not sure.

In the original problem, were you running a logout command manually or 
did you run something like /etc/init.d/iscsi restart (script name may be 
different)?

In the oopses in your new tests, are you failing in the same place (in 
spi_device_match)?
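
One way to answer that question across many logs is to pull the faulting symbol out of each oops. This is a rough sketch that assumes the "EIP is at" wording these 2.6.2x i386 kernels print; the function name oops_symbols is made up, and "grep -o" is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# Print the faulting symbol for every oops found in a kernel log on stdin.
# Mail wrapping sometimes pushes the symbol onto the next line, so the
# input is joined into one line before matching.
oops_symbols() {
    tr '\n' ' ' |
        grep -o 'EIP is at  *[A-Za-z_][A-Za-z0-9_.]*' |
        awk '{ print $NF }'
}

# Two "EIP is at" lines copied from the oopses posted to this list.
oops_symbols <<'EOF'
Oct  9 21:16:10 ian_ser_2 kernel: [28486.052073] EIP is at
spi_device_match+0x1a/0x60 [scsi_transport_spi]
Mar  1 21:09:07 localhost kernel: EIP is at create_dir+0x21/0x190
EOF
```

Piping the result through "sort | uniq -c" gives a quick count of how often each function shows up as the crash site.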




Re: Kernel Oops

2009-10-27 Thread Kevin Ye
Thanks a lot for your explanation, Mike.

In my original problem, a logout was issued manually after login. The kernel
oops happened at the time this command was issued.

For the new test, it failed in the same place (in spi_device_match). I
cannot track the time when the logout was issued, but I suspect it was the same
time the kernel oops happened. Below is the kernel log from my new test.

-Kevin


Oct 24 14:00:48 qye-cms kernel: [184273.268108] scsi55415 : iSCSI Initiator
over TCP/IP
Oct 24 14:00:48 qye-cms kernel: [184273.523222] scsi 55415:0:0:91:
Direct-Access IET  VIRTUAL-DISK 0PQ: 0 ANSI: 4
Oct 24 14:00:48 qye-cms kernel: [184273.528678] sd 55415:0:0:91: [sdb]
1024000 512-byte hardware sectors (524 MB)
Oct 24 14:00:48 qye-cms kernel: [184273.529105] sd 55415:0:0:91: [sdb] Write
Protect is off
Oct 24 14:00:48 qye-cms kernel: [184273.529109] sd 55415:0:0:91: [sdb] Mode
Sense: 77 00 00 08
Oct 24 14:00:48 qye-cms kernel: [184273.529545] sd 55415:0:0:91: [sdb] Write
cache: disabled, read cache: enabled, doesn't support DPO or FUA
Oct 24 14:00:48 qye-cms kernel: [184273.530040] sd 55415:0:0:91: [sdb]
1024000 512-byte hardware sectors (524 MB)
Oct 24 14:00:48 qye-cms kernel: [184273.531361] sd 55415:0:0:91: [sdb] Write
Protect is off
Oct 24 14:00:48 qye-cms kernel: [184273.531367] sd 55415:0:0:91: [sdb] Mode
Sense: 77 00 00 08
Oct 24 14:00:48 qye-cms kernel: [184273.533646] sd 55415:0:0:91: [sdb] Write
cache: disabled, read cache: enabled, doesn't support DPO or FUA
Oct 24 14:01:08 qye-cms kernel: [184273.533654]  sdb:3 connection55413:0:
ping timeout of 15 secs expired, last rx 46079163, last ping 46080413, now
46084163
Oct 24 14:01:08 qye-cms kernel: [184293.496601]  connection55413:0: detected
conn error (1011)
Oct 24 14:03:08 qye-cms kernel: [184413.530473]  session55413: session
recovery timed out after 120 secs
Oct 24 14:03:08 qye-cms kernel: [184413.534610] sd 55415:0:0:91: [sdb]
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:03:08 qye-cms kernel: [184413.534697] end_request: I/O error, dev
sdb, sector 8
Oct 24 14:03:08 qye-cms kernel: [184413.534804] Buffer I/O error on device
sdb, logical block 1
Oct 24 14:03:08 qye-cms kernel: [184413.535039] sd 55415:0:0:91: [sdb]
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:03:08 qye-cms kernel: [184413.535044] end_request: I/O error, dev
sdb, sector 8
Oct 24 14:03:08 qye-cms kernel: [184413.535053] Buffer I/O error on device
sdb, logical block 1
Oct 24 14:03:08 qye-cms kernel: [184413.535163] Dev sdb: unable to read RDB
block 8
Oct 24 14:03:08 qye-cms kernel: [184413.537038] sd 55415:0:0:91: [sdb]
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:03:08 qye-cms kernel: [184413.537048] end_request: I/O error, dev
sdb, sector 24
Oct 24 14:03:08 qye-cms kernel: [184413.537057] Buffer I/O error on device
sdb, logical block 3
Oct 24 14:03:08 qye-cms kernel: [184413.537185] sd 55415:0:0:91: [sdb]
Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 14:03:08 qye-cms kernel: [184413.537189] end_request: I/O error, dev
sdb, sector 24
Oct 24 14:03:08 qye-cms kernel: [184413.537192] Buffer I/O error on device
sdb, logical block 3
Oct 24 14:03:08 qye-cms kernel: [184413.537326]  unable to read partition
table
Oct 24 14:03:08 qye-cms kernel: [184413.537398] sd 55415:0:0:91: [sdb]
Attached SCSI disk
Oct 24 14:03:08 qye-cms kernel: [184413.537478] sd 55415:0:0:91: Attached
scsi generic sg1 type 0
Oct 24 14:05:03 qye-cms kernel: [184413.597688] BUG: unable to handle kernel
NULL pointer dereference at virtual address 0060
Oct 24 14:05:03 qye-cms kernel: [184413.684629] printing eip: e08a212a *pde
= 
Oct 24 14:05:03 qye-cms kernel: [184413.684907] Oops:  [#1] SMP
Oct 24 14:05:03 qye-cms kernel: [184413.685202] Modules linked in:
iscsi_trgt crc32c libcrc32c vmblock vmmemctl cpufreq_conservative
cpufreq_ondemand cpufreq_userspace cpufreq_stats freq_table
cpufreq_powersave sbs video output sbshc dock battery iptable_filter
ip_tables x_tables vmhgfs iscsi_tcp libiscsi scsi_transport_iscsi lp loop
ipv6 parport_pc parport evdev container serio_raw psmouse ac button
i2c_piix4 intel_agp i2c_core pcspkr agpgart shpchp pci_hotplug ext3 jbd
mbcache sg sd_mod floppy pcnet32 mptspi mptscsih mptbase mii pata_acpi
ata_generic scsi_transport_spi ata_piix libata scsi_mod raid10 raid456
async_xor async_memcpy async_tx xor raid1 raid0 multipath linear md_mod
dm_mirror dm_snapshot dm_mod thermal processor fan fbcon tileblit font
bitblit softcursor fuse vmxnet
Oct 24 14:05:03 qye-cms kernel: [184413.686686]
Oct 24 14:05:03 qye-cms kernel: [184413.686773] Pid: 3770, comm: iscsid Not
tainted (2.6.24-24-generic #1)
Oct 24 14:05:03 qye-cms kernel: [184413.686850] EIP: 0060:[e08a212a]
EFLAGS: 00010202 CPU: 0
Oct 24 14:05:03 qye-cms kernel: [184413.725583] EIP is at
spi_device_match+0x1a/0x60 [scsi_transport_spi]
Oct 24 14:05:03 qye-cms kernel: [184413.725680] EAX:  EBX: 

Re: Kernel Oops

2009-10-26 Thread Kevin Ye
Hi Mike,

I reproduced this problem with two simple scripts: one continuously logs
in to and then out of a target; the other repeatedly fails the network
and restores it after 30 or 200 seconds.

script 1:
while true
do
  date
  iscsiadm -m node -l -T [target_name] -p [target_ip]
  date
  iscsiadm -m node -u -T [target_name] -p [target_ip]
done

script 2:
while true
do
  echo "failing the wan"
  ./disconnectip.sh 192.168.1.160
  sleep 30
  echo "unfailing the wan"
  ./reconnectip.sh 192.168.1.160
  sleep 300
  echo "failing the wan"
  ./disconnectip.sh 192.168.1.160
  sleep 200
  echo "unfailing the wan"
  ./reconnectip.sh 192.168.1.160
done

I hit the oops after 1 day of testing. In this test, I didn't hit the target NULL
problem during logout. I think that the target NULL problem I mentioned
before is caused by my script killing the login process due to a
timeout.

I analyzed all the kernel oopses I have hit so far. It seems that if the network
fails just before the login process finishes, then after 15 seconds of
network down (less than 15 seconds after we see the kernel message "Attached
SCSI disk"), it complains "connectionx:0: ping timeout of 15 secs expired,
last rx x, last ping x, now x".

Any idea what's the problem? Thanks.

Regards,
Kevin


On Wed, Oct 21, 2009 at 12:46 PM, Mike Christie micha...@cs.wisc.edu wrote:


 Kevin Ye wrote:
  Thanks Mike.
 
  I did the tests you mentioned a couple of times, and it didn't cause
 kernel
  oops.
 
  The kernel Oops I hit does not happen often. I hit twice in last 4 weeks.
 
  kernel patch is welcome and I will give it a try. Thanks.
 

 Shoot, let me do some digging. I was hoping one of those manual
 commands would trigger the problem. The one where you pull the cable
 yourself should have run over the same code and caused it.

 Are you using multipath? If not, for now you can just disable nops/pings.
 Set the noop timeout and noop interval to 0 for every target you have
 set up, and set this in the iscsid.conf (you could also set it in
 iscsid.conf then rediscover the targets so it will get picked up).



  Kevin
 
  On Thu, Oct 15, 2009 at 12:06 PM, Mike Christie micha...@cs.wisc.edu
 wrote:
 
  On 10/14/2009 05:11 PM, Kevin Ye wrote:
  Hi All,
 
  We hit the kernel oops again on our setup. Any suggestion to fix that?
  If you just login then logout manually
 
  iscsiadm -m session -u
 
  Does that cause an oops?
 
 
  If you log back in, then pull the network cable, wait to see the ping
  timeout messages then manually logout
 
  iscsiadm -m session -u
 
  Does that cause an oops?
 
 
  Can you rebuild your kernel, if I send you a patch?
 
 
 
  Thanks.
 
  Our set up is:
  kernel: 2.6.24-24
  open-iscsi: 2.0-870.3
 
  kernel logs:
  Oct  9 21:15:50 ian_ser_2 kernel: [28466.697051] scsi841 : iSCSI
  Initiator
  over TCP/IP
  Oct  9 21:15:50 ian_ser_2 kernel: [28466.962031] scsi 841:0:0:201:
  Direct-Access IET  VIRTUAL-DISK 0PQ: 0 ANSI: 4
  Oct  9 21:15:50 ian_ser_2 kernel: [28466.969109] sd 841:0:0:201: [sdd]
  4505472 512-byte hardware sectors (2307 MB)
  Oct  9 21:15:50 ian_ser_2 kernel: [28466.973314] sd 841:0:0:201: [sdd]
  Write
  Protect is off
  Oct  9 21:15:50 ian_ser_2 kernel: [28466.973320] sd 841:0:0:201: [sdd]
  Mode
  Sense: 77 00 00 08
  Oct  9 21:15:50 ian_ser_2 kernel: [28466.975420] sd 841:0:0:201: [sdd]
  Write
  cache: disabled, read cache: disabled, doesn't support DPO or FUA
  Oct  9 21:15:50 ian_ser_2 kernel: [28466.977468] sd 841:0:0:201: [sdd]
  4505472 512-byte hardware sectors (2307 MB)
  Oct  9 21:15:50 ian_ser_2 kernel: [28466.977938] sd 841:0:0:201: [sdd]
  Write
  Protect is off
  Oct  9 21:15:50 ian_ser_2 kernel: [28466.977944] sd 841:0:0:201: [sdd]
  Mode
  Sense: 77 00 00 08
  Oct  9 21:15:50 ian_ser_2 kernel: [28466.981749] sd 841:0:0:201: [sdd]
  Write
  cache: disabled, read cache: disabled, doesn't support DPO or FUA
  Oct  9 21:15:50 ian_ser_2 kernel: [28466.981761]  sdd: sdd1
  Oct  9 21:15:50 ian_ser_2 kernel: [28467.027801] sd 841:0:0:201: [sdd]
  Attached SCSI disk
  Oct  9 21:15:50 ian_ser_2 kernel: [28467.027886] sd 841:0:0:201:
 Attached
  scsi generic sg4 type 0
  Oct  9 21:16:01 ian_ser_2 kernel: [28477.713280]  connection626:0: ping
  timeout of 15 secs expired, last rx 7049831, last ping 7052331, now
  7056081
  Oct  9 21:16:01 ian_ser_2 kernel: [28477.713467]  connection626:0:
  detected
  conn error (1011)
  Oct  9 21:16:01 ian_ser_2 kernel: [28477.717268]  connection627:0: ping
  timeout of 15 secs expired, last rx 7049832, last ping 7052332, now
  7056082
  Oct  9 21:16:01 ian_ser_2 kernel: [28477.717458]  connection627:0:
  detected
  conn error (1011)
  Oct  9 21:16:10 ian_ser_2 kernel: [28486.049414] BUG: unable to handle
  kernel NULL pointer dereference at virtual address 0060
  Oct  9 21:16:10 ian_ser_2 kernel: [28486.049639] printing eip: e08a212a
  *pde
  = 
  Oct  9 21:16:10 ian_ser_2 kernel: [28486.049924] Oops:  [#1] SMP
  Oct  9 21:16:10 ian_ser_2 kernel: 

Re: Kernel Oops

2009-10-22 Thread Kevin Ye
Thanks Mike.

I suspect that this problem is related to a half-completed login when the
network goes down. That is, the network goes down before the login process
finishes, and after 15 seconds of network down it tries to fail the connection.
But the connection is only half set up, so the device or some other property
is NULL, which causes the kernel oops.

I tried to reproduce this but ran into another problem. I may not have failed
the network at the right time.

The other problem I hit is that login fails but leaves the target name in the
kernel NULL, which causes logout to fail. The only way to resolve this is to
restart the machine. The error output of logout is:
iscsiadm --mode node --logout all
iscsiadm: could not read session targetname: 61
iscsiadm: could not find session info for session127

iscsiadm: initiator reported error (15 - already exists)

So my conclusion is that a network failure in the middle of login is not
handled nicely.

I am still trying to reproduce the kernel oops rather than the second error.

If you have any suggestion we should do, please let me know. Thanks.

Kevin

On Wed, Oct 21, 2009 at 12:46 PM, Mike Christie micha...@cs.wisc.edu wrote:


 Shoot, let me do some digging. I was hoping one of those manual
 commands would trigger the problem. The one where you pull the cable
 yourself should have run over the same code and caused it.

 Are you using multipath? If not, for now you can just disable nops/pings.
 Set the noop timeout and noop interval to 0 for every target you have
 set up, and set this in the iscsid.conf (you could also set it in
 iscsid.conf then rediscover the targets so it will get picked up).



  Kevin
 
  On Thu, Oct 15, 2009 at 12:06 PM, Mike Christie micha...@cs.wisc.edu
 wrote:
 
  On 10/14/2009 05:11 PM, Kevin Ye wrote:
  Hi All,
 
  We hit the kernel oops again on our setup. Any suggestion to fix that?
  If you just login then logout manually
 
  iscsiadm -m session -u
 
  Does that cause an oops?
 
 
  If you log back in, then pull the network cable, wait to see the ping
  timeout messages then manually logout
 
  iscsiadm -m session -u
 
  Does that cause an oops?
 
 
  Can you rebuild your kernel, if I send you a patch?
 
 
 
  Thanks.
 
  Our set up is:
  kernel: 2.6.24-24
  open-iscsi: 2.0-870.3
 

Re: Kernel Oops

2009-10-21 Thread Mike Christie

Kevin Ye wrote:
 Thanks Mike.
 
 I did the tests you mentioned a couple of times, and they didn't cause a
 kernel oops.
 
 The kernel oops I hit does not happen often. I have hit it twice in the
 last 4 weeks.
 
 A kernel patch is welcome and I will give it a try. Thanks.
 

Shoot, let me do some digging. I was hoping one of those manual
commands would trigger the problem. The one where you pull the cable
yourself should have run over the same code and caused it.

Are you using multipath? If not, for now you can just disable nops/pings.
Set the noop timeout and noop interval to 0 for every target you have
set up, and also set this in iscsid.conf (alternatively, set it in
iscsid.conf and then rediscover the targets so it gets picked up).
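
A rough sketch of what "set it to 0 for every target" could look like in practice. It only builds and prints the `iscsiadm` update commands; it does not run them. The setting names are the noop entries from iscsid.conf as I understand them; the target IQN and portal are hypothetical placeholders, and the helper `noop_disable_commands` is made up for illustration.

```python
import shlex

# iscsid.conf names for the nop-out (ping) interval and timeout.
NOOP_SETTINGS = (
    "node.conn[0].timeo.noop_out_interval",
    "node.conn[0].timeo.noop_out_timeout",
)


def noop_disable_commands(targets):
    """Build one 'iscsiadm -m node ... -o update' argv per target and setting."""
    commands = []
    for target, portal in targets:
        for setting in NOOP_SETTINGS:
            commands.append([
                "iscsiadm", "-m", "node", "-T", target, "-p", portal,
                "-o", "update", "-n", setting, "-v", "0",
            ])
    return commands


# Hypothetical target and portal, for illustration only.
example_targets = [("iqn.2009-10.com.example:disk1", "192.0.2.10:3260")]
for argv in noop_disable_commands(example_targets):
    # Printed for review rather than executed.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Updating the node records this way would presumably only take effect on the next login, so existing sessions would need to be logged out and back in.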



 Kevin
 
 On Thu, Oct 15, 2009 at 12:06 PM, Mike Christie micha...@cs.wisc.edu wrote:
 
 On 10/14/2009 05:11 PM, Kevin Ye wrote:
 Hi All,

 We hit the kernel oops again on our setup. Any suggestion to fix that?
 If you just login then logout manually

 iscsiadm -m session -u

 Does that cause an oops?


 If you log back in, then pull the network cable, wait to see the ping
 timeout messages then manually logout

 iscsiadm -m session -u

 Does that cause an oops?


 Can you rebuild your kernel, if I send you a patch?



 Thanks.

 Our set up is:
 kernel: 2.6.24-24
 open-iscsi: 2.0-870.3


Re: Kernel Oops

2009-10-15 Thread Mike Christie

On 10/14/2009 05:11 PM, Kevin Ye wrote:
 Hi All,

 We hit the kernel oops again on our setup. Any suggestion to fix that?

If you just log in and then log out manually

iscsiadm -m session -u

Does that cause an oops?


If you log back in, pull the network cable, wait until you see the ping
timeout messages, and then manually log out

iscsiadm -m session -u

Does that cause an oops?


Can you rebuild your kernel if I send you a patch?



 Thanks.

 Our set up is:
 kernel: 2.6.24-24
 open-iscsi: 2.0-870.3

 kernel logs:
 Oct  9 21:15:50 ian_ser_2 kernel: [28466.697051] scsi841 : iSCSI Initiator
 over TCP/IP
 Oct  9 21:15:50 ian_ser_2 kernel: [28466.962031] scsi 841:0:0:201:
 Direct-Access IET  VIRTUAL-DISK 0PQ: 0 ANSI: 4
 Oct  9 21:15:50 ian_ser_2 kernel: [28466.969109] sd 841:0:0:201: [sdd]
 4505472 512-byte hardware sectors (2307 MB)
 Oct  9 21:15:50 ian_ser_2 kernel: [28466.973314] sd 841:0:0:201: [sdd] Write
 Protect is off
 Oct  9 21:15:50 ian_ser_2 kernel: [28466.973320] sd 841:0:0:201: [sdd] Mode
 Sense: 77 00 00 08
 Oct  9 21:15:50 ian_ser_2 kernel: [28466.975420] sd 841:0:0:201: [sdd] Write
 cache: disabled, read cache: disabled, doesn't support DPO or FUA
 Oct  9 21:15:50 ian_ser_2 kernel: [28466.977468] sd 841:0:0:201: [sdd]
 4505472 512-byte hardware sectors (2307 MB)
 Oct  9 21:15:50 ian_ser_2 kernel: [28466.977938] sd 841:0:0:201: [sdd] Write
 Protect is off
 Oct  9 21:15:50 ian_ser_2 kernel: [28466.977944] sd 841:0:0:201: [sdd] Mode
 Sense: 77 00 00 08
 Oct  9 21:15:50 ian_ser_2 kernel: [28466.981749] sd 841:0:0:201: [sdd] Write
 cache: disabled, read cache: disabled, doesn't support DPO or FUA
 Oct  9 21:15:50 ian_ser_2 kernel: [28466.981761]  sdd: sdd1
 Oct  9 21:15:50 ian_ser_2 kernel: [28467.027801] sd 841:0:0:201: [sdd]
 Attached SCSI disk
 Oct  9 21:15:50 ian_ser_2 kernel: [28467.027886] sd 841:0:0:201: Attached
 scsi generic sg4 type 0
 Oct  9 21:16:01 ian_ser_2 kernel: [28477.713280]  connection626:0: ping
 timeout of 15 secs expired, last rx 7049831, last ping 7052331, now 7056081
 Oct  9 21:16:01 ian_ser_2 kernel: [28477.713467]  connection626:0: detected
 conn error (1011)
 Oct  9 21:16:01 ian_ser_2 kernel: [28477.717268]  connection627:0: ping
 timeout of 15 secs expired, last rx 7049832, last ping 7052332, now 7056082
 Oct  9 21:16:01 ian_ser_2 kernel: [28477.717458]  connection627:0: detected
 conn error (1011)
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.049414] BUG: unable to handle
 kernel NULL pointer dereference at virtual address 0060
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.049639] printing eip: e08a212a *pde
 = 
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.049924] Oops:  [#1] SMP
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.050100] Modules linked in:
 iscsi_tcp libiscsi scsi_transport_iscsi iscsi_trgt crc32c libcrc32c
 nls_iso8859_1 nls_cp437 vfat fat vmmemctl cpufreq_conservative
 cpufreq_ondemand cpufreq_userspace cpufreq_stats freq_table
 cpufreq_powersave sbs video output sbshc dock battery iptable_filter
 ip_tables x_tables vmhgfs lp loop ipv6 container serio_raw ac button evdev
 parport_pc parport i2c_piix4 i2c_core intel_agp agpgart shpchp pci_hotplug
 psmouse pcspkr ext3 jbd mbcache sd_mod sg sr_mod cdrom pata_acpi ata_generic
 floppy pcnet32 mii mptspi mptscsih mptbase scsi_transport_spi ata_piix
 libata scsi_mod raid10 raid456 async_xor async_memcpy async_tx xor raid1
 raid0 multipath linear md_mod dm_mirror dm_snapshot dm_mod thermal processor
 fan fbcon tileblit font bitblit softcursor fuse vmxnet
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.051174]

 Oct  9 21:16:10 ian_ser_2 kernel: [28486.051174]
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.051286] Pid: 16444, comm:
 iscsi_scan_839 Not tainted (2.6.24-24-generic #1)
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.051433] EIP: 0060:[e08a212a]
 EFLAGS: 00010202 CPU: 0
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.052073] EIP is at
 spi_device_match+0x1a/0x60 [scsi_transport_spi]
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.052178] EAX:  EBX: c27ff0b0
 ECX: c27ff000 EDX: c27ff0b0
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.052274] ESI: c27ff0b0 EDI: d0c31800
 EBP: c0286000 ESP: dc1dfef0
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.052367]  DS: 007b ES: 007b FS: 00d8
 GS:  SS: 0068
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.052473] Process iscsi_scan_839
 (pid: 16444, ti=dc1de000 task=d7823140 task.ti=dc1de000)
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.052586] Stack: e08a7c90 c0285c8f
 e095e328 c27ff1d8 cc1c3430 c27ff000 c27ff0b0 d0c31800
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.052792]0202 e09449cd
 c27ff000 cc1c3430 e0944a2f c27ff000 cc1c3400 e0944acc
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.052944]d0c31814 cc1c30a4
 e0944b50  e0944b64  c02805c2 cc1c30a4
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.053101] Call Trace:
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.053340]  [c0285c8f]
 attribute_container_device_trigger+0x4f/0xb0
 Oct  9 21:16:10 ian_ser_2 kernel: [28486.053963] 
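
The faulting location and call trace entries in the log above use the form symbol+offset/size, optionally followed by the module name in brackets. As a small aid for reading them, here is a sketch that pulls those fields apart. The regular expression and the helper name `parse_location` are my own, not part of any kernel tool.

```python
import re

# Matches e.g. "spi_device_match+0x1a/0x60 [scsi_transport_spi]".
LOCATION_RE = re.compile(
    r"(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_location(text):
    """Return symbol, byte offset, function size and module, or None if no match."""
    m = LOCATION_RE.search(text)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),  # None for built-in kernel code
    }


print(parse_location("EIP is at spi_device_match+0x1a/0x60 [scsi_transport_spi]"))
print(parse_location("attribute_container_device_trigger+0x4f/0xb0"))
```

For the EIP line this gives offset 26 into a 96 byte function in the scsi_transport_spi module, so the fault is reported inside the SPI transport class code rather than in one of the iscsi modules.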

Re: Kernel Oops

2009-10-15 Thread Kevin Ye
Thanks Mike.

I did the tests you mentioned a couple of times, and they didn't cause a
kernel oops.

The kernel oops I hit does not happen often. I have hit it twice in the
last 4 weeks.

A kernel patch is welcome and I will give it a try. Thanks.

Kevin

On Thu, Oct 15, 2009 at 12:06 PM, Mike Christie micha...@cs.wisc.edu wrote:


 On 10/14/2009 05:11 PM, Kevin Ye wrote:
  Hi All,
 
  We hit the kernel oops again on our setup. Any suggestion to fix that?

 If you just login then logout manually

 iscsiadm -m session -u

 Does that cause an oops?


 If you log back in, then pull the network cable, wait to see the ping
 timeout messages then manually logout

 iscsiadm -m session -u

 Does that cause an oops?


 Can you rebuild your kernel, if I send you a patch?



  Thanks.
 
  Our set up is:
  kernel: 2.6.24-24
  open-iscsi: 2.0-870.3
 

Re: kernel oops in iscsi_tcp_recv

2008-12-17 Thread Mike Christie

Erez Zilber wrote:
 Mike,
 
 I got a kernel oops while logging in from v870-1 to an iSCSI-SCST
 target. It happens before 'iscsiadm -m node -L all' returns. Is this a
 known bug? Here's the log:
 

I think this is a new bug. I will give scst a try and try to replicate.

Are you running v870-1 on a normal machine (no VM)? What NIC are you
using? What kernel are you running against?

If you are using an Intel 10 gig NIC, it might be a known bug with the
skb_seq_read function's handling of LRO, but I think the oops looks a
little different.


 Dec 17 19:56:21 172.16.4.12 Unable to handle kernel paging request
 Dec 17 19:56:26 172.16.4.12  at 1fd8 RIP:
 Dec 17 19:56:31 172.16.4.12  [883d759b]
 :iscsi_tcp:iscsi_tcp_recv+0xb9/0x498
 Dec 17 19:56:36 172.16.4.12 PGD 406b54067
 Dec 17 19:56:41 172.16.4.12 PUD 409354067
 Dec 17 19:56:46 172.16.4.12 PMD 0
 Dec 17 19:56:51 172.16.4.12
 Dec 17 19:56:56 172.16.4.12 Oops:  [1]
 Dec 17 19:57:01 172.16.4.12 SMP
 Dec 17 19:57:06 172.16.4.12
 Dec 17 19:57:11 172.16.4.12 last sysfs file: /block/sdc/removable
 Dec 17 19:57:16 172.16.4.12 CPU 0
 
 I will try to add more debug prints tomorrow, and see if I can give
 more details.
 
 Erez
 
  

