CAM Target Layer, Linux and camcontrol readcap

2012-09-26 Thread Nikolay Denev
Hi,

I'm running RELENG_9 and I'm trying to play with CTL.
Initially I've set up an isp(4) interface in TARGET mode and tried to export a
LUN to a directly connected Linux RHEL host, but for some reason that failed
with the block backend (ramdisk was exported properly) :

This is how I export the volume :

zfs create -V1000G tank/oracle
ctladm create -b block -o file=/dev/zvol/tank/oracle -S ZFSSERIAL001 -d ZFSLUN001
ctladm port -o on
ctladm realsync off 

camcontrol and ctladm show the device correctly :

[10:25]root@goliath:/home/ndenev# ctladm port -l
Port Online Type     Name         pp vp WWNN           WWPN
0    YES    IOCTL    CTL ioctl    0  0  0              0
1    YES    INTERNAL ctl2cam      0  0  0x50091a1f4700 0x50091a1f4702
2    YES    INTERNAL CTL internal 0  0  0              0
3    YES    FC       isp0         0  0  0x2024ff376b98 0x2124ff376b98
4    YES    FC       isp1         1  0  0x2024ff376b99 0x2124ff376b99

[10:26]root@goliath:/home/ndenev# ctladm devlist -v
LUN Backend  Size (Blocks)   BS Serial Number    Device ID
  0 block       2097152000  512 ZFSSERIAL001     ZFSLUN001
  lun_type=0
  num_threads=14
  file=/dev/zvol/tank/oracle
[10:26]root@goliath:/home/ndenev# camcontrol devlist
FREEBSD CTLDISK 0001 at scbus2 target 1 lun 0 (da0,pass0)


This is what I see on the Linux host for the exported zvol:

qla2xxx :0a:00.1: LOOP UP detected (8 Gbps).
qla2xxx :0a:00.1: qla2xxx_eh_host_reset: reset succeeded
qla2xxx :0a:00.1: scsi(4:0:0): Abort command issued -- 1 c 2002.
sd 4:0:0:0: scsi: Device offlined - not ready after error recovery
sd 4:0:0:0: rejecting I/O to offline device
sd 4:0:0:0: rejecting I/O to offline device
sd 4:0:0:0: rejecting I/O to offline device
sdq : READ CAPACITY failed.
sdq : status=0, message=00, host=1, driver=00 
sdq : sense not available. 
sd 4:0:0:0: rejecting I/O to offline device
sdq: Write Protect is off
sdq: Mode Sense: 00 00 00 00
sd 4:0:0:0: rejecting I/O to offline device
sdq: asking for cache data failed
sdq: assuming drive cache: write through
sd 4:0:0:0: Attached scsi disk sdq
sd 4:0:0:0: Attached scsi generic sg18 type 0
sd 4:0:0:0: rejecting I/O to offline device
sd 4:0:0:0: rejecting I/O to offline device
sd 4:0:0:0: rejecting I/O to offline device
sd 4:0:0:0: rejecting I/O to offline device
sd 4:0:0:0: rejecting I/O to offline device
sd 4:0:0:0: rejecting I/O to offline device

I've noticed the READ CAPACITY failed message and tried to issue several
camcontrol readcap commands, but they got stuck and are now unkillable :

[10:21]root@goliath:/home/ndenev# ps axuw|grep cam
root 29739  0.0  0.0  16300 1628  0- DL+   8:09AM 0:00.00 camcontrol readcap da0
root 30033  0.0  0.0  16300 1628  1- DL+   8:12AM 0:00.00 camcontrol readcap 2:1:0
root 30135  0.0  0.0  16300 1632  2  D+    8:21AM 0:00.00 camcontrol start pass0

procstat shows the same kernel stack for them :

[10:23]root@goliath.:/home/ndenev# procstat -kk 30033
  PID    TID COMM       TDNAME KSTACK
30033 101161 camcontrol -      mi_switch+0x186 sleepq_wait+0x42 _sleep+0x390 cam_periph_runccb+0x5a passioctl+0x171 devfs_ioctl_f+0x7b kern_ioctl+0x115 sys_ioctl+0xfd amd64_syscall+0x546 Xfast_syscall+0xf7

Also there is this suspicious message on the FreeBSD machine :

cfcs_action: unsupported CCB type 0x918

Then I tried to remove the volume from CTL, but the command also got stuck :

[10:29]root@goliath:/home/ndenev# ps axuww|grep ctladm
root 54269  0.0  0.0  18484 1692  3  I     8:24AM 0:00.00 ctladm remove -b block -l 0
root 20969  0.0  0.0  16280 1720  5  S+   10:30AM 0:00.00 grep ctladm

[10:30]root@goliath:/home/ndenev# procstat -kk 54269
  PID    TID COMM   TDNAME KSTACK
54269 101580 ctladm -      mi_switch+0x186 sleepq_catch_signals+0x2cc sleepq_wait_sig+0xc _sleep+0x2b9 ctl_be_block_ioctl+0x9aa ctl_ioctl+0x9e6 devfs_ioctl_f+0x7b kern_ioctl+0x115 sys_ioctl+0xfd amd64_syscall+0x546 Xfast_syscall+0xf7

Some Linux mailing lists suggest that a firmware upgrade on the Linux box might
help, but that does not explain the stuck readcap and ctladm commands.

Any ideas on how to debug this further are welcome.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: CAM Target Layer, Linux and camcontrol readcap

2012-09-26 Thread Chuck Tuffli
On Wed, Sep 26, 2012 at 1:33 AM, Nikolay Denev nde...@gmail.com wrote:
> Hi,
>
> I'm running RELENG_9 and I'm trying to play with CTL.
> Initially I've set up an isp(4) interface in TARGET mode and tried to export a
> LUN to a directly connected Linux RHEL host, but for some reason that failed
> with the block backend (ramdisk was exported properly) :
>
> This is how I export the volume :
>
> zfs create -V1000G tank/oracle
> ctladm create -b block -o file=/dev/zvol/tank/oracle -S ZFSSERIAL001 -d ZFSLUN001
> ctladm port -o on
> ctladm realsync off

This is similar to what I do, but you might try turning off realsync
before turning the port on and only turning on the FC ports. I.e.

   ctladm realsync off
   ctladm port -o on -t fc

If that doesn't help, it would be interesting to see if something is stuck via

   ctladm dumpooa
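
The ordering suggested above can be sketched as a small dry-run script. This is only an illustration: the helper names ctl_export_commands and export_lun are hypothetical, the backing file, serial number and device ID are the ones from the original message, and placing the LUN creation first is an assumption (the only constraint taken from the thread is that realsync is switched off before the ports are enabled). Nothing is executed unless run=True is passed on a FreeBSD host with CTL.

```python
import subprocess


def ctl_export_commands(backing_file, serial, device_id):
    """Build the ctladm commands in the order discussed in the thread.

    Assumption: the LUN is created first, as in the original message.
    The constraint from the thread is only that realsync is turned off
    before any port is brought online.
    """
    return [
        ["ctladm", "create", "-b", "block", "-o", f"file={backing_file}",
         "-S", serial, "-d", device_id],
        ["ctladm", "realsync", "off"],
        # Enable only the Fibre Channel ports, not every CTL port.
        ["ctladm", "port", "-o", "on", "-t", "fc"],
    ]


def export_lun(backing_file, serial, device_id, run=False):
    """Print each command; execute it only when run=True."""
    commands = ctl_export_commands(backing_file, serial, device_id)
    for cmd in commands:
        print(" ".join(cmd))
        if run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: only prints the commands.
    export_lun("/dev/zvol/tank/oracle", "ZFSSERIAL001", "ZFSLUN001")
```

Keeping the command list separate from the code that runs it makes the ordering easy to review before anything touches the target.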

---chuck


Re: CAM Target Layer, Linux and camcontrol readcap

2012-09-26 Thread Nikolay Denev

On Sep 26, 2012, at 5:29 PM, Chuck Tuffli ctuf...@gmail.com wrote:

> On Wed, Sep 26, 2012 at 1:33 AM, Nikolay Denev nde...@gmail.com wrote:
>> Hi,
>>
>> I'm running RELENG_9 and I'm trying to play with CTL.
>> Initially I've set up an isp(4) interface in TARGET mode and tried to export a
>> LUN to a directly connected Linux RHEL host, but for some reason that failed
>> with the block backend (ramdisk was exported properly) :
>>
>> This is how I export the volume :
>>
>> zfs create -V1000G tank/oracle
>> ctladm create -b block -o file=/dev/zvol/tank/oracle -S ZFSSERIAL001 -d ZFSLUN001
>> ctladm port -o on
>> ctladm realsync off
>
> This is similar to what I do, but you might try turning off realsync
> before turning the port on and only turning on the FC ports. I.e.
>
>    ctladm realsync off
>    ctladm port -o on -t fc
>
> If that doesn't help, it would be interesting to see if something is stuck via
>
>    ctladm dumpooa
>
> ---chuck

I ran ctladm dumpooa :

Dumping OOA queues
LUN 0 tag 0x000d RTR: SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0 (91807313 ms)
LUN 0 tag 0x0011 BLOCKED: READ CAPACITY(10). CDB: 25 0 0 0 0 0 0 0 0 0 (35411180 ms)
LUN 0 tag 0x0012 BLOCKED: READ CAPACITY(10). CDB: 25 0 0 0 0 0 0 0 0 0 (35228663 ms)
LUN 0 tag 0x0013 BLOCKED: START STOP UNIT. CDB: 1b 0 0 0 1 0 (34721150 ms)
OOA queues dump done

And if I'm reading this right, the other commands are blocked because of the
sync cache.
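
That reading of the queue can be checked mechanically. The following is a minimal sketch, assuming the dumpooa line format exactly as it appears in this thread; OoaEntry, parse_dumpooa and find_blocker are hypothetical helper names, not part of ctladm, and picking the oldest command that is not marked BLOCKED is only a heuristic for spotting the likely culprit.

```python
import re
from dataclasses import dataclass

# Sample copied from this thread, including the mail-wrapped "(... ms)" parts.
SAMPLE = """\
Dumping OOA queues
LUN 0 tag 0x000d RTR: SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0
(91807313 ms)
LUN 0 tag 0x0011 BLOCKED: READ CAPACITY(10). CDB: 25 0 0 0 0 0 0 0 0 0
(35411180 ms)
LUN 0 tag 0x0012 BLOCKED: READ CAPACITY(10). CDB: 25 0 0 0 0 0 0 0 0 0
(35228663 ms)
LUN 0 tag 0x0013 BLOCKED: START STOP UNIT. CDB: 1b 0 0 0 1 0  (34721150 ms)
OOA queues dump done
"""

# LUN <n> tag <hex> [<STATE>]: <command>. CDB: <hex bytes> (<age> ms)
ENTRY = re.compile(
    r"LUN (\d+) tag (0x[0-9a-fA-F]+)\s*([A-Z ]*?):\s*(.+?)\.\s*CDB:\s*"
    r"([0-9a-fA-F ]+?)\s*\((\d+) ms\)"
)


@dataclass
class OoaEntry:
    lun: int
    tag: int
    state: str
    command: str
    cdb: str
    age_ms: int


def parse_dumpooa(text):
    """Return one OoaEntry per queue line, tolerating wrapped '(N ms)' parts."""
    return [
        OoaEntry(int(lun), int(tag, 16), state.strip(), cmd, cdb.strip(), int(ms))
        for lun, tag, state, cmd, cdb, ms in ENTRY.findall(text)
    ]


def find_blocker(entries):
    """Heuristic: the oldest command not marked BLOCKED is the likely culprit."""
    running = [e for e in entries if "BLOCKED" not in e.state]
    blocked = [e for e in entries if "BLOCKED" in e.state]
    blocker = max(running, key=lambda e: e.age_ms) if running else None
    return blocker, blocked


if __name__ == "__main__":
    blocker, blocked = find_blocker(parse_dumpooa(SAMPLE))
    print(f"likely blocker: tag {blocker.tag:#06x} {blocker.command}, "
          f"pending for {blocker.age_ms / 3.6e6:.1f} h")
    for entry in blocked:
        print(f"  blocked: tag {entry.tag:#06x} {entry.command}")
```

On the dump above, this singles out the SYNCHRONIZE CACHE(10) with tag 0x000d as the one outstanding command that is not itself blocked, with the two READ CAPACITY(10) commands and the START STOP UNIT queued behind it.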

I will try now to set realsync to off before enabling the ports and retest.

Thanks for the suggestions!




Re: CAM Target Layer, Linux and camcontrol readcap

2012-09-26 Thread Chuck Tuffli
On Wed, Sep 26, 2012 at 9:02 AM, Nikolay Denev nde...@gmail.com wrote:
...
> I ran ctladm dumpooa :
>
> Dumping OOA queues
> LUN 0 tag 0x000d RTR: SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0 (91807313 ms)
> LUN 0 tag 0x0011 BLOCKED: READ CAPACITY(10). CDB: 25 0 0 0 0 0 0 0 0 0 (35411180 ms)
> LUN 0 tag 0x0012 BLOCKED: READ CAPACITY(10). CDB: 25 0 0 0 0 0 0 0 0 0 (35228663 ms)
> LUN 0 tag 0x0013 BLOCKED: START STOP UNIT. CDB: 1b 0 0 0 1 0 (34721150 ms)
> OOA queues dump done
>
> And if I'm reading this right, the other commands are blocked because of the
> sync cache.
>
> I will try now to set realsync to off before enabling the ports and retest.

Yup, this is the symptom you will see unless realsync is off (i.e. GEOM
is barfing on the sync and this blocks subsequent commands from
completing).

---chuck


Re: CAM Target Layer, Linux and camcontrol readcap

2012-09-26 Thread Nikolay Denev
On 26.09.2012, at 19:08, Chuck Tuffli ctuf...@gmail.com wrote:

> On Wed, Sep 26, 2012 at 9:02 AM, Nikolay Denev nde...@gmail.com wrote:
> ...
>> I ran ctladm dumpooa :
>>
>> Dumping OOA queues
>> LUN 0 tag 0x000d RTR: SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0 (91807313 ms)
>> LUN 0 tag 0x0011 BLOCKED: READ CAPACITY(10). CDB: 25 0 0 0 0 0 0 0 0 0 (35411180 ms)
>> LUN 0 tag 0x0012 BLOCKED: READ CAPACITY(10). CDB: 25 0 0 0 0 0 0 0 0 0 (35228663 ms)
>> LUN 0 tag 0x0013 BLOCKED: START STOP UNIT. CDB: 1b 0 0 0 1 0 (34721150 ms)
>> OOA queues dump done
>>
>> And if I'm reading this right, the other commands are blocked because of the
>> sync cache.
>>
>> I will try now to set realsync to off before enabling the ports and retest.
>
> Yup, this is the symptom you will see unless realsync is off (i.e. GEOM
> is barfing on the sync and this blocks subsequent commands from
> completing).
>
> ---chuck

Any idea if the machine can recover from this state without rebooting?


Re: CAM Target Layer, Linux and camcontrol readcap

2012-09-26 Thread Chuck Tuffli
On Wed, Sep 26, 2012 at 9:54 AM, Nikolay Denev nde...@gmail.com wrote:
> On 26.09.2012, at 19:08, Chuck Tuffli ctuf...@gmail.com wrote:
...
>> Yup, this is the symptom you will see unless realsync is off (i.e. GEOM
>> is barfing on the sync and this blocks subsequent commands from
>> completing).
>>
>> ---chuck
>
> Any idea if the machine can recover from this state without rebooting?

I haven't found a way other than rebooting. In fact, if you have
INVARIANTS enabled in your kernel, you will hit an assert in GEOM and
panic.

---chuck


Re: CAM Target Layer, Linux and camcontrol readcap

2012-09-26 Thread Nikolay Denev

On Sep 26, 2012, at 8:00 PM, Chuck Tuffli ctuf...@gmail.com wrote:

> On Wed, Sep 26, 2012 at 9:54 AM, Nikolay Denev nde...@gmail.com wrote:
>> On 26.09.2012, at 19:08, Chuck Tuffli ctuf...@gmail.com wrote:
> ...
>>> Yup, this is the symptom you will see unless realsync is off (i.e. GEOM
>>> is barfing on the sync and this blocks subsequent commands from
>>> completing).
>>>
>>> ---chuck
>>
>> Any idea if the machine can recover from this state without rebooting?
>
> I haven't found a way other than rebooting. In fact, if you have
> INVARIANTS enabled in your kernel, you will hit an assert in GEOM and
> panic.
>
> ---chuck

I see.

Thanks for your help Chuck!

