I finally managed to resolve this. I received some useful info from Richard 
Elling (without List CC): 

>> (ME) However, I still think the plain IDE driver also needs a timeout to
>> handle disk failures, because cables etc. can fail.

> (Richard) Yes, this is a little bit odd.  The sd driver should be in the
> stack above the IDE driver and the sd driver tends to manage timeouts as
> well. Could you send the "prtconf -D" output?

>> (ME) prtconf -D:
>>
>> System Configuration:  Sun Microsystems  i86pc
>> Memory size: 8191 Megabytes
>> System Peripherals (Software Nodes):
>>
>> i86pc (driver name: rootnex)
>>   scsi_vhci, instance #0 (driver name: scsi_vhci)
>>   isa, instance #0 (driver name: isa)
>>       asy, instance #0 (driver name: asy)
>>       lp, instance #0 (driver name: ecpp)
>>       i8042, instance #0 (driver name: i8042)
>>           keyboard, instance #0 (driver name: kb8042)
>>       motherboard
>>       pit_beep, instance #0 (driver name: pit_beep)
>>   pci, instance #0 (driver name: npe)
>>       pci1002,5957
>>       pci1002,5978, instance #0 (driver name: pcie_pci)
>>           display, instance #1 (driver name: vgatext)
>>       pci1002,597b, instance #1 (driver name: pcie_pci)
>>           pci8086,1083, instance #1 (driver name: e1000g)
>>       pci1002,597c, instance #2 (driver name: pcie_pci)
>>           pci8086,1083, instance #2 (driver name: e1000g)
>>       pci1002,597f, instance #3 (driver name: pcie_pci)
>>           pci1458,e000 (driver name: gani)
>>       pci-ide, instance #3 (driver name: pci-ide)
>>           ide, instance #6 (driver name: ata)
>>               cmdk, instance #1 (driver name: cmdk)

> (Richard) Here is where you see the driver stack. Inverted it looks like:

>    cmdk
>    ata
>    pci-ide
>    npe

>I/O from the file system will go directly to the cmdk driver.
>I'm not familiar with that driver, mostly because if you change
>to AHCI, then you will see something more like:
>
>    sd
>    ahci
>    pci-ide
>    npe

>The important detail to remember is that ZFS does not have any
>timeouts.  It will patiently wait for a response from cmdk or sd.
>The cmdk and sd drivers manage timeouts and retries farther
>down the stack.

> For sd, I know that disk selection errors are propagated quickly. 

>> (ME) Is it possible to set AHCI without reinstalling OSol ?

> (Richard) Yes. But you might need to re-import the non-syspool pools 
> manually. 
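
The inverted driver stack Richard reads off the "prtconf -D" output can also be extracted mechanically. The following is only a rough sketch, assuming the raw (unquoted) output where child nodes are indented deeper than their parents; the function name driver_stack is made up for this example. Unlike Richard's list, it also prints the root nexus driver (rootnex) at the end.

```shell
#!/bin/sh
# driver_stack LEAF: read `prtconf -D` style text on stdin and print the
# driver names from the node bound to driver LEAF up to the root, one per
# line. Assumes children are indented deeper than their parents.
driver_stack() {
    awk -v leaf="$1" '
    NF == 0 { next }                      # skip blank lines
    {
        # Indentation depth = number of leading spaces.
        match($0, /^ */)
        depth = RLENGTH

        # Driver name, or empty if the node has no driver bound.
        name = ""
        if ($0 ~ /\(driver name: /) {
            name = $0
            sub(/.*\(driver name: /, "", name)
            sub(/\).*/, "", name)
        }

        # Forget remembered nodes that are not ancestors of this one.
        while (n > 0 && d[n] >= depth) n--
        n++; d[n] = depth; drv[n] = name

        if (name != "" && name == leaf) {
            for (i = n; i >= 1; i--)
                if (drv[i] != "") print drv[i]
            exit
        }
    }'
}
```

Usage would be something like "prtconf -D | driver_stack cmdk" on the IDE setup, or "prtconf -D | driver_stack sd" after switching to AHCI.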

---- 
OK, so I wanted to switch from IDE to AHCI while keeping my installation, and
then test again. When I set the mode for my IDE devices to AHCI in the BIOS,
the machine panicked with "Error could not import root volume: error 19" in
GRUB, so the machine could not boot. After some googling I found:


http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6795637 (Implement
full root disk portability, boot must not be dependent on install hardware
config)

and 

http://defect.opensolaris.org/bz/show_bug.cgi?id=5785 (a guide on how to change
the boot device for P2V, which is actually a similar problem)

So I did as described in the guide. Maybe this is of use for someone else 
finding this. 
---
Overview: My storage server's disk mode was set to IDE. (You can check whether
your server is in SATA or IDE mode with cfgadm: if the devices are not shown,
you are in IDE mode.) To enable AHCI / SATA mode for your drives, you have to
go into the BIOS and set the mode to AHCI. However, after you have done this,
your machine will (may?) not boot anymore. You will get a panic after GRUB
saying "cannot mount rootfs". This screen is visible only very briefly; to
actually read it, add "-k -v" to the GRUB boot options and you will fall into
the debugger, where you can read the message.
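
The cfgadm check mentioned above can be wrapped in a small helper. This is just a sketch: the function name sata_ports is made up, and it assumes that SATA attachment points are listed with an Ap_Id beginning with "sata" in the first column of the cfgadm output.

```shell
#!/bin/sh
# sata_ports: read `cfgadm` output on stdin and print every attachment
# point whose Ap_Id (first column) starts with "sata".
# Exit status is 0 if at least one was found, 1 otherwise.
sata_ports() {
    awk '$1 ~ /^sata/ { print $1; found = 1 } END { exit !found }'
}
```

For example: cfgadm | sata_ports || echo "no sata attachment points, probably IDE mode"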

IDE MODE:

    * NO hot plug
    * The system hangs 100% if a cache or non device is removed (see thread
      above)
    * NO NCQ available

AHCI Mode:

    * Full support for NCQ (?)
    * Full support for Hot Plug (devices shown via cfgadm as sata/X:disk) 

To switch from IDE mode to AHCI for a running installation of NexentaStor I did 
the following:

    * Create a checkpoint just to be sure
    * note (write down) which checkpoint is the safety checkpoint you just
      created
    * note (write down) which checkpoint is currently booted
    * export your data volumes
    * reboot
    * Enter BIOS and set mode to AHCI
    * Boot the rescue CD (a USB CD-ROM drive did not work, it must be IDE; PXE
      may be added later)
    * In the Rescue CD do (login root / passwd empty): 
          o mkdir /mnt
          o zpool import -f syspool
          o mount -F zfs syspool/rootfs-nms-XXX /mnt (this is the active
            snapshot / clone you normally boot, not the rescue checkpoint you
            created)
          o mv /mnt/etc/path_to_inst /mnt/etc/path_to_inst.ORG
          o touch /mnt/etc/path_to_inst
          o devfsadm -C -r /mnt
          o devfsadm -c disk -r /mnt
          o devfsadm -i e1000g -r /mnt
          o cp -a /mnt/etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache.ORG
          o cp -a /etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache
          o touch /mnt/reconfigure
          o bootadm update-archive -v -R /mnt
          o umount /mnt
          o sync
          o reboot
    * Now your system should come up.
    * Verify that your SATA drives can be seen with cfgadm (cfgadm should list
      sata/X:disk)
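
For reference, the rescue CD steps above collected into one script. Treat it as a sketch, not a tested tool: the names run and fix_root_for_ahci and the DRY_RUN switch are made up for this example, "mkdir -p" is used so an existing /mnt is not an error, and the root dataset (syspool/rootfs-nms-XXX above) has to be passed in as you noted it down. By default it only prints the commands. The final reboot is left to you.

```shell
#!/bin/sh
# Sketch only: replays the rescue CD steps from the list above.
# DRY_RUN defaults to 1, which prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

# $1: the root dataset you normally boot (syspool/rootfs-nms-XXX in the list).
fix_root_for_ahci() {
    rootfs=$1
    [ -n "$rootfs" ] || { echo "usage: fix_root_for_ahci ROOT_DATASET" >&2; return 2; }

    run mkdir -p /mnt
    run zpool import -f syspool
    run mount -F zfs "$rootfs" /mnt

    # Keep the old device instance map and start with an empty one.
    run mv /mnt/etc/path_to_inst /mnt/etc/path_to_inst.ORG
    run touch /mnt/etc/path_to_inst

    # Rebuild the device links under the mounted root.
    run devfsadm -C -r /mnt
    run devfsadm -c disk -r /mnt
    run devfsadm -i e1000g -r /mnt

    # Back up the old pool cache, then copy in the one from the rescue system.
    run cp -a /mnt/etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache.ORG
    run cp -a /etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache

    run touch /mnt/reconfigure
    run bootadm update-archive -v -R /mnt
    run umount /mnt
    run sync
}
```

Call it as "fix_root_for_ahci syspool/rootfs-nms-XXX" with your own dataset name, read the printed commands, and only then set DRY_RUN=0 to actually run them.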

After doing this I tested pulling the power cable of the L2ARC SSD again. No
hang this time, and the device was detected as failed "immediately". I could
also re-enable the device by removing it, connecting the power again, and then
running "cfgadm -c configure devicename" and zpool grow.
--

Conclusion (this is build 104):
- do not use IDE :)
- ZFS does not have timeouts for commands but relies on the hardware layer (as
  with the cache flush command)
- switching from IDE to AHCI requires some manual steps


Regards, 
Robert 

p.s. Thanks Richard for the tips.
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
