[zfs-discuss] Attempting to delete clone locks pool

2009-11-11 Thread Ian Collins
I've just managed to lock up a pool on a Solaris 10 update 7 system by attempting to delete a clone (even processes creating files in the pool hang and can't be killed).


Has anyone seen anything like this?

--
Ian.



[zfs-discuss] ls -l hang, process unkillable

2009-11-11 Thread roland
hello, 

One of my colleagues has a problem with an application. The sysadmins responsible for that server told him it was the application's fault, but I think they are wrong, and so does he.

From time to time the app becomes unkillable, and when trying to list the contents of a directory which is being read/written by the app, ls can list the contents, but ls -l gets stuck and simply hangs. cp, rm and mv on files in that dir don't work either.

I think this is a Solaris/kernel/ZFS problem.

Can somebody give a hint on how to analyze/fix this?

I can provide more input on request.

regards
roland


Re: [zfs-discuss] ls -l hang, process unkillable

2009-11-11 Thread Victor Latushkin

roland wrote:
hello, 


One of my colleagues has a problem with an application. The sysadmins responsible for that server told him it was the application's fault, but I think they are wrong, and so does he.

From time to time the app becomes unkillable, and when trying to list the contents of a directory which is being read/written by the app, ls can list the contents, but ls -l gets stuck and simply hangs. cp, rm and mv on files in that dir don't work either.

I think this is a Solaris/kernel/ZFS problem.

Can somebody give a hint on how to analyze/fix this?


Start with something like:

pgrep ls

pstack <PID of ls>

echo "0t<PID of ls>::pid2proc | ::walk thread | ::findstack -v" | mdb -k
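
Put together, a minimal sketch of that workflow (shell syntax; assumes a single hung ls and that mdb -k is run as root):

  pid=$(pgrep -x ls | head -1)   # PID of the hung ls
  pstack $pid                    # userland stack of the hung process
  echo "0t${pid}::pid2proc | ::walk thread | ::findstack -v" | mdb -k   # kernel stacks of its threads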

victor


Re: [zfs-discuss] marvell88sx2 driver build126

2009-11-11 Thread Orvar Korvar
Other drivers in the stack? Which drivers? And have any of them been changed between b125 and b126?


Re: [zfs-discuss] zfs inotify?

2009-11-11 Thread James Andrewartha

Carson Gaspar wrote:

On 10/26/09 5:33 PM, p...@paularcher.org wrote:
I can't find much on gam_server on Solaris (couldn't find too much on it at all, really), and port_create is apparently a system call. (I'm not a developer--if I can't write it in BASH, Perl, or Ruby, I can't write it.) I appreciate the suggestions, but I need something a little more prêt-à-porter.


Your Google-fu needs work ;-)

Main Gamin page: http://www.gnome.org/~veillard/gamin/index.html


Actually, I found this page, which has this gem: "At this point Gamin is fairly tied to Linux, portability is not a primary goal at this stage but if you have portability patches they are welcome."


Much has changed since that text was written, including support for the 
event completion framework (port_create() and friends, introduced with 
Sol 10) on Solaris, thus the recommendation for gam_server / gamin.


$ nm /usr/lib/gam_server | grep port_create
[458]   | 134589544| 0|FUNC |GLOB |0|UNDEF  |port_create


The port_create patch has never gone upstream, however; meanwhile gvfs uses glib's gio, which has backends for inotify, solaris, fam and win32.


--
James Andrewartha


[zfs-discuss] High load when 'zfs send' to the file

2009-11-11 Thread Jan Hlodan

Hello,
when I run 'zfs send' into a file, the system (Ultra Sparc 45) has this load:
# zfs send -R backup/zo...@moving_09112009 > \
    /tank/archive_snapshots/exa_all_zones_09112009.snap


Total: 107 processes, 951 lwps, load averages: 54.95, 59.46, 50.25

Is it normal?
Regards,

Jan Hlodan




Re: [zfs-discuss] marvell88sx2 driver build126

2009-11-11 Thread rwalists

On Nov 11, 2009, at 12:01 AM, Tim Cook wrote:


On Tue, Nov 10, 2009 at 5:15 PM, Tim Cook t...@cook.ms wrote:

  One thing I'm noticing is a lot of checksum errors being generated during the resilver. Is this normal?

Anyone?  It's up to 7.35M checksum errors and it's rebuilding extremely slowly (as evidenced by the 10 hour time).  The errors are only showing on the replacing-9 line, not the individual drive.


I've only replaced a drive once, but it didn't show any checksum  
errors during the resilver.  This was a 2 TB WD Green drive in a  
mirror pool that had started to show write errors.  It was attached to  
a SuperMicro AOC-SAT2-MV8.


Good luck,
Ware


Re: [zfs-discuss] ls -l hang, process unkillable

2009-11-11 Thread roland
thanks.

We will try that if the error happens again - we needed to reboot as a quick fix, as the machine is in production.

regards
roland


Re: [zfs-discuss] marvell88sx2 driver build126

2009-11-11 Thread Tim Cook
On Wed, Nov 11, 2009 at 3:38 AM, Orvar Korvar 
knatte_fnatte_tja...@yahoo.com wrote:

 Other drivers in the stack? Which drivers? And have anyone of them been
 changed between b125 and b126?



Looks like the sd driver, for one.

http://dlc.sun.com/osol/on/downloads/b126/on-changelog-b126.html

--Tim


Re: [zfs-discuss] zfs eradication

2009-11-11 Thread Brian Kolaci

Thanks all,

It was a government customer that I was talking to, and it sounded like a good idea; however, with the certification paper trails required today, I don't think it would be of such a benefit after all.  It may be useful for disk evacuation, but they're still going to need their paper trail with a certification that it was done and confirmed.

David Magda wrote:

On Nov 10, 2009, at 20:55, Mark A. Carlson wrote:


Typically this is called Sanitization and could be done as part of
an evacuation of data from the disk in preparation for removal.

You would want to specify the patterns to write and the number of
passes.


See also remanence:

http://en.wikipedia.org/wiki/Data_remanence

(S)ATA actually has a protocol command (secure erase) that will cause the disk to overwrite all of its sectors, and not be usable until it's done. This doesn't exist in SCSI / SAS / FC as far as I know.


Generally speaking one overwrite is sufficient to prevent data from being accessible, but various government standards specify anywhere between one and four passes:


http://en.wikipedia.org/wiki/Data_erasure

Degaussing or complete destruction is usually necessary for the top 
secret stuff. DBAN is a useful (open-source) utility that I tend to 
recommend for regular folk:


http://www.dban.org/

While it could be useful, there are penalties in various jurisdictions 
for leaking data (especially with government-related stuff), so I'm not 
sure if Sun would want to potentially expose itself to inappropriate use 
that doesn't clean everything properly.


With ZFS encryption coming up, it could be sufficient to have your data 
sets encrypted and then simply forget the key. The data is still 
technically there, but (theoretically) completely inaccessible.






Re: [zfs-discuss] zfs eradication

2009-11-11 Thread Darren J Moffat

Brian Kolaci wrote:

Hi,

I was discussing the common practice of disk eradication used by many 
firms for security.  I was thinking this may be a useful feature of ZFS 
to have an option to eradicate data as its removed, meaning after the 
last reference/snapshot is done and a block is freed, then write the 
eradication patterns back to the removed blocks.


By any chance, has this been discussed or considered before?


Yes it has been discussed here before.

It is one of the things I want to look at after ZFS Crypto and block pointer rewriter have integrated.


Also, in some jurisdictions, if the data was always encrypted on disk then you don't need to write any patterns to erase the blocks.  So ZFS Crypto can help there.


--
Darren J Moffat


Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding

2009-11-11 Thread Markus Kovero
Hi, you could try the LSI itmpt driver as well; it seems to handle this better, although I think it only supports 8 devices at once or so.

You could also try a more recent version of OpenSolaris (123 or even 126), as there seem to be a lot of fixes regarding the mpt driver (which still seems to have issues).

Yours
Markus Kovero

-Original Message-
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of M P
Sent: 11. marraskuuta 2009 18:08
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not 
responding

Server using a Sun StorageTek 8-port external SAS PCIe HBA (mpt driver) connected to an external JBOD array with 12 disks.

Here is a link to the exact SAS (Sun) adapter:
http://www.sun.com/storage/storage_networking/hba/sas/PCIe.pdf  (LSI SAS3801)

When running I/O-intensive operations (zpool scrub) for a couple of hours, the server locks up with the following repeating messages:

Nov 10 16:31:45 sunserver scsi: [ID 365881 kern.info] 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:31:45 sunserver   Log info 0x3114 received for target 17.
Nov 10 16:31:45 sunserver   scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Nov 10 16:32:55 sunserver scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:32:55 sunserver   Disconnected command timeout for Target 19
Nov 10 16:32:56 sunserver scsi: [ID 365881 kern.info] 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:32:56 sunserver   Log info 0x3114 received for target 19.
Nov 10 16:32:56 sunserver   scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Nov 10 16:34:16 sunserver scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:34:16 sunserver   Disconnected command timeout for Target 21

I tested this on two servers:
- Sun Fire X2200 using a Sun Storage J4200 JBOD array, and
- Dell R410 server with a Promise VTJ-310SS JBOD array

They both show the same repeating messages and lock up after a couple of hours of zpool scrub.

Solaris appears to be more stable (than OpenSolaris) - it doesn't lock when scrubbing, but it still locks after 5-6 hours of reading from the JBOD array (10TB in size).

So at this point this looks like an issue with the mpt driver or these SAS cards (I tested two) when under heavy load. I installed the latest firmware for the SAS card from LSI's web site - v1.29.00 - without any change; the server still locks.

Any ideas or suggestions on how to fix or work around this issue? The adapter is supposed to be enterprise-class.

Here is more detailed log info:

Sun Fire X2200 and Sun Storage J4200 JBOD array

SAS card: Sun StorageTek 8-port external SAS PCIe HBA

http://www.sun.com/storage/storage_networking/hba/sas/PCIe.pdf  (LSI SAS3801)

Operation System: SunOS sunserver 5.11 snv_111b i86pc i386 i86pc Solaris

Nov 10 16:30:33 sunserver scsi: [ID 365881 kern.info] 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:30:33 sunserver   Log info 0x3114 received for target 0.
Nov 10 16:30:33 sunserver   scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Nov 10 16:31:43 sunserver scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:31:43 sunserver   Disconnected command timeout for Target 17
Nov 10 16:32:55 sunserver scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:32:55 sunserver   Disconnected command timeout for Target 19
Nov 10 16:32:56 sunserver scsi: [ID 365881 kern.info] 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:32:56 sunserver   Log info 0x3114 received for target 19.
Nov 10 16:32:56 sunserver   scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Nov 10 16:34:16 sunserver scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Nov 10 16:34:16 sunserver   Disconnected command timeout for Target 21


Dell R410 Server and Promise VTJ-310SS JBOD array

SAS card: Sun StorageTek 8-port external SAS PCIe HBA

Operating System: SunOS dellserver 5.10 Generic_141445-09 i86pc i386 i86pc

Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,3...@3/pci1028,1...@0 (mpt0):
Nov 11 00:18:22 dellserver Disconnected command timeout for Target 0
Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,3...@3/pci1028,1...@0/s...@0,0 (sd13):
Nov 11 00:18:22 dellserver Error for Command: read(10)
Error Level: Retryable
Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.notice]   Requested Block: 
276886498 Error Block: 276886498
Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.notice]   Vendor: Dell 
  Serial Number: Dell Interna

Re: [zfs-discuss] zfs eradication

2009-11-11 Thread Cindy Swearingen

This feature is described in this RFE:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4930014
Secure delete option: erase blocks after they're freed

cs

On 11/11/09 09:17, Darren J Moffat wrote:

Brian Kolaci wrote:

Hi,

I was discussing the common practice of disk eradication used by many 
firms for security.  I was thinking this may be a useful feature of 
ZFS to have an option to eradicate data as its removed, meaning after 
the last reference/snapshot is done and a block is freed, then write 
the eradication patterns back to the removed blocks.


By any chance, has this been discussed or considered before?


Yes it has been discussed here before.

It is one of the things I want to look at after ZFS Crypto and block pointer rewriter have integrated.


Also, in some jurisdictions, if the data was always encrypted on disk then you don't need to write any patterns to erase the blocks.  So ZFS Crypto can help there.





Re: [zfs-discuss] marvell88sx2 driver build126

2009-11-11 Thread Eric C. Taylor
The checksum errors are fixed in build 128 with:

6807339 spurious checksum errors when replacing a vdev

No; you're not losing any data due to this.

-  Eric


Re: [zfs-discuss] Odd sparing problem

2009-11-11 Thread Cindy Swearingen

Hi Tim,

I always have to detach the spare.

I haven't tested it yet, but I see an improvement in this behavior,
with the integration of this CR:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6893090
clearing a vdev should automatically detach spare
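
For reference, returning an in-use spare to the available list is a one-liner (pool and device names here are placeholders):

   zpool detach tank c2t5d0    # detach the hot spare; it goes back to AVAIL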

Cindy

On 11/10/09 16:03, Tim Cook wrote:



On Tue, Nov 10, 2009 at 4:38 PM, Cindy Swearingen 
cindy.swearin...@sun.com mailto:cindy.swearin...@sun.com wrote:


Hi Tim,

I'm not sure I understand this output completely, but have you
tried detaching the spare?

Cindy


Hey Cindy,

Detaching did in fact solve the issue.  During my previous issues when 
the spare kicked in, it actually automatically detached itself once I 
replaced the failed drive, so I didn't understand what was going on this 
time around.


Thanks!
--Tim



Re: [zfs-discuss] Zpool hosed during testing

2009-11-11 Thread Mark J Musante


On 10 Nov, 2009, at 21.02, Ron Mexico wrote:

This didn't occur on a production server, but I thought I'd post  
this anyway because it might be interesting.


This is CR 6895446 and a fix for it should be going into build 129.


Regards,
markm




[zfs-discuss] libzfs zfs_create() fails on sun4u daily bits (daily.1110)

2009-11-11 Thread Jordan Vaughan
I encountered a strange libzfs behavior while testing a zone fix and 
want to make sure that I found a genuine bug.  I'm creating zones whose 
zonepaths reside in ZFS datasets (i.e., the parent directories of the 
zones' zonepaths are ZFS datasets).  In this scenario, zoneadm(1M) 
attempts to create ZFS datasets for zonepaths.  zoneadm(1M) has done 
this for a long time (since zones started working on ZFS?) and worked 
fine until recently.  Now I'm seeing zoneadm(1M) fail to create ZFS 
datasets for new zones while running sparc bits from the daily.1110 ONNV 
nightly build:


root krb-v210-4 [20:05:57 0]# zoneadm list -cv
  ID NAME     STATUS       PATH             BRAND     IP
   0 global   running      /                native    shared
   - godel    installed    /export/godel    native    shared
   - turing   configured   /export/turing   native    shared


root krb-v210-4 [20:05:58 0]# zonecfg -z turing info
zonename: turing
zonepath: /export/turing
brand: native
autoboot: false
bootargs:
pool:
limitpriv:
scheduling-class:
ip-type: shared
hostid: 900d833f
inherit-pkg-dir:
dir: /lib
inherit-pkg-dir:
dir: /platform
inherit-pkg-dir:
dir: /sbin
inherit-pkg-dir:
dir: /usr

root krb-v210-4 [20:06:07 0]# zfs list
NAME USED  AVAIL  REFER  MOUNTPOINT
rpool   17.2G  16.0G66K  /rpool
rpool/ROOT  8.51G  16.0G21K  legacy
rpool/ROOT/snv_126  8.51G  16.0G  8.51G  /
rpool/dump  4.00G  16.0G  4.00G  -
rpool/export 688M  16.0G   688M  /export
rpool/export/home 21K  16.0G21K  /export/home
rpool/swap 4G  20.0G  5.60M  -
zonepool9.28M  33.2G28K  /zonepool
zonepool/zones21K  33.2G21K  /zonepool/zones

root krb-v210-4 [20:06:11 0]# uname -a
SunOS krb-v210-4 5.11 onnv-gate:2009-11-10 sun4u sparc SUNW,Sun-Fire-V210

root krb-v210-4 [20:06:30 0]# zpool list
NAME   SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
rpool 33.8G  13.2G  20.6G39%  1.00x  ONLINE  -
zonepool  33.8G  9.43M  33.7G 0%  1.00x  ONLINE  -

root krb-v210-4 [20:06:46 0]# zpool get all rpool
NAME   PROPERTY   VALUE   SOURCE
rpool  size   33.8G   -
rpool  capacity   39% -
rpool  altroot-   default
rpool  health ONLINE  -
rpool  guid   12880404862636496284  local
rpool  version19  local
rpool  bootfs rpool/ROOT/snv_126  local
rpool  delegation on  default
rpool  autoreplaceoff default
rpool  cachefile  -   default
rpool  failmode   continuelocal
rpool  listsnapshots  off default
rpool  autoexpand off default
rpool  dedupratio 1.00x   -
rpool  free   20.6G   -
rpool  allocated  13.2G   -

root krb-v210-4 [20:06:51 0]# zfs get all rpool/export
NAME  PROPERTY  VALUE  SOURCE
rpool/export  type  filesystem -
rpool/export  creation  Sun Oct 25 19:43 2009  -
rpool/export  used  688M   -
rpool/export  available 16.0G  -
rpool/export  referenced688M   -
rpool/export  compressratio 1.00x  -
rpool/export  mounted   yes-
rpool/export  quota none   default
rpool/export  reservation   none   default
rpool/export  recordsize128K   default
rpool/export  mountpoint/exportlocal
rpool/export  sharenfs  offdefault
rpool/export  checksum  on default
rpool/export  compression   offdefault
rpool/export  atime on default
rpool/export  devices   on default
rpool/export  exec  on default
rpool/export  setuidon default
rpool/export  readonly  offdefault
rpool/export  zoned offdefault
rpool/export  snapdir   hidden default
rpool/export  aclmode   groupmask  default
rpool/export  aclinheritrestricted default
rpool/export  canmount  on default
rpool/export  shareiscsioffdefault
rpool/export  xattr on default
rpool/export  copies1  default
rpool/export  version   4  -
rpool/export  utf8only  

Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-11-11 Thread Bob Friesenhahn

On Tue, 10 Nov 2009, Tim Cook wrote:


My personal thought would be that it doesn't really make sense to even have it, at least for readzilla.  In theory, you always want the SSD to be full, or nearly full, as it's a cache.  The whole point of TRIM, from my understanding, is to speed up the drive by zeroing out unused blocks so that the next time you try to write to them, they don't have to be cleared, then written to.  When dealing with a cache, there shouldn't (again in theory) be any free blocks; a warmed cache should be full of data.


This thought is wrong because SSDs actually have many more blocks than they admit to in their declared size.  The "extreme" or "enterprise" units will have more extra blocks.  These extra blocks are necessary in order to replace failing blocks, and to spread the write load over many more underlying blocks, thereby decreasing the chance of failure.  If a FLASH block is to be overwritten, then the device can reassign the old FLASH block to the spare pool, and update its tables so that a different FLASH block (from the spare pool) is used for the write.


Logzilla is kind of in the same boat, it should constantly be 
filling and emptying as new data comes in.  I'd imagine the TRIM 
would just add unnecessary overhead.  It could in theory help there 
by zeroing out blocks ahead of time before a new batch of writes 
come in if you have a period of little I/O.  My thought is it would 
be far more work than it's worth, but I'll let the coders decide 
that one.


The problem with TRIM is that its goal is to decrease write latency 
at low/medium writing loads, or at high load for a short duration.  It 
does not do anything to increase maximum sustained write performance 
since the maximum write performance then depends on how fast the 
device can erase blocks.  Some server environments will write to the 
device at close to 100% most of the time, and especially for 
relatively slow devices like the X25-E.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-11-11 Thread Tim Cook
On Wed, Nov 11, 2009 at 11:51 AM, Bob Friesenhahn 
bfrie...@simple.dallas.tx.us wrote:

 On Tue, 10 Nov 2009, Tim Cook wrote:


 My personal thought would be that it doesn't really make sense to even
 have it, at least for readzilla.  In theory, you always want the SSD to be
 full, or nearly full, as it's a cache.  The whole point of TRIM, from my
 understanding, is to speed up the drive by zeroing out unused blocks so they
 next time you try to write to them, they don't have to be cleared, then
 written to.  When dealing with a cache, there shouldn't (again in theory) be
 any free blocks, a warmed cache should be full of data.


 This thought is wrong because SSDs actually have many more blocks that they
 don't admit to in their declared size.  The extreme or enterprise units
 will have more extra blocks.  These extra blocks are necessarily in order to
 replace failing blocks, and to spread the write load over many more
 underlying blocks, and thereby decrease the chance of failure.  If a FLASH
 block is to be overwritten, then the device can reassign the old FLASH block
 to the spare pool, and update its tables so that a different FLASH block
 (from the spare pool) is used for the write.


I'm well aware of the fact that SSD mfg's put extra blocks into the device to increase both performance and MTBF.  I'm not sure how that invalidates what I've said though, or even plays a role, and you haven't done a very good job of explaining why you think I'm wrong.  TRIM is simply letting the device know that a block has been deleted from the OS perspective.  In a caching scenario, you aren't deleting anything, you're continually overwriting.  How exactly do you foresee TRIM being useful when the command wouldn't even be invoked?






  Logzilla is kind of in the same boat, it should constantly be filling and
 emptying as new data comes in.  I'd imagine the TRIM would just add
 unnecessary overhead.  It could in theory help there by zeroing out blocks
 ahead of time before a new batch of writes come in if you have a period of
 little I/O.  My thought is it would be far more work than it's worth, but
 I'll let the coders decide that one.


 The problem with TRIM is that its goal is to decrease write latency at
 low/medium writing loads, or at high load for a short duration.  It does not
 do anything to increase maximum sustained write performance since the
 maximum write performance then depends on how fast the device can erase
 blocks.  Some server environments will write to the device at close to 100%
 most of the time, and especially for relatively slow devices like the X25-E.


Right... you just repeated what I said with different wording.

--Tim


[zfs-discuss] Manual drive failure?

2009-11-11 Thread Tim Cook
So, I've done a bit of research and RTFM, and haven't found an answer.  If
I've missed something obvious, please point me in the right direction.

Is there a way to manually fail a drive via ZFS?  (this is a raid-z2
raidset)  In my case, I'm pre-emptively replacing old drives with newer,
faster, larger drives.  So far, I've only been able to come up with two
solutions to the issue, neither of which is very graceful.

The first option is to simply yank the old drive out of the chassis.  I
could go on at-length about why I dislike doing that, but I think it's safe
to say everyone agrees this isn't a good option.

The second option is to export the zpool, then I can cfgadm  -c disconnect
the drive, and finally gracefully pull it from the system.  Unfortunately,
this means my data has to go offline.  While that's not a big deal for a
home box, it is for something in the enterprise with uptime concerns.

From my experimentation, you can't disconnect or unconfigure a drive that is
part of a live zpool.  So, is there a way to tell zfs to pre-emptively fail
it so that you can use cfgadm to put the drive into a state for a graceful
hotswap?  Am I just missing something obvious?  Detach seems to only apply
to mirrors and hot spares.

--Tim


Re: [zfs-discuss] zfs eradication

2009-11-11 Thread Joerg Moellenkamp
Hi,

Well ... I think Darren should implement this as a part of zfs-crypto. Secure delete on SSD looks like quite a challenge when wear leveling and bad block relocation kick in ;)

Regards
 Joerg

Am 11.11.2009 um 17:53 schrieb Cindy Swearingen:

 This feature is described in this RFE:
 
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=4930014
 Secure delete option: erase blocks after they're freed
 
 cs
 
 On 11/11/09 09:17, Darren J Moffat wrote:
 Brian Kolaci wrote:
 Hi,
 
 I was discussing the common practice of disk eradication used by many firms 
 for security.  I was thinking this may be a useful feature of ZFS to have 
 an option to eradicate data as its removed, meaning after the last 
 reference/snapshot is done and a block is freed, then write the eradication 
 patterns back to the removed blocks.
 
 By any chance, has this been discussed or considered before?
 Yes it has been discussed here before.
 It is one of the things I want to look at after ZFS Crypto and block pointer rewriter have integrated.
 Also, in some jurisdictions, if the data was always encrypted on disk then you don't need to write any patterns to erase the blocks.  So ZFS Crypto can help there.



Re: [zfs-discuss] Manual drive failure?

2009-11-11 Thread Ed Plese
Tim,

I think you're looking for zpool offline:

 zpool offline [-t] pool device ...

 Takes the specified physical device offline.  While  the
 device  is  offline, no attempt is made to read or write
 to the device.

 This command is not applicable to spares or  cache  dev-
 ices.

 -t    Temporary. Upon  reboot,  the  specified  physical
       device reverts to its previous state.
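
A minimal sketch of how that fits the replacement scenario (pool, device, and attachment-point names are placeholders; the cfgadm syntax depends on the controller):

   zpool offline tank c3t4d0         # stop all I/O to the old disk
   cfgadm -c unconfigure sata1/4     # quiesce the slot so the drive can be pulled
   # ... physically swap the drive ...
   cfgadm -c configure sata1/4
   zpool replace tank c3t4d0         # resilver onto the new disk in the same slot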


Ed Plese

On Wed, Nov 11, 2009 at 12:15 PM, Tim Cook t...@cook.ms wrote:
 So, I've done a bit of research and RTFM, and haven't found an answer.  If
 I've missed something obvious, please point me in the right direction.

 Is there a way to manually fail a drive via ZFS?  (this is a raid-z2
 raidset)  In my case, I'm pre-emptively replacing old drives with newer,
 faster, larger drives.  So far, I've only been able to come up with two
 solutions to the issue, neither of which is very graceful.

 The first option is to simply yank the old drive out of the chassis.  I
 could go on at-length about why I dislike doing that, but I think it's safe
 to say everyone agrees this isn't a good option.

 The second option is to export the zpool, then I can cfgadm  -c disconnect
 the drive, and finally gracefully pull it from the system.  Unfortunately,
 this means my data has to go offline.  While that's not a big deal for a
 home box, it is for something in the enterprise with uptime concerns.

 From my experimentation, you can't disconnect or unconfigure a drive that is
 part of a live zpool.  So, is there a way to tell zfs to pre-emptively fail
 it so that you can use cfgadm to put the drive into a state for a graceful
 hotswap?  Am I just missing something obvious?  Detach seems to only apply
 to mirrors and hot spares.

 --Tim





Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding

2009-11-11 Thread Maurice Volaski
I've experienced behavior similar to this several times; each time it was a single bad drive, in this case looking like target 0. For whatever reason (buggy Solaris/mpt driver?), some of the other drives get wind of it, then hide from their respective buses in fear. :-)




Operating System: SunOS dellserver 5.10 Generic_141445-09 i86pc i386 i86pc

Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,3...@3/pci1028,1...@0 (mpt0):

Nov 11 00:18:22 dellserver Disconnected command timeout for Target 0
Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,3...@3/pci1028,1...@0/s...@0,0 (sd13):
Nov 11 00:18:22 dellserver Error for Command: read(10) 
Error Level: Retryable
Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.notice]   Requested 
Block: 276886498 Error Block: 276886498
Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.notice]   Vendor: 
Dell   Serial Number: Dell Interna
Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.notice]   Sense 
Key: Unit Attention
Nov 11 00:18:22 dellserver scsi: [ID 107833 kern.notice]   ASC: 0x29 
(power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Nov 11 00:19:33 dellserver scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,3...@3/pci1028,1...@0 (mpt0):

Nov 11 00:19:33 dellserver Disconnected command timeout for Target 0
Nov 11 00:19:34 dellserver scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,3...@3/pci1028,1...@0/s...@0,0 (sd13):
Nov 11 00:19:34 dellserver SCSI transport failed: reason 
'reset': retrying command
Nov 11 00:20:44 dellserver scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,3...@3/pci1028,1...@0 (mpt0):

Nov 11 00:20:44 dellserver Disconnected command timeout for Target 0
--


--

Maurice Volaski, maurice.vola...@einstein.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University


Re: [zfs-discuss] zfs eradication

2009-11-11 Thread Darren J Moffat

Joerg Moellenkamp wrote:

Hi,

Well ... i think Darren should implement this as a part of zfs-crypto. Secure 
Delete on SSD looks like quite challenge, when wear leveling and bad block 
relocation kicks in ;)


No, I won't be doing that as part of the zfs-crypto project. As I said, some jurisdictions are happy that if the data is encrypted then overwriting the blocks isn't required.  For those that aren't, dd(1M) or format(1M) may be sufficient - and if that isn't, then nothing short of physical destruction is likely good enough.
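
As a rough illustration of the dd(1M) route (a single pass of zeros over a whole disk; the device path is a placeholder, slice 2 conventionally covers the whole disk, and whether one pass is acceptable depends on the policy in force):

   dd if=/dev/zero of=/dev/rdsk/c2t3d0s2 bs=1024k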


If we choose to add block erasure on delete to ZFS then it is a 
completely separate and complementary feature to encryption.


--
Darren J Moffat


Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-11-11 Thread Nicolas Williams
On Mon, Sep 07, 2009 at 09:58:19AM -0700, Richard Elling wrote:
 I only know of hole punching in the context of networking. ZFS doesn't
 do networking, so the pedantic answer is no.

But a VDEV may be an iSCSI device, thus there can be networking below
ZFS.

For some iSCSI targets (including ZVOL-based ones) a hole punching operation can be very useful, since it explicitly tells the backend that some contiguous block of space can be released for allocation to others.

Nico


Re: [zfs-discuss] Fwd: [ilugb] Does ZFS support Hole Punching/Discard

2009-11-11 Thread Bob Friesenhahn

On Wed, 11 Nov 2009, Tim Cook wrote:


I'm well aware of the fact that SSD mfg's put extra blocks into the 
device to increase both performance and MTBF.  I'm not sure how that 
invalidates what I've said though, or even plays a roll, and you 
haven't done a very good job of explaining why you think I'm wrong.  
TRIM is simply letting the device know that a block has been deleted 
from the OS perspective.  In a caching scenario, you aren't deleting 
anything, you're continually over-writing.  How exactly do you 
foresee TRIM being useful when the command wouldn't even be invoked?


The act of over-writing requires erasing.  If the cache is going to 
expire seldom-used data, it could potentially use TRIM to start 
erasing pages while the new data is being retrieved from primary 
storage.


Regardless, it seems that smarter FLASH storage device design 
eliminates most of the value offered by TRIM.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding

2009-11-11 Thread M P
I already changed some of the drives, with no difference. The failing target seems to be random - most likely the problem is not with the drives.


Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding

2009-11-11 Thread Markus Kovero
Have you tried another SAS-cable?

Yours
Markus Kovero

-Original Message-
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of M P
Sent: 11. marraskuuta 2009 21:05
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not 
responding

I already changed some of the drives, with no difference. The failing target seems to be random - most likely the problem is not with the drives.


Re: [zfs-discuss] [storage-discuss] ZFS on JBOD storage, mpt driver issue - server not responding

2009-11-11 Thread Marion Hakanson
m...@cybershade.us said:
 So at this point this looks like an issue with the MPT driver or these SAS
 cards (I tested two) when under heavy load. I put the latest firmware for the
 SAS card from LSI's web site - v1.29.00 without any changes, server still
 locks.
 
 Any ideas, suggestions how to fix or workaround this issue? The adapter is
 suppose to be enterprise-class. 

We have three of these HBA's, used as follows:
X4150, J4400, Solaris-10U7-x86, mpt patch 141737-01
V245, J4200, Solaris-10U7, mpt patch 141736-05
X4170, J4400, Solaris-10U8-x86, mpt/kernel patch 141445-09

None of these systems are suffering the issues you describe.  All of
their SAS HBA's are running the latest Sun-supported firmware I could
find for these HBA's, which is v1.26.03.00 (BIOS 6.24.00), in LSI firmware
update 14.2.2 at:
http://www.lsi.com/support/sun/

In that package is also a v1.27.03.00 firmware for use when connecting
to the F5100 flash accelerator, but it's clearly labelled as only for
use with that device.

Anyway, I of course don't know if you've already tried the v1.26.03.00
firmware in your situation, but I wanted to at least report that we
are using this combination on Solaris-10 without experiencing the
timeout issues you are having.

Regards,

Marion




Re: [zfs-discuss] zfs eradication

2009-11-11 Thread Tim Cook
On Wed, Nov 11, 2009 at 12:29 PM, Darren J Moffat
darr...@opensolaris.orgwrote:

 Joerg Moellenkamp wrote:

 Hi,

 Well ... i think Darren should implement this as a part of zfs-crypto.
 Secure Delete on SSD looks like quite challenge, when wear leveling and bad
 block relocation kicks in ;)


 No I won't be doing that as part of the zfs-crypto project. As I said some
 jurisdictions are happy that if the data is encrypted then overwrite of the
 blocks isn't required.   For those that aren't use dd(1M) or format(1M) may
 be sufficient - if that isn't then nothing short of physical destruction is
 likely good enough.

 If we choose to add block erasure on delete to ZFS then it is a completely
 separate and complementary feature to encryption.

 --
 Darren J Moffat


http://www.dban.org/


Re: [zfs-discuss] zfs eradication

2009-11-11 Thread David Magda
On Wed, November 11, 2009 13:29, Darren J Moffat wrote:

 No I won't be doing that as part of the zfs-crypto project. As I said
 some jurisdictions are happy that if the data is encrypted then
 overwrite of the blocks isn't required.   For those that aren't use
 dd(1M) or format(1M) may be sufficient - if that isn't then nothing
 short of physical destruction is likely good enough.

Does anyone know if SATA's 'secure erase' command is available on SSDs?




Re: [zfs-discuss] marvell88sx2 driver build126

2009-11-11 Thread Orvar Korvar
So he did actually hit a bug? But the bug is not dangerous, as it doesn't destroy data?

But I did not replace any devices and it still showed checksum errors. I think I did a zfs send | zfs receive? I don't remember. But I just copied things back and forth, and the checksum errors showed up. So what does that mean?


Re: [zfs-discuss] zfs eradication

2009-11-11 Thread Bill Sommerfeld
On Wed, 2009-11-11 at 10:29 -0800, Darren J Moffat wrote:
 Joerg Moellenkamp wrote:
  Hi,
  
  Well ... i think Darren should implement this as a part of
 zfs-crypto. Secure Delete on SSD looks like quite challenge, when wear
 leveling and bad block relocation kicks in ;)
 
 No I won't be doing that as part of the zfs-crypto project. As I said 
 some jurisdictions are happy that if the data is encrypted then 
 overwrite of the blocks isn't required.   For those that aren't use 
 dd(1M) or format(1M) may be sufficient - if that isn't then nothing 
 short of physical destruction is likely good enough.

note that eradication via overwrite makes no sense if the underlying
storage uses copy-on-write, because there's no guarantee that the newly
written block actually will overlay the freed block.

IMHO the sweet spot here may be to overwrite once with zeros (allowing the block to be compressed out of existence if the underlying storage is a compressed zvol or equivalent) or to use the TRIM command.

(It may also be worthwhile for zvols exported via various protocols to
themselves implement the TRIM command -- freeing the underlying
storage).

- Bill



Re: [zfs-discuss] zfs eradication

2009-11-11 Thread Darren J Moffat

Bill Sommerfeld wrote:

On Wed, 2009-11-11 at 10:29 -0800, Darren J Moffat wrote:

Joerg Moellenkamp wrote:

Hi,

Well ... i think Darren should implement this as a part of

zfs-crypto. Secure Delete on SSD looks like quite challenge, when wear
leveling and bad block relocation kicks in ;)

No I won't be doing that as part of the zfs-crypto project. As I said 
some jurisdictions are happy that if the data is encrypted then 
overwrite of the blocks isn't required.   For those that aren't use 
dd(1M) or format(1M) may be sufficient - if that isn't then nothing 
short of physical destruction is likely good enough.


note that eradication via overwrite makes no sense if the underlying
storage uses copy-on-write, because there's no guarantee that the newly
written block actually will overlay the freed block.


Which is why this has to be a ZFS feature rather than something like GNU shred(1), which runs in userland.



IMHO the sweet spot here may be to overwrite once with zeros (allowing
the block to be compressed out of existance if the underlying storage is
a compressed zvol or equivalent) or to use the TRIM command.


Exactly.

--
Darren J Moffat


Re: [zfs-discuss] zfs eradication

2009-11-11 Thread Bob Friesenhahn

On Wed, 11 Nov 2009, Darren J Moffat wrote:


note that eradication via overwrite makes no sense if the underlying
storage uses copy-on-write, because there's no guarantee that the newly
written block actually will overlay the freed block.


Which is why this has to be a ZFS feature rather than something like GNU shred(1), which runs in userland.


Zfs is absolutely useless for this if the underlying storage uses 
copy-on-write.  Therefore, it is absolutely useless to put it in zfs. 
No one should even consider it.


The use of encrypted blocks is much better, even though encrypted 
blocks may be subject to freeze-spray attack if the whole computer is 
compromised while it is still running.  Otherwise use a sledge-hammer 
followed by incineration.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] zfs eradication

2009-11-11 Thread David Magda

On Nov 11, 2009, at 17:40, Bob Friesenhahn wrote:

Zfs is absolutely useless for this if the underlying storage uses  
copy-on-write.  Therefore, it is absolutely useless to put it in  
zfs. No one should even consider it.


The use of encrypted blocks is much better, even though encrypted  
blocks may be subject to freeze-spray attack if the whole computer  
is compromised while it is still running.  Otherwise use a sledge- 
hammer followed by incineration.


There seem to be 'secure erase' methods available for some SSDs:

Zeus Solid State Drives are available with secure erase methods to  
support a wide variety of requirements. MilPurge provide secure  
erase procedure that comply with several agency guidelines,  
including: DoD 5220.22-M, NSA 130-2, AFSSI 5020, AR 380-19, and  
Navso 5239. Additional capabilities include Intelligent Destructive  
Purge where the flash media is physically damaged and rendered  
totally and irrevocably unusable.


http://www.stec-inc.com/products/zeus/

The Intel X25-M is reported to mark all cells as free / empty via  
ATA's secure erase:


http://ata.wiki.kernel.org/index.php/ATA_Secure_Erase

Marking them and actually resetting them are two different things  
though. Hopefully more products will actually do a proper reset / wipe  
in the future.




[zfs-discuss] raidz-1 vs mirror

2009-11-11 Thread Thomas Maier-Komor
Hi everybody,

I am considering moving my data pool from a two-disk (10k rpm) mirror layout to a three-disk raidz-1. This is just a single-user workstation environment, where I mostly perform compile jobs. From past experience with raid5 I am a little bit reluctant to do so, as software raid5 has a major impact on write performance.

Is this similar with raidz-1, or does the ZFS stack work around the limitations that come into play with raid5? How big would the penalty be?

As an alternative I could swap the drives for bigger ones - but these
would probably then be 7.2k rpm discs, because of costs.

Any experiences or thoughts?

TIA,
Thomas


Re: [zfs-discuss] zfs eradication

2009-11-11 Thread Darren J Moffat

Bob Friesenhahn wrote:

On Wed, 11 Nov 2009, Darren J Moffat wrote:


note that eradication via overwrite makes no sense if the underlying
storage uses copy-on-write, because there's no guarantee that the newly
written block actually will overlay the freed block.


Which is why this has to be a ZFS feature rather than something like GNU shred(1), which runs in userland.


Zfs is absolutely useless for this if the underlying storage uses 
copy-on-write.  Therefore, it is absolutely useless to put it in zfs. No 
one should even consider it.


I disagree.  Sure, there are cases where ZFS, which is copy-on-write, is sitting on top of something that is also copy-on-write (iSCSI LUNs backed by ZFS on another system, for example), but that doesn't mean there are no cases where this is useful.  For example, in an appliance or other controlled environment it isn't useless, and it is being considered for exactly those use cases of ZFS.


If we introduce this for ZFS it would likely be a per-dataset or per-pool property, and it will be documented that the whole storage stack needs to be taken into consideration to determine whether this is actually effective.
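
Purely as a sketch of what such a knob might look like - the property name below is invented and does not exist today:

   zfs set eradicate=on tank/secure    # hypothetical per-dataset property: overwrite blocks as they are freed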


The use of encrypted blocks is much better, even though encrypted blocks 
may be subject to freeze-spray attack if the whole computer is 
compromised while it is still running.  Otherwise use a sledge-hammer 
followed by incineration.


Much better for jurisdictions that allow for that, but not all do.  I know of at least one that wants even ciphertext blocks to be overwritten.




--
Darren J Moffat


Re: [zfs-discuss] zfs eradication

2009-11-11 Thread Bob Friesenhahn

On Wed, 11 Nov 2009, Darren J Moffat wrote:


Zfs is absolutely useless for this if the underlying storage uses 
copy-on-write.  Therefore, it is absolutely useless to put it in 
zfs. No one should even consider it.


I disagree.  Sure there are cases where ZFS which is copy-on-write 
is sitting ontop of something that is copy-on-write (iSCSI luns 
backed by ZFS on another system for example) that doesn't mean there 
are no cases where this is useful.  For example in an appliance or 
other controlled environment it isn't useless and it is being 
considered for example those use cases of ZFS.


Perhaps in a product where the engineering team has examined every 
part of the product from top to bottom, it might have some use. 
Advertising that the feature exists might cause someone to believe 
that it actually works.  There are plenty of devices in common use 
which are always/sometimes COW.  These include SSDs and certain types 
of CD/DVD/WORM drives.  For SSDs, COW is commonly called wear 
leveling.  In hard-drives, we have what is known as bad block 
management.


Given that many storage products are made in places in China, it seems 
unlikely that any government is going to trust assurances from a 
product vendor that their product never leaves behind copies of data.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] raidz-1 vs mirror

2009-11-11 Thread Rob Logan

 from a two disk (10krpm) mirror layout to a three disk raidz-1. 

Writes will be unnoticeably slower for raidz1 because of the parity calculation and the latency of a third spindle, but reads will be 1/2 the speed of the mirror, because the mirror can split the reads between two disks.

Another way to say the same thing:

A raidz will be the speed of the slowest disk in the array, while a mirror will be (number of mirrors)x faster for reads, or the speed of the slowest disk for writes.


Re: [zfs-discuss] raidz-1 vs mirror

2009-11-11 Thread Bob Friesenhahn

On Wed, 11 Nov 2009, Rob Logan wrote:




from a two disk (10krpm) mirror layout to a three disk raidz-1.


Writes will be unnoticeably slower for raidz1 because of the parity calculation and the latency of a third spindle, but reads will be 1/2 the speed of the mirror, because the mirror can split the reads between two disks.


But with raidz1 a data block will be split (striped) across two disks. 
Doesn't that also speed up the reads (assuming that at least the zfs 
record size is requested)?


We were told that scheduled reads from mirrors are done using an algorithm which does not assure that sequential reads will be read from different disks in a mirror pair.  Sometimes sequential reads may be from the same side of the mirror.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] raidz-1 vs mirror

2009-11-11 Thread Richard Elling

On Nov 11, 2009, at 4:30 PM, Rob Logan wrote:




from a two disk (10krpm) mirror layout to a three disk raidz-1.


Writes will be unnoticeably slower for raidz1 because of the parity calculation and the latency of a third spindle, but reads will be 1/2 the speed of the mirror, because the mirror can split the reads between two disks.


... where speed is latency.  For bandwidth, a 3-device RAIDZ
should be approximately the same as a 2-way mirror. For larger
RAIDZ sets, bandwidth/space can scale.



another way to say the same thing:

a raidz will be the speed of the slowest disk in the array, while a
mirror will be x(Number of mirrors)  time faster for reads or
the the speed of the slowest disk for wrights.


The model I use for a pool with no cache or log devices is that the number of small, random reads (IOPS) is approximately
  RAIDZ IOPS = IOPS of one device * N/(N-1)
where N is the number of disks in the RAIDZ set.
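
(Worked example, illustrative numbers only: with N = 3 and a single disk good for ~150 small random-read IOPS, this model gives about 150 * 3/2 = 225 IOPS for the set.)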

This model completely falls apart for workloads other than
small, random reads or if a cache or log device exists.  It also
explains why using a SSD for a cache device for workloads
which make small, random reads can be a huge win :-)
 -- richard



Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding

2009-11-11 Thread Travis Tabbal
 Hi, you could try LSI itmpt driver as well, it seems
 to handle this better, although I think it only
 supports 8 devices at once or so.
 
 You could also try more recent version of opensolaris
 (123 or even 126), as there seems to be a lot fixes
 regarding mpt-driver (which still seems to have
 issues).


I won't speak for the OP, but I've been seeing this same behaviour on 126 with 
LSI 1068E based cards (Supermicro USAS-L8i). 

For the LSI driver, how does one install it? I'm new to OpenSolaris and don't want to mess it up. It looked to be very old; is Solaris backward compatibility that good?

It would be really nice if Sun would at least acknowledge the bug and say whether they can/can't reproduce it. I'm happy to supply information and test things if it will help. I have some spare disks I can attach to one of these cards and test driver updates and such. It sounds like people with Sun hardware are experiencing this as well.


Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding

2009-11-11 Thread Travis Tabbal
 Have you tried another SAS-cable?


I have. 2 identical SAS cards, different cables, different disks (brand, size, 
etc). I get the errors on random disks in the pool. I don't think it's hardware 
related as there have been a few reports of this issue already.


Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding

2009-11-11 Thread James C. McPherson

Travis Tabbal wrote:

Hi, you could try LSI itmpt driver as well, it seems to handle this
better, although I think it only supports 8 devices at once or so.

You could also try more recent version of opensolaris (123 or even
126), as there seems to be a lot fixes regarding mpt-driver (which
still seems to have issues).



I won't speak for the OP, but I've been seeing this same behaviour on 126
 with LSI 1068E based cards (Supermicro USAS-L8i). For the LSI driver,
how does one install it? I'm new to OpenSolaris and don't want to mess it
up. It looked to be very old, is Solaris backward compatibility that
good?


I don't know whether itmpt has been updated to cope with OpenSolaris.
Use it at your own risk.


It would be really nice if Sun would at least acknowledge the bug and
that they can/can't reproduce it. I'm happy to supply information and
test things if it will help. I have some spare disks I can attach to one
of these cards and test driver updates and such. It sounds like people
with Sun hardware are experiencing this as well.


The first step towards acknowledging that there is a problem is for you to log a bug at bugs.opensolaris.org. If you don't, we don't know that there might be a problem outside of the ones that we identify.


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog


[zfs-discuss] COMSTAR iSCSI with vSphere not working

2009-11-11 Thread Duncan Bradey
I am at a loss as to where else to look to work out why my vSphere 4 server cannot access my iSCSI LUNs via the COMSTAR iSCSI target.

# uname -a
SunOS prmel1iscsi01 5.11 snv_111b i86pc i386 i86pc Solaris
# itadm list-target -v
TARGET NAME                                                  STATE    SESSIONS
iqn.1986-03.com.sun:02:dd37a032-4edf-cdb5-a1aa-81726262d094  online   6
alias:  prmel1iscsi01
auth:   none (defaults)
targetchapuser: -
targetchapsecret:   unset
tpg-tags:   iSCSI = 2
# itadm list-tpg -v
TARGET PORTAL GROUP   PORTAL COUNT
iSCSI 2
portals:172.22.205.28:3260,172.22.205.27:3260

My vSphere server logs into the target OK

# stmfadm list-target -v
Target: iqn.1986-03.com.sun:02:dd37a032-4edf-cdb5-a1aa-81726262d094
Operational Status: Online
Provider Name : iscsit
Alias : -
Sessions  : 12
Initiator: iqn.1998-01.com.vmware:PRMEL1VSPCOR02-74207c44
Alias: -
Logged in since: Thu Nov 12 16:45:00 2009
Initiator: iqn.1998-01.com.vmware:PRMEL1VSPCOR02-74207c44
Alias: -
Logged in since: Thu Nov 12 16:45:00 2009
Initiator: iqn.1998-01.com.vmware:PRMEL1VSPCOR02-74207c44
Alias: -
Logged in since: Thu Nov 12 16:45:00 2009
Initiator: iqn.1998-01.com.vmware:PRMEL1VSPCOR02-74207c44
Alias: -
Logged in since: Thu Nov 12 16:45:00 2009
Initiator: iqn.1998-01.com.vmware:PRMEL1VSPCOR02-74207c44
Alias: -
Logged in since: Thu Nov 12 16:45:00 2009
Initiator: iqn.1998-01.com.vmware:PRMEL1VSPCOR02-74207c44
Alias: -
Logged in since: Thu Nov 12 16:45:00 2009
Initiator: iqn.1998-01.com.vmware:PRMEL1VSPCOR02-74207c44
Alias: -
Logged in since: Thu Nov 12 16:44:20 2009
Initiator: iqn.1998-01.com.vmware:PRMEL1VSPCOR02-74207c44
Alias: -
Logged in since: Thu Nov 12 16:44:20 2009
Initiator: iqn.1998-01.com.vmware:PRMEL1VSPCOR02-74207c44
Alias: -
Logged in since: Thu Nov 12 16:44:20 2009
Initiator: iqn.1998-01.com.vmware:PRMEL1VSPCOR02-74207c44
Alias: -
Logged in since: Thu Nov 12 16:44:20 2009
Initiator: iqn.1998-01.com.vmware:PRMEL1VSPCOR02-74207c44
Alias: -
Logged in since: Thu Nov 12 16:44:20 2009
Initiator: iqn.1998-01.com.vmware:PRMEL1VSPCOR02-74207c44
Alias: -
Logged in since: Thu Nov 12 16:44:20 2009
# itadm list-initiator
INITIATOR NAME                                   CHAPUSER  SECRET
iqn.1998-01.com.vmware:prmel1vspcor02-74207c44   none      unset
# stmfadm list-hg -v 
Host Group: prmel1vspcor
Member: iqn.1998-01.com.vmware:prmel1vspcor02-74207c44
# stmfadm list-lu
LU Name: 600144F0B00309004AFB990C0001
LU Name: 600144F0B00309004AFB990D0002
LU Name: 600144F0B00309004AFB990E0003
LU Name: 600144F0B00309004AFB990F0004
# stmfadm list-view -l 600144F0B00309004AFB990C0001
View Entry: 0
Host group   : prmel1vspcor
Target group : All
LUN  : 0
# stmfadm list-view -l 600144F0B00309004AFB990D0002
View Entry: 0
Host group   : prmel1vspcor
Target group : All
LUN  : 1
# stmfadm list-view -l 600144F0B00309004AFB990E0003
View Entry: 0
Host group   : prmel1vspcor
Target group : All
LUN  : 2
# stmfadm list-view -l 600144F0B00309004AFB990F0004
View Entry: 0
Host group   : prmel1vspcor
Target group : All
LUN  : 3

The vSphere server shows the following:-

Nov 12 16:39:35 PRMEL1VSPCOR02 vmkernel: 0:00:25:02.440 cpu0:4245)ScsiScan: 
839: Path 'vmhba34:C0:T0:L0': Vendor: 'SUN '  Model: 'COMSTAR  '  
Rev: '1.0 ' 
Nov 12 16:39:35 PRMEL1VSPCOR02 vmkernel: 0:00:25:02.440 cpu0:4245)ScsiScan: 
842: Path 'vmhba34:C0:T0:L0': Type: 0x1f, ANSI rev: 5, TPGS: 0 (none) 
Nov 12 16:39:35 PRMEL1VSPCOR02 vmkernel: 0:00:25:02.440 cpu0:4245)ScsiScan: 
105: Path 'vmhba34:C0:T0:L0': Peripheral qualifier 0x1 not supported 
Nov 12 16:39:35 PRMEL1VSPCOR02 vmkernel: 0:00:25:02.441 cpu0:4245)ScsiScan: 
839: Path 'vmhba34:C1:T0:L0': Vendor: 'SUN '  Model: 'COMSTAR  '  
Rev: '1.0 ' 
Nov 12 16:39:35 PRMEL1VSPCOR02 vmkernel: 0:00:25:02.441 cpu0:4245)ScsiScan: 
842: Path 'vmhba34:C1:T0:L0': Type: 0x1f, ANSI rev: 5, TPGS: 0 (none) 
Nov 12 16:39:35 PRMEL1VSPCOR02 vmkernel: 0:00:25:02.441 cpu0:4245)ScsiScan: 
105: Path 'vmhba34:C1:T0:L0': Peripheral qualifier 0x1 not supported 
Nov 12 16:39:35 PRMEL1VSPCOR02 vmkernel: 0:00:25:02.441 cpu0:4245)ScsiScan: 
839: Path 'vmhba34:C2:T0:L0': Vendor: 'SUN '  Model: 'COMSTAR  '  
Rev: '1.0 ' 
Nov 12 16:39:35 PRMEL1VSPCOR02 vmkernel: 0:00:25:02.441 cpu0:4245)ScsiScan: 
842: Path 

Re: [zfs-discuss] COMSTAR iSCSI with vSphere not working

2009-11-11 Thread Rasmus Fauske

Duncan Bradey wrote:

I am at a loss as to where else to look to work out why my vSphere 4 server
cannot access my iSCSI LUNs via the COMSTAR iSCSI target.

I am at a complete loss; I've tried everything that I can think of: using CHAP,
disabling CHAP, recreating the target. I even reinstalled the OS and
reconfigured the COMSTAR stack from scratch.
  
How large are the volumes you are creating? In my setup vSphere could
not see the volume if it was above 2000 GB.


I am running b118 with COMSTAR and vSphere and that works great, using
file-backed iSCSI LUNs (thin provisioned).
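
If it helps as a point of comparison, a file-backed, thin-provisioned LU on a
build like that can be set up roughly as follows (the path, size, host group
name and GUID are only placeholders):

mkfile -n 500g /tank/iscsi/lun0      (sparse backing file, so space is only allocated as data is written)
sbdadm create-lu /tank/iscsi/lun0    (register the file as a logical unit and note the GUID it prints)
stmfadm add-view -h esx-hosts 600144f0...   (map that GUID to the host group the ESX initiator is in)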


--
Rasmus Fauske




Re: [zfs-discuss] ZFS on JBOD storage, mpt driver issue - server not responding

2009-11-11 Thread James C. McPherson

Travis Tabbal wrote:



On Wed, Nov 11, 2009 at 10:25 PM, James C. McPherson
<j...@opensolaris.org> wrote:



The first step towards acknowledging that there is a problem
is for you to log a bug at bugs.opensolaris.org. If you don't, we
don't know that there might be a problem beyond the ones
that we identify ourselves.

I apologize if I offended by not knowing the protocol. I thought that
the forums were watched and that people at Sun updated the bug tracker
from them. I didn't think normal users had access to submit bugs. Thank you
for the reply. I have submitted a bug on the issue with all the
information I think might be useful. If someone at Sun would like more
information, output from commands, or testing, I would be happy to help.



Hi Travis,
no, you didn't offend at all. There's a chunk of doco on the
hub.opensolaris.org site which talks about bugs, and there's a link
to both bugster and bugzilla. Bugster is the internal tool
which you can view via bugs.opensolaris.org. The bugzilla instance
is r/w by anybody who has an account on opensolaris.org. Most of
the kernel groups are not yet looking at the bugzilla instance so
it's better to use bugster at this point in time.


I was not provided with a bug number by the system. I assume that those 
are given out if the bug is deemed worthy of further consideration.


That's an invalid assumption. As it happens, the bugs.o.o interface
does not always provide you with a bug id; we have to wait for the
new entry to show up in our internal triage queue and then reassign
it to where it really should go. I haven't seen your bug turn up yet,
so I can't help more at this point. No doubt a copy of it will turn
up in my inbox after I've gone to sleep.

Whoever picks up your bug should contact you directly to get copies
of the other data you mention.



James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog


Re: [zfs-discuss] COMSTAR iSCSI with vSphere not working

2009-11-11 Thread Duncan Bradey
Rasmus,

I had 4 volumes:-

Found 4 LU(s)

  GUID                              DATA SIZE      SOURCE
  --------------------------------  -------------  --------------------
600144f0b00309004afb990f0004  1099511562240
/dev/zvol/rdsk/infobrick/iscsi/prmel1vspcor-jhgtier2-03
600144f0b00309004afb990e0003  1099511562240
/dev/zvol/rdsk/infobrick/iscsi/prmel1vspcor-jhgtier2-02
600144f0b00309004afb990d0002  2196875706368
/dev/zvol/rdsk/infobrick/iscsi/prmel1vspcor-jhgtier2-01
600144f0b00309004afb990c0001  2196875706368
/dev/zvol/rdsk/infobrick/iscsi/prmel1vspcor-jhgtier2-00

I've reallocated these same LUNs to a Windows server (a VM with the MS iSCSI
initiator) and this works fine.

Following your posting, I've created a new 1024GB LUN and tried allocating this
and only this LUN to vSphere, but I'm getting the same issue. Here are the new
details:-

  GUID                              DATA SIZE      SOURCE
  --------------------------------  -------------  --------------------
600144f0b00309004afbac8a0007  1099511562240
/dev/zvol/rdsk/infobrick/iscsi/prmel1vspcor-jhgtier2-04

# stmfadm list-view -l 600144f0b00309004afbac8a0007
View Entry: 0
Host group   : prmel1vspcor
Target group : All
LUN  : 0

# stmfadm list-hg -v
Host Group: prmel1vspcor
Member: iqn.1998-01.com.vmware:prmel1vspcor02-74207c44
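
(For completeness, a zvol-backed LU like the ones above is normally created and
exported along these lines; the dataset name, size and GUID here are
illustrative, not necessarily what was used:)

zfs create -V 1T infobrick/iscsi/newlun                  (create the backing zvol)
sbdadm create-lu /dev/zvol/rdsk/infobrick/iscsi/newlun   (register it and note the GUID it prints)
stmfadm add-view -h prmel1vspcor -n 0 600144f0...        (present it as LUN 0 to the ESX host group)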

Unfortunately there is no change to the behaviour - same errors in the 
/var/log/vmkernel:-

Nov 12 17:29:35 PRMEL1VSPCOR02 vmkernel: 0:01:15:02.440 cpu6:4252)ScsiScan: 
839: Path 'vmhba34:C0:T0:L0': Vendor: 'SUN '  Model: 'COMSTAR  '  
Rev: '1.0 ' 
Nov 12 17:29:35 PRMEL1VSPCOR02 vmkernel: 0:01:15:02.440 cpu6:4252)ScsiScan: 
842: Path 'vmhba34:C0:T0:L0': Type: 0x1f, ANSI rev: 5, TPGS: 0 (none) 
Nov 12 17:29:35 PRMEL1VSPCOR02 vmkernel: 0:01:15:02.440 cpu6:4252)ScsiScan: 
105: Path 'vmhba34:C0:T0:L0': Peripheral qualifier 0x1 not supported 
...
Nov 12 17:35:05 PRMEL1VSPCOR02 vmkernel: 0:00:02:30.505 cpu1:4241)WARNING: 
iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba34:CH:2 T:0 CN:0: Failed to receive 
data: Connection closed by peer 
Nov 12 17:35:05 PRMEL1VSPCOR02 vmkernel: 0:00:02:30.505 cpu1:4241)WARNING: 
iscsi_vmk: iscsivmk_ConnReceiveAtomic: Sess [ISID: 00023d03 TARGET: 
iqn.1986-03.com.sun:02:dd37a032-4edf-cdb5-a1aa-81726262d094 TPGT: 2 TSIH: 0] 
Nov 12 17:35:05 PRMEL1VSPCOR02 vmkernel: 0:00:02:30.505 cpu1:4241)WARNING: 
iscsi_vmk: iscsivmk_ConnReceiveAtomic: Conn [CID: 0 L: 172.22.205.39:52606 R: 
172.22.205.27:3260] 
Nov 12 17:35:05 PRMEL1VSPCOR02 vmkernel: 0:00:02:30.505 cpu1:4241)iscsi_vmk: 
iscsivmk_ConnRxNotifyFailure: vmhba34:CH:2 T:0 CN:0: Connection rx notifying 
failure: Failed to Receive. State=Online 
Nov 12 17:35:05 PRMEL1VSPCOR02 vmkernel: 0:00:02:30.505 cpu1:4241)iscsi_vmk: 
iscsivmk_ConnRxNotifyFailure: Sess [ISID: 00023d03 TARGET: 
iqn.1986-03.com.sun:02:dd37a032-4edf-cdb5-a1aa-81726262d094 TPGT: 2 TSIH: 0] 
Nov 12 17:35:05 PRMEL1VSPCOR02 vmkernel: 0:00:02:30.505 cpu1:4241)iscsi_vmk: 
iscsivmk_ConnRxNotifyFailure: Conn [CID: 0 L: 172.22.205.39:52606 R: 
172.22.205.27:3260] 
Nov 12 17:35:05 PRMEL1VSPCOR02 vmkernel: 0:00:02:30.505 cpu1:4241)WARNING: 
iscsi_vmk: iscsivmk_StopConnection: vmhba34:CH:2 T:0 CN:0: Processing CLEANUP 
event 
Nov 12 17:35:05 PRMEL1VSPCOR02 vmkernel: 0:00:02:30.505 cpu1:4241)WARNING: 
iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: 00023d03 TARGET: 
iqn.1986-03.com.sun:02:dd37a032-4edf-cdb5-a1aa-81726262d094 TPGT: 2 TSIH: 0] 
...

I've even tried a reboot of my vSphere server just to be sure that nothing was 
invalid there (btw, CHAP is currently disabled as well), but no good.
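
For what it's worth, "Peripheral qualifier 0x1" in those vSphere scan lines
normally means the target answered the INQUIRY but reported no device behind
that LUN, so it may be worth re-checking the mapping from the COMSTAR side with
something like the following (the GUID is a placeholder for the real one):

stmfadm list-lu -v 600144f0...     (the LU should report an online operational status)
stmfadm list-view -l 600144f0...   (the view should name the right host group and LUN number)
svcs -l stmf                       (the stmf service itself should be online)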