Re: [zfs-discuss] Disk Issues

2010-02-08 Thread Brian McKerr
OK, I changed the cable and also tried swapping the port on the motherboard. 
The drive continued to show huge asvc_t and also started to show huge wsvc_t. I 
unplugged it and the pool is now operating as expected, performance-wise.
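
For reference, those are the per-device service times reported by iostat; watching 
them is as simple as (5-second interval):

  # iostat -xn 5
  (wsvc_t = average time an I/O waits in the queue, in ms;
   asvc_t = average time an I/O is active on the device, in ms)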

See the 'storage' forum for any further updates as I am now convinced this has 
nothing to do with ZFS or my attempt to disable the ZIL. 8-)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-08 Thread Felix Buenemann

Hi Daniel,

On 08.02.10 05:45, Daniel Carosone wrote:

On Mon, Feb 08, 2010 at 04:58:38AM +0100, Felix Buenemann wrote:

I have some questions about the choice of SSDs to use for ZIL and L2ARC.


I have one answer.  The other questions are mostly related to your
raid controller, which I can't answer directly.


- Is it safe to run the L2ARC without battery backup with write cache
enabled?


Yes - it's just a cache; errors will be detected and the data re-fetched from
the pool. Also, it is volatile at reboot (starts cold) at present anyway, so
preventing data loss at power-off is not worth spending any money or time on.
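
Since a cache device holds no pool-critical state, it can also be added to or 
removed from a live pool at any time; a minimal sketch (pool and device names 
are examples only):

  # zpool add tank cache c2t0d0
  # zpool remove tank c2t0d0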


Thanks for clarifying this.


- Does it make sense to use HW RAID10 on the storage controller or would
I get better performance out of JBOD + ZFS RAIDZ2?


A more comparable alternative would be using the controller in jbod
mode and a pool of zfs mirror vdevs.  I'd expect that gives similar
performance to the controller's mirroring (unless higher pci bus usage
is a bottleneck) but gives you the benefits of zfs healing on disk
errors.
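
For illustration, such a pool of mirror vdevs (and later growth by another pair) 
would be built roughly like this, with example device names:

  # zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0
  # zpool add tank mirror c1t4d0 c1t5d0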


I was under the impression that using HW RAID10 would save me 50% PCI 
bandwidth and allow the controller to handle its cache more intelligently, so 
I stuck with it. But I should run some benchmarks of RAID10 vs. JBOD with ZFS 
mirrors to see if this makes a difference.



Performance of RaidZ/5 vs mirrors is a much more workload-sensitive
question, regardless of the additional implementation-specific
wrinkles of either kind.

Your emphasis on lots of slog and l2arc suggests performance is a
priority.  Whether all this kit is enough to hide the IOPS penalty of
raidz/5, or whether you need it even to make mirrors perform
adequately, you'll have to decide yourself.


So it seems right to assume that RAIDZ1/2 has about the same 
performance hit as HW RAID5/6 with write cache. I wasn't aware that ZFS 
can do RAID10-style multiple mirrors, so that seems to be the better 
option anyway.



--
Dan.


- Felix

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Intrusion Detection - powered by ZFS Checksumming ?

2010-02-08 Thread Lutz Schumann
Hello, 

an idea popped into my mind while talking about security and intrusion 
detection. 

Host-based intrusion detection may use checksumming for file change tracking. It works like 
this: 

Once installed, and knowing the software is OK, a baseline is created. 
Then on every check, the current state of the data is verified against the baseline 
and changes are reported. 

An example of this is AIDE.

The difficult part is the checksumming - this takes time. 

My idea would be to use ZFS snapshots for this. 

baseline creation = create snapshot
baseline verification = verify the checksums of the objects and report objects 
that differ

This could work for non-zvol environments. 

Is it possible to extract the checksums of ZFS objects with a command-line 
tool?

Regards, 
Robert
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Big send/receive hangs on 2009.06

2010-02-08 Thread David Dyer-Bennet
So, I was running my full backup last night, backing up my main data 
pool zp1, and it seems to have hung.


Any suggestions for additional data gathering?

-bash-3.2$ zpool status zp1
  pool: zp1
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
zp1 ONLINE   0 0 0
  mirrorONLINE   0 0 0
c5t0d0  ONLINE   0 0 0
c5t1d0  ONLINE   0 0 0
  mirrorONLINE   0 0 0
c6t0d0  ONLINE   0 0 0
c6t1d0  ONLINE   0 0 0

errors: No known data errors

to one of my external USB drives holding pool bup-wrack

-bash-3.2$ zpool status bup-wrack
  pool: bup-wrack
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
bup-wrack   ONLINE   0 0 0
  c7t0d0ONLINE   0 0 0

errors: No known data errors

The line in the script that starts the send and receive is

zfs send -Rv $srcsnap | zfs recv -Fudv $BUPPOOL/$HOSTNAME/$FS

And the -v causes the start and stop of each incremental stream to be 
announced of course.  The last output from it was:


sending from @bup-20090315-190807UTC to zp1/d...@bup-20090424-034702utc
receiving incremental stream of zp1/d...@bup-20090424-034702utc into 
bup-wrack/fsfs/zp1/d...@bup-20090424-034702utc


And it appears hung when I got up this morning.  No activity on the 
drive, zpool iostat shows no activity on the backup pool and no 
unexplained activity on the data pool.  The server is responsive, and 
the data pool is responsive.  ps shows considerable accumulated time on 
the backup and receive processes, but no change in the last half hour.


zpool list shows that quite a lot of data has not yet been transferred 
to the backup pool (which was newly-created when this backup started).


-bash-3.2$ zpool list
NAMESIZE   USED  AVAILCAP  HEALTH  ALTROOT
bup-wrack   928G   438G   490G47%  ONLINE  /backups/bup-wrack
rpool74G  6.35G  67.7G 8%  ONLINE  -
zp1 744G   628G   116G84%  ONLINE  -

ps -ef shows

root  3153  3145   0 23:09:07 pts/3  19:59 zfs recv -Fudv 
bup-wrack/fsfs/zp1
root  3145  3130   0 23:09:04 pts/3   0:00 /bin/bash 
./bup-backup-full zp1 bup-wrack
root  3152  3145   0 23:09:07 pts/3  17:06 zfs send -Rv 
z...@bup-20100208-050907gmt
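
For the record, the user-level and kernel stacks of the two apparently hung 
processes could be captured along these lines (PIDs from the ps output above; 
a rough sketch):

  # pstack 3152 3153
  # echo "0t3152::pid2proc | ::walk thread | ::findstack -v" | mdb -k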


zfs list shows:

-bash-3.2$ zfs list -t snapshot,filesystem -r zp1
NAME USED  AVAIL  REFER  MOUNTPOINT
zp1  628G   104G  33.8M  /home
z...@bup-20090223-033745utc  0  -  33.8M  -
z...@bup-20090225-184857utc  0  -  33.8M  -
z...@bup-20090302-032437utc  0  -  33.8M  -
z...@bup-20090309-033514utc  0  -  33.8M  -
z...@bup-20090315-190807utc  0  -  33.8M  -
z...@bup-20090424-034702utc22K  -  33.8M  -
z...@bup-20090619-063536gmt  0  -  33.8M  -
z...@bup-20090619-143851utc  0  -  33.8M  -
z...@bup-20090804-024506utc  0  -  33.8M  -
z...@bup-20090906-192431utc  0  -  33.8M  -
z...@bup-20100102-035216utc  0  -  33.8M  -
z...@bup-20100102-184101utc  0  -  33.8M  -
z...@bup-20100208-050707gmt  0  -  33.8M  -
z...@bup-20100208-050907gmt  0  -  33.8M  -
zp1/ddb  494G   104G   452G  /home/ddb
zp1/d...@bup-20090223-033745utc  5.12M  -   326G  -
zp1/d...@bup-20090225-184857utc  4.15M  -   328G  -
zp1/d...@bup-20090302-032437utc  16.6M  -   329G  -
zp1/d...@bup-20090309-033514utc  8.95M  -   330G  -
zp1/d...@bup-20090315-190807utc  35.3M  -   330G  -
zp1/d...@bup-20090424-034702utc   140M  -   345G  -
zp1/d...@bup-20090619-063536gmt  43.9M  -   386G  -
zp1/d...@bup-20090619-143851utc  44.9M  -   386G  -
zp1/d...@bup-20090804-024506utc  4.30G  -   418G  -
zp1/d...@bup-20090906-192431utc  8.43G  -   440G  -
zp1/d...@bup-20100102-035216utc  4.13G  -   435G  -
zp1/d...@bup-20100102-184101utc   108M  -   431G  -
zp1/d...@bup-20100208-050707gmt   142K  -   452G  -
zp1/d...@bup-20100208-050907gmt   140K  -   452G  -
zp1/jmf 33.5G   104G  33.3G  /home/jmf
zp1/j...@bup-20090223-033745utc  0  -  33.2G  -
zp1/j...@bup-20090225-184857utc  0  -  33.2G  -
zp1/j...@bup-20090302-032437utc  0  -  33.2G  -
zp1/j...@bup-20090309-033514utc  0  -  33.2G  -
zp1/j...@bup-20090315

Re: [zfs-discuss] Intrusion Detection - powered by ZFS Checksumming ?

2010-02-08 Thread Darren J Moffat

On 08/02/2010 12:55, Lutz Schumann wrote:

Hello,

an idea popped into my mind while talking about security and intrusion 
detection.

Host-based intrusion detection may use checksumming for file change tracking. It works like this:

Once installed, and knowing the software is OK, a baseline is created.
Then on every check, the current state of the data is verified against the baseline 
and changes are reported.

An example for this is AIDE.

The difficult part is the checksumming - this takes time.

My idea would be to use ZFS snapshots for this.

baseline creation = create snapshot
baseline verification = verify the checksums of the objects and report objects 
that differ

This could work for non-zvol environments.

Is it possible to extract the checksums of ZFS objects with a command-line 
tool?


Only with the zdb(1M) tool but note that the checksums are NOT of files 
but of the ZFS blocks.
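
For example, something along the lines of the following dumps a dataset's object 
metadata, and at the higher verbosity levels the individual block pointers, 
including their checksums, are printed (dataset name is an example only):

  # zdb -ddddd tank/home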


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recover ZFS Array after OS Crash?

2010-02-08 Thread Robert Milkowski

On 06/02/2010 13:18, Fajar A. Nugraha wrote:

On Sat, Feb 6, 2010 at 1:32 AM, Jjahservan...@gmail.com  wrote:
   

saves me hundreds on HW-based RAID controllers ^_^
 

... which you might need to fork over to buy additional memory or faster CPU :P

Don't get me wrong, zfs is awesome, but to do so it needs more CPU
power and RAM (and possibly SSD) compared to other filesystems. If
your main concern is cost, then some HW raid controller might be more
effective.

   

any real data to back your claims?
Then you need to be realistic - if ZFS consumes, let's say, 10-30% more CPU 
but can still do several GB/s (assuming your storage can handle it) on a 
modern x86 box, then for the 99% of use cases where *much* less data is 
actually handled by a filesystem in real workloads, the difference in CPU 
usage is negligible. This is even more so for fileservers (as in the OP's 
case), where the box is usually dedicated to fileserving only.


In real life, in most environments, ZFS or not, the LVM/filesystem layer 
consumes much less than 10% of your CPU on an entry-level x86 server, and if 
ZFS consumes a little bit more it doesn't really matter.


For example, IIRC an old x4500 (older AMD CPUs) can do about 2 GB/s 
sustained throughput with ZFS while still not saturating its CPUs.
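
If in doubt, a rough way to check on your own hardware is to stream writes 
through ZFS while watching the CPUs; a minimal sketch (paths and sizes are 
examples only):

  # dd if=/dev/zero of=/tank/bigfile bs=1024k count=16384 &
  # mpstat 5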


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-08 Thread Bob Friesenhahn

On Mon, 8 Feb 2010, Felix Buenemann wrote:


I was under the impression that using HW RAID10 would save me 50% PCI 
bandwidth and allow the controller to handle its cache more intelligently, so 
I stuck with it. But I should run some benchmarks of RAID10 vs. JBOD with 
ZFS mirrors to see if this makes a difference.


The answer to this is it depends.  If the PCI-E and controller have 
enough bandwidth capacity, then the write bottleneck will be the 
disk itself.  If there is insufficient controller bandwidth capacity, 
then the controller becomes the bottleneck.   If the bottleneck is the 
disks, then there is hardly any write penalty from using zfs mirrors. 
If the bottleneck is the controller, then you may see 1/2 the 
write performance due to using zfs mirrors.


If you are using modern computing hardware, then the disks should be 
the bottleneck.


Performance of HW RAID controllers is a complete unknown, and they tend 
to store the data in a controller-specific on-disk format, which really 
sucks if the controller fails.  It is usually better to 
run the controller in a JBOD mode (taking advantage of its write 
cache, if available) and use zfs mirrors.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs send/receive : panic and reboot

2010-02-08 Thread Bruno Damour
copied from opensolaris-discuss as this probably belongs here.

I kept on trying to migrate my pool with children (see previous threads) and 
had the (bad) idea to try the -d option on the receive part.
The system reboots immediately.

Here is the log in /var/adm/messages

Feb 8 16:07:09 amber unix: [ID 836849 kern.notice]
Feb 8 16:07:09 amber ^Mpanic[cpu1]/thread=ff014ba86e40:
Feb 8 16:07:09 amber genunix: [ID 169834 kern.notice] avl_find() succeeded 
inside avl_add()
Feb 8 16:07:09 amber unix: [ID 10 kern.notice]
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4660 
genunix:avl_add+59 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c46c0 
zfs:find_ds_by_guid+b9 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c46f0 
zfs:findfunc+23 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c47d0 
zfs:dmu_objset_find_spa+38c ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4810 
zfs:dmu_objset_find+40 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4a70 
zfs:dmu_recv_stream+448 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4c40 
zfs:zfs_ioc_recv+41d ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4cc0 
zfs:zfsdev_ioctl+175 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4d00 
genunix:cdev_ioctl+45 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4d40 
specfs:spec_ioctl+5a ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4dc0 
genunix:fop_ioctl+7b ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4ec0 
genunix:ioctl+18e ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4f10 
unix:brand_sys_syscall32+1ca ()
Feb 8 16:07:09 amber unix: [ID 10 kern.notice]
Feb 8 16:07:09 amber genunix: [ID 672855 kern.notice] syncing file systems...
Feb 8 16:07:09 amber genunix: [ID 904073 kern.notice] done
Feb 8 16:07:10 amber genunix: [ID 111219 kern.notice] dumping to 
/dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Feb 8 16:07:10 amber ahci: [ID 405573 kern.info] NOTICE: ahci0: 
ahci_tran_reset_dport port 3 reset port
Feb 8 16:07:35 amber genunix: [ID 10 kern.notice]
Feb 8 16:07:35 amber genunix: [ID 665016 kern.notice] ^M100% done: 107693 pages 
dumped,
Feb 8 16:07:35 amber genunix: [ID 851671 kern.notice] dump succeeded
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] f20 x4540

2010-02-08 Thread Robert Milkowski
Hi,

Officially it's not supported (yet?).

Has anyone tried it with x4540 though?


-- 
Robert Milkowski
http://milek.blogspot.com
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Drive failure causes system to be unusable

2010-02-08 Thread Martin Mundschenk
Hi!

I have an OSOL box as a home file server. It has four 1 TB USB drives and one 1 TB 
FireWire drive attached. The USB devices are combined into a RAIDZ pool and the 
FireWire drive acts as a hot spare.

Last night, one USB drive faulted and the following happened:

1. The zpool was not accessible anymore
2. changing to a directory on the pool causes the tty to get stuck
3. no reboot was possible
4. the system had to be rebooted ungracefully by pushing the power button

After reboot:

1. The zpool ran in a degraded state
2. the spare device did NOT automatically go online
3. the system did not boot to the usual run level, no auto-boot zones were 
started, and GDM did not start either


NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
  raidz1-0   DEGRADED 0 0 0
c21t0d0  ONLINE   0 0 0
c22t0d0  ONLINE   0 0 0
c20t0d0  FAULTED  0 0 0  corrupted data
c23t0d0  ONLINE   0 0 0
cache
  c18t0d0ONLINE   0 0 0
spares
  c16t0d0AVAIL  



My questions:

1. Why does the system get stuck, when a device faults?
2. Why does the hot spare not go online? (The manual says that going online 
automatically is the default behavior.)
3. Why does the system not boot to the usual run level, when a zpool is in a 
degraded state at boot time?


Regards,
Martin


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Scott Meilicke
Thanks Dan.

When I try the clone then import:

pfexec zfs clone 
data01/san/gallardo/g...@zfs-auto-snap:monthly-2009-12-01-00:00 
data01/san/gallardo/g-testandlab
pfexec sbdadm import-lu /dev/zvol/rdsk/data01/san/gallardo/g-testandlab

The sbdadm import-lu gives me:

sbdadm: guid in use

which makes sense, now that I see it. The man pages make it look like I cannot 
give it another GUID during the import. Any other thoughts? I *could* delete 
the current lu, import, get my data off and reverse the process, but that would 
take the current volume off line, which is not what I want to do.

Thanks,
Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive : panic and reboot

2010-02-08 Thread Lori Alt


Can you please send a complete list of the actions taken:  The commands 
you used to create the send stream, the commands used to receive the 
stream.  Also the output of `zfs list -t all` on both the sending and 
receiving sides.  If you were able to collect a core dump (it should be 
in /var/crash/hostname), it would be good to upload it.


The panic you're seeing is in the code that is specific to receiving a 
dedup'ed stream.  It's possible that you could do the migration if you 
turned off dedup (i.e. didn't specify -D) when creating the send 
stream. However, then we wouldn't be able to diagnose and fix what 
appears to be a bug.
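
A minimal sketch of the non-dedup variant, with example pool and snapshot names 
(receive options as appropriate for your layout):

  # zfs send -R tank@migrate | zfs recv -Fu destpool/tank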


The best way to get us the crash dump is to upload it here:

https://supportfiles.sun.com/upload

We need either both vmcore.X and unix.X OR you can just send us vmdump.X.

Sometimes big uploads have mixed results, so if there is a problem, some 
helpful hints are on 
http://wikis.sun.com/display/supportfiles/Sun+Support+Files+-+Help+and+Users+Guide, 
specifically in section 7.

It's best to include your name or your initials or something in the name 
of the file you upload.  As you might imagine, we get a lot of files 
uploaded named vmcore.1.

You might also create a defect report at http://defect.opensolaris.org/bz/

Lori


On 02/08/10 09:41, Bruno Damour wrote:

copied from opensolaris-discuss as this probably belongs here.

I kept on trying to migrate my pool with children (see previous threads) and 
had the (bad) idea to try the -d option on the receive part.
The system reboots immediately.

Here is the log in /var/adm/messages

Feb 8 16:07:09 amber unix: [ID 836849 kern.notice]
Feb 8 16:07:09 amber ^Mpanic[cpu1]/thread=ff014ba86e40:
Feb 8 16:07:09 amber genunix: [ID 169834 kern.notice] avl_find() succeeded 
inside avl_add()
Feb 8 16:07:09 amber unix: [ID 10 kern.notice]
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4660 
genunix:avl_add+59 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c46c0 
zfs:find_ds_by_guid+b9 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c46f0 
zfs:findfunc+23 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c47d0 
zfs:dmu_objset_find_spa+38c ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4810 
zfs:dmu_objset_find+40 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4a70 
zfs:dmu_recv_stream+448 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4c40 
zfs:zfs_ioc_recv+41d ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4cc0 
zfs:zfsdev_ioctl+175 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4d00 
genunix:cdev_ioctl+45 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4d40 
specfs:spec_ioctl+5a ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4dc0 
genunix:fop_ioctl+7b ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4ec0 
genunix:ioctl+18e ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4f10 
unix:brand_sys_syscall32+1ca ()
Feb 8 16:07:09 amber unix: [ID 10 kern.notice]
Feb 8 16:07:09 amber genunix: [ID 672855 kern.notice] syncing file systems...
Feb 8 16:07:09 amber genunix: [ID 904073 kern.notice] done
Feb 8 16:07:10 amber genunix: [ID 111219 kern.notice] dumping to 
/dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Feb 8 16:07:10 amber ahci: [ID 405573 kern.info] NOTICE: ahci0: 
ahci_tran_reset_dport port 3 reset port
Feb 8 16:07:35 amber genunix: [ID 10 kern.notice]
Feb 8 16:07:35 amber genunix: [ID 665016 kern.notice] ^M100% done: 107693 pages 
dumped,
Feb 8 16:07:35 amber genunix: [ID 851671 kern.notice] dump succeeded
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Dave

Use create-lu to give the clone a different GUID:

sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g-testandlab

--
Dave

On 2/8/10 10:34 AM, Scott Meilicke wrote:

Thanks Dan.

When I try the clone then import:

pfexec zfs clone 
data01/san/gallardo/g...@zfs-auto-snap:monthly-2009-12-01-00:00 
data01/san/gallardo/g-testandlab
pfexec sbdadm import-lu /dev/zvol/rdsk/data01/san/gallardo/g-testandlab

The sbdadm import-lu gives me:

sbdadm: guid in use

which makes sense, now that I see it. The man pages make it look like I cannot 
give it another GUID during the import. Any other thoughts? I *could* delete 
the current lu, import, get my data off and reverse the process, but that would 
take the current volume off line, which is not what I want to do.

Thanks,
Scott

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive : panic and reboot

2010-02-08 Thread Victor Latushkin

Lori Alt wrote:


Can you please send a complete list of the actions taken:  The commands 
you used to create the send stream, the commands used to receive the 
stream.  Also the output of `zfs list -t all` on both the sending and 
receiving sides.  If you were able to collect a core dump (it should be 
in /var/crash/hostname), it would be good to upload it.


If it does not exist, just create it

mkdir -p /var/crash/`uname -n`

and then run 'savecore'.

The panic you're seeing is in the code that is specific to receiving a 
dedup'ed stream.  It's possible that you could do the migration if you 
turned off dedup (i.e. didn't specify -D) when creating the send 
stream. However, then we wouldn't be able to diagnose and fix what 
appears to be a bug.


The best way to get us the crash dump is to upload it here:

https://supportfiles.sun.com/upload

We need either both vmcore.X and unix.X OR you can just send us vmdump.X.

Sometimes big uploads have mixed results, so if there is a problem, some 
helpful hints are on 
http://wikis.sun.com/display/supportfiles/Sun+Support+Files+-+Help+and+Users+Guide, 
specifically in section 7.


You may consider compressing vmdump.X further with e.g. 7z archiver

7z a vmdump.X.7z vmdump.X



It's best to include your name or your initials or something in the name 
of the file you upload.  As you might imagine, we get a lot of files 
uploaded named vmcore.1.

You might also create a defect report at http://defect.opensolaris.org/bz/

Lori


On 02/08/10 09:41, Bruno Damour wrote:

copied from opensolaris-discuss as this probably belongs here.

I kept on trying to migrate my pool with children (see previous 
threads) and had the (bad) idea to try the -d option on the receive part.

The system reboots immediately.

Here is the log in /var/adm/messages

Feb 8 16:07:09 amber unix: [ID 836849 kern.notice]
Feb 8 16:07:09 amber ^Mpanic[cpu1]/thread=ff014ba86e40:
Feb 8 16:07:09 amber genunix: [ID 169834 kern.notice] avl_find() 
succeeded inside avl_add()

Feb 8 16:07:09 amber unix: [ID 10 kern.notice]
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4660 
genunix:avl_add+59 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c46c0 
zfs:find_ds_by_guid+b9 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c46f0 
zfs:findfunc+23 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c47d0 
zfs:dmu_objset_find_spa+38c ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4810 
zfs:dmu_objset_find+40 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4a70 
zfs:dmu_recv_stream+448 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4c40 
zfs:zfs_ioc_recv+41d ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4cc0 
zfs:zfsdev_ioctl+175 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4d00 
genunix:cdev_ioctl+45 ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4d40 
specfs:spec_ioctl+5a ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4dc0 
genunix:fop_ioctl+7b ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4ec0 
genunix:ioctl+18e ()
Feb 8 16:07:09 amber genunix: [ID 655072 kern.notice] ff00053c4f10 
unix:brand_sys_syscall32+1ca ()

Feb 8 16:07:09 amber unix: [ID 10 kern.notice]
Feb 8 16:07:09 amber genunix: [ID 672855 kern.notice] syncing file 
systems...

Feb 8 16:07:09 amber genunix: [ID 904073 kern.notice] done
Feb 8 16:07:10 amber genunix: [ID 111219 kern.notice] dumping to 
/dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Feb 8 16:07:10 amber ahci: [ID 405573 kern.info] NOTICE: ahci0: 
ahci_tran_reset_dport port 3 reset port

Feb 8 16:07:35 amber genunix: [ID 10 kern.notice]
Feb 8 16:07:35 amber genunix: [ID 665016 kern.notice] ^M100% done: 
107693 pages dumped,

Feb 8 16:07:35 amber genunix: [ID 851671 kern.notice] dump succeeded
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Scott Meilicke
Sure, but that will put me back into the original situation.

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-08 Thread Richard Elling
To add to Bob's notes...

On Feb 8, 2010, at 8:37 AM, Bob Friesenhahn wrote:
 On Mon, 8 Feb 2010, Felix Buenemann wrote:
 
 I was under the impression that using HW RAID10 would save me 50% PCI 
 bandwidth and allow the controller to handle its cache more intelligently, 
 so I stuck with it. But I should run some benchmarks of RAID10 vs. JBOD 
 with ZFS mirrors to see if this makes a difference.
 
 The answer to this is it depends.  If the PCI-E and controller have enough 
 bandwidth capacity, then the write bottleneck will be the disk itself.  

If you have HDDs, the write bandwidth bottleneck will be the disk.

 If there is insufficient controller bandwidth capacity, then the controller 
 becomes the bottleneck.

We don't tend to see this for HDDs, but SSDs can crush a controller and
channel.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Drive failure causes system to be unusable

2010-02-08 Thread Richard Elling
On Feb 8, 2010, at 9:05 AM, Martin Mundschenk wrote:
 Hi!
 
 I have an OSOL box as a home file server. It has four 1 TB USB drives and one 1 TB 
 FireWire drive attached. The USB devices are combined into a RAIDZ pool and the 
 FireWire drive acts as a hot spare.
 
 Last night, one USB drive faulted and the following happened:
 
 1. The zpool was not accessible anymore
 2. changing to a directory on the pool causes the tty to get stuck
 3. no reboot was possible
 4. the system had to be rebooted ungracefully by pushing the power button
 
 After reboot:
 
 1. The zpool ran in a degraded state
 2. the spare device did NOT automatically go online
 3. the system did not boot to the usual run level, no auto-boot zones were 
 started, and GDM did not start either
 
 
 NAME STATE READ WRITE CKSUM
 tank DEGRADED 0 0 0
   raidz1-0   DEGRADED 0 0 0
 c21t0d0  ONLINE   0 0 0
 c22t0d0  ONLINE   0 0 0
 c20t0d0  FAULTED  0 0 0  corrupted data
 c23t0d0  ONLINE   0 0 0
 cache
   c18t0d0ONLINE   0 0 0
 spares
   c16t0d0AVAIL  
 
 
 
 My questions:
 
 1. Why does the system get stuck, when a device faults?

Are you sure there is not another fault here?  What does svcs -xv show?
 -- richard

 2. Why does the hot spare not go online? (The manual says that going online 
 automatically is the default behavior.)
 3. Why does the system not boot to the usual run level, when a zpool is in a 
 degraded state at boot time?
 
 
 Regards,
 Martin
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intrusion Detection - powered by ZFS Checksumming ?

2010-02-08 Thread Lutz Schumann
 Only with the zdb(1M) tool but note that the
 checksums are NOT of files 
 but of the ZFS blocks.

Thanks - blocks, right (doh) - that's what I was missing. Damn, it would be so 
nice :(
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Dave
Ah, I didn't see the original post. If you're using an old COMSTAR 
version prior to build 115, maybe the metadata placed at the first 64K 
of the volume is causing problems?


http://mail.opensolaris.org/pipermail/storage-discuss/2009-September/007192.html

The clone and create-lu process works for mounting cloned volumes under 
linux with b130. I don't have any windows clients to test with.


--
Dave


On 2/8/10 11:23 AM, Scott Meilicke wrote:

Sure, but that will put me back into the original situation.

-Scott

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Scott Meilicke
That is likely it. I created the volume using 2009.06, then later upgraded to 
build 124. I just now created a new zvol, connected it to my Windows server, 
formatted it, and added some data. Then I snapped the zvol, cloned the snap, and 
used 'pfexec sbdadm create-lu'. When presented to the Windows server, it 
behaved as expected: I could see the data I created prior to the snapshot.
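
For the archives, the working sequence was essentially this (zvol and snapshot 
names here are examples only):

  pfexec zfs snapshot data01/san/gallardo/g-test@now
  pfexec zfs clone data01/san/gallardo/g-test@now data01/san/gallardo/g-test-clone
  pfexec sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g-test-clone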

Thank you very much Dave (and everyone else).

Now,
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 'secure erase'

2010-02-08 Thread Miles Nordin
 nw == Nicolas Williams nicolas.willi...@sun.com writes:
 ch == c hanover chano...@umich.edu writes:

Trying again:

ch In our particular case, there won't be
ch snapshots of destroyed filesystems (I create the snapshots,
ch and destroy them with the filesystem).

Right, but if your zpool is above a zvol vdev (e.g. COMSTAR on another
box), then someone might take a snapshot of the encrypted zvol.  Then
after you ``securely delete'' a filesystem by overwriting various
intermediate keys or whatever, they might roll back the zvol snapshot
to undelete.

Yes, you still need the passphrase to reach what they've undeleted,
but that's always true---what's ``secure delete'' supposed to mean
besides the ability to permanently remove one dataset but not others,
even from those who possess the passphrase?  Otherwise it would not be
a feature.  It would just be a suggestion: ``forget your passphrase.''

nw ZFS crypto over zvols and what not presents no additional
nw problems.

If you are counting on the ability to forget a key by overwriting the
block of vdev in which the key's stored, then doing it over zvol's is
an additional problem.

  but for SSD,

Even if you do not have snapshots, SSD's are CoW internally so they
have something like latent snapshots from an attacker's perspective.
That is the point of my zvol example, which you are losing in your
``zvol's are just like devices, that's `abstraction,' I don't have to
think about it.''

e.g., if your data lifecycle includes the idea that the ZFS crypto user
will securely delete things from devices before they are sent back for warranty
repair or reallocated to another group, whether it's SAN LUN's or
SSD's or zvol's or anything that has a copy-on-write character, then
from now on there is no such thing as overwriting.  There is only
forgetting passphrases.  This is both the case for using crypto in the
first place (overwriting blocks is no longer useful.  Devices no
longer offer any command that can really erase them.), and also the
limitation of any ``Secure delete'' feature.

Example, my Chinese friend gives me a USB token and tells me the
passphrase.  It has 'blah' stored on it.  I create zfs filesystem
'blergh', write secret stuff to it, then ``securely delete'' it.  I
return the token to my friend without fear the contents of 'blergh'
could escape because you've promised I've ``securely'' deleted it.

He takes the token to its manufacturer, loads diagnostic firmware,
rolls back the USB key to an earlier state using its CoW wear-leveling
feature, and recovers the ``securely deleted'' dataset.

so in these cases ``secure delete'' is meaningless.  USB tokens are
common, and I don't know what is the use case of a ``secure delete''
feature rather than simply ``using passphrases'', if not this one.
zfs crypto overall is not meaningless, but it depends on the
passphrase and is granular at whatever is protected by that
passphrase, no smaller, once CoW underneath.

If you have (1) the ability to change the passphrase whenever you like,
and (2) the passphrase can be not just a string a user types but it
can include a block of data read off a token, like LUKS, then with a
little bit of care you can have back secure erase over CoW backing
store.  It depends on your ability to securely destroy the old block
of key material on this token when you change the passphrase and be
sure no one's saved an old copy of it.  That's what I meant by
keystore outside the vdev structure.


Another scenario requiring something like secure delete which is
complicated by SSD's and zvol's underneath is to protect laptops
crossing borders.  You may wish to make known that you routinely
revoke owners' access to their laptop drives to prevent customs agents
from trying to harass/detain people into handing over their
passphrases.  You might do this by changing the passphrase before
travel then delivering the new passphrase to yourself over VPN once
you've passed customs.  Then, you can safely give the old passphrase,
which is all you know.  If the laptop contains an SSD then the old
passphrase is probably still useful to a customs agent who can extract
dirty blocks from beneath the SSD-fs, so you lose.

For the second scenario, the holy grail feature would be to have two
zpools on one vdev, encrypted with different keys.  zpool A will have
a 'balloon' dataset reserving the blocks used by zpool B.  zpool A
will have an encrypted ueberblock free of magic numbers and in a
nonstandard location, and will be the one with secret data on it.

zpool B will be normal and contain no holygrail features, should be
preloaded by you with an earlier snapshot of zpool A.  If you were to
start using zpool B it would quietly overwrite and corrupt parts of
zpool A.  so, the process would be:

zpool create B ssd
load with nonsecret but bootable stuff
zpool export B
zpool create -holygrailfeature -o tokenfile=/tmp/A-token A ssd
  automatically makes balloon dataset reserving used blocks of B
  possibly stores 

Re: [zfs-discuss] Pool import with failed ZIL device now possible ?

2010-02-08 Thread Miles Nordin
 ck == Christo Kutrovsky kutrov...@pythian.com writes:
 djm == Darren J Moffat darr...@opensolaris.org writes:
 kth == Kjetil Torgrim Homme kjeti...@linpro.no writes:

ck The ``never turn off the ZIL'' sounds scary, but if the only
ck consequences are 15 (even 45) seconds of data loss .. I am
ck willing to take this for my home environment.

   djm You have done a risk analysis and if you are happy that your
   djm NTFS filesystems could be corrupt on those ZFS ZVOLs if you
   djm lose data then you could consider turning off the ZIL.

yeah I wonder if this might have more to do with write coalescing and
reordering within the virtualizing package's userland, though?
Disabling ZIL-writing should still cause ZVOL's to recover to a
crash-consistent state: so long as the NTFS was stored on a single
zvol it should not become corrupt.  It just might be older than you
might like, right?  I'm not sure it's working as well as that, just
saying it's probably not disabling the ZIL that's causing whatever
problems people have with guest NTFS's, right?

also, you can always roll back the zvol to the latest snapshot and
uncorrupt the NTFS.  so this NEVER is probably too strong.
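
That is, something along these lines, with example names:

  # zfs rollback tank/ntfs-vol@latest-snap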

especially because ZFS recovers to txg's, the need for fsync() by
certain applications is actually less than it is on other filesystems
that lack that characteristic and need to use fsync() as a barrier.
seems silly not to exploit this.

 I mean, there is no guarantee writes will be executed in order,
 so in theory, one could corrupt its NTFS file system.

   kth I think you have that guarantee, actually.

+1, at least from ZFS I think you have it.  It'll recover to a txg
commit, which is a crash-consistent point-in-time snapshot w.r.t.
when the writes were submitted to it.  so as long as they aren't
being reordered by something above ZFS...

   kth I think you need to reboot the client so that its RAM cache is
   kth cleared before any other writes are made.

yeah it needs to understand the filesystem was force-unmounted, and
the only way to tell it so is to yank the virtual cord.

   djm For what it's worth I personally run with the ZIL disabled on
   djm my home NAS system which is serving over NFS and CIFS to
   djm various clients, but I wouldn't recommend it to anyone.  The
   djm reason I say never to turn off the ZIL is because in most
   djm environments outside of home usage it just isn't worth the
   djm risk to do so (not even for a small business).

yeah ok but IMHO you are getting way too much up in other people's
business, assuming things about them, by saying this.  these dire
warnings of NEVER are probably what's led to this recurring myth that
disabling ZIL-writing can lead to pool corruption when it can't.
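
For reference, at the time of this thread disabling the ZIL generally meant the 
global zil_disable tunable (set in /etc/system and effective after a reboot), 
not a per-dataset setting:

  set zfs:zil_disable = 1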


pgpI9mKkUHVuo.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cores vs. Speed?

2010-02-08 Thread Miles Nordin
 enh == Edward Ned Harvey sola...@nedharvey.com writes:

   enh As for mac access via nfs, automounter, etc ... I found that
   enh the UID/GID / posix permission bits were a problem, and I
   enh found it was easier and more reliable for the macs to use SMB

I found it much less reliable, if by reliable you mean not losing
data.  There's a questionable GUI feature that throws up a
[Disconnect] window whenever a normal unix system would say 'not
responding still trying', but so long as you ignore this window
instead of pressing what seems to be the only button, the old Unix
feature of ``server can reboot without losing client writes'' seems to
still be there.  SMB, not so much.

There are also questions of case sensitivity, locking, being mounted at
boot time rather than login time, and accommodating more than one user.
I've also heard SMB is far slower.

The Macs I've switched to automounted NFS are causing me less trouble.

If you are in a ``share almost everything'' situation, just add

 umask 000

to /etc/launchd.conf and reboot.


pgpQnaWJ6VGUM.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Scott Meilicke
I plan on filing a support request with Sun, and will try to post back with any 
results.

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Install/boot OS from ZFS iscsi target

2010-02-08 Thread Amer Ather
Is it possible to install and boot MS Windows 7 from a ZFS iSCSI target? 
What about Linux or even Solaris? Do the installation DVDs of these OSes 
have sufficient drivers to install onto an iSCSI target? Please share 
if there is a document available.



Thanks,
--
Amer Ather  
Senior Staff Engineer
Solaris Kernel
Global Services Delivery
amer.at...@sun.com  
408-276-9780 (x19780)   
 If you fail to prepare, prepare to fail
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-08 Thread Bob Friesenhahn

On Mon, 8 Feb 2010, Richard Elling wrote:


If there is insufficient controller bandwidth capacity, then the 
controller becomes the bottleneck.


We don't tend to see this for HDDs, but SSDs can crush a controller and
channel.


It is definitely seen with older PCI hardware.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool list size

2010-02-08 Thread Lasse Osterild
Hi,

This may well have been covered before but I've not been able to find an answer 
to this particular question.

I've setup a raidz2 test env using files like this:

 # mkfile 1g t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 s1 s2
 # zpool create dataPool raidz2 /xvm/t1 /xvm/t2 /xvm/t3 /xvm/t4 /xvm/t5
 # zpool add dataPool raidz2 /xvm/t6 /xvm/t7 /xvm/t8 /xvm/t9 /xvm/t10
 # zpool add dataPool spare /xvm/s1 /xvm/s2

 # zpool status dataPool
  pool: dataPool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
dataPool  ONLINE   0 0 0
  raidz2-0ONLINE   0 0 0
/xvm/t1   ONLINE   0 0 0
/xvm/t2   ONLINE   0 0 0
/xvm/t3   ONLINE   0 0 0
/xvm/t4   ONLINE   0 0 0
/xvm/t5   ONLINE   0 0 0
  raidz2-1ONLINE   0 0 0
/xvm/t6   ONLINE   0 0 0
/xvm/t7   ONLINE   0 0 0
/xvm/t8   ONLINE   0 0 0
/xvm/t9   ONLINE   0 0 0
/xvm/t10  ONLINE   0 0 0
spares
  /xvm/s1 AVAIL   
  /xvm/s2 AVAIL   

All is good and it works. I then copied a few gigs of data onto the pool and 
checked with zpool list:
r...@vmstor01:/# zpool list
NAME   SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
dataPool  9.94G  4.89G  5.04G49%  1.00x  ONLINE  -

Now here's what I don't get: why does it say the pool size is 9.94G when it's 
made up of 2 x raidz2 consisting of 1G volumes? It should only be 6G, which df 
-h also reports correctly.  For a RAIDZ2 pool I find the information - the fact 
that it's 9.94G and not 5.9G - completely useless and misleading; why is parity 
part of the calculation? Also, ALLOC seems wrong: there's nothing in the pool 
except a full copy of /usr (just to fill it up with test data); it does however 
correctly display that I've used about 50% of the pool.  This is a build 131 
machine, btw.

r...@vmstor01:/# df -h /dataPool
FilesystemSize  Used Avail Use% Mounted on
dataPool  5.9G  3.0G  3.0G  51% /dataPool

Cheers,

 - Lasse

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool list size

2010-02-08 Thread Richard Elling
This is a FAQ, but the FAQ is not well maintained :-(
http://hub.opensolaris.org/bin/view/Community+Group+zfs/faq

On Feb 8, 2010, at 1:35 PM, Lasse Osterild wrote:
 Hi,
 
 This may well have been covered before but I've not been able to find an 
 answer to this particular question.
 
 I've setup a raidz2 test env using files like this:
 
 # mkfile 1g t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 s1 s2
 # zpool create dataPool raidz2 /xvm/t1 /xvm/t2 /xvm/t3 /xvm/t4 /xvm/t5
 # zpool add dataPool raidz2 /xvm/t6 /xvm/t7 /xvm/t8 /xvm/t9 /xvm/t10
 # zpool add dataPool spare /xvm/s1 /xvm/s2
 
 # zpool status dataPool
  pool: dataPool
 state: ONLINE
 scrub: none requested
 config:
 
   NAME  STATE READ WRITE CKSUM
   dataPool  ONLINE   0 0 0
 raidz2-0ONLINE   0 0 0
   /xvm/t1   ONLINE   0 0 0
   /xvm/t2   ONLINE   0 0 0
   /xvm/t3   ONLINE   0 0 0
   /xvm/t4   ONLINE   0 0 0
   /xvm/t5   ONLINE   0 0 0
 raidz2-1ONLINE   0 0 0
   /xvm/t6   ONLINE   0 0 0
   /xvm/t7   ONLINE   0 0 0
   /xvm/t8   ONLINE   0 0 0
   /xvm/t9   ONLINE   0 0 0
   /xvm/t10  ONLINE   0 0 0
   spares
 /xvm/s1 AVAIL   
 /xvm/s2 AVAIL   
 
 All is good and it works, I then copied a few gigs of data onto the pool and 
 checked with zpool list
 r...@vmstor01:/# zpool list
 NAME   SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
 dataPool  9.94G  4.89G  5.04G49%  1.00x  ONLINE  -
 
 Now here's what I don't get: why does it say the pool size is 9.94G when it's 
 made up of 2 x raidz2 consisting of 1G volumes? It should only be 6G, which df 
 -h also reports correctly.

No, zpool displays the available pool space. df -h displays something else 
entirely.
If you have 10 1GB vdevs, then the total available pool space is 10GB. From the 
zpool(1m) man page:
...
 size
 Total size of the storage pool.

 These space usage properties report  actual  physical  space
 available  to  the  storage  pool. The physical space can be
 different from the total amount of space that any  contained
 datasets  can  actually  use.  The amount of space used in a
 raidz configuration depends on the  characteristics  of  the
 data being written. In addition, ZFS reserves some space for
 internal accounting that  the  zfs(1M)  command  takes  into
 account,  but the zpool command does not. For non-full pools
 of a reasonable size, these effects should be invisible. For
 small  pools,  or  pools  that are close to being completely
 full, these discrepancies may become more noticeable.
...
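
As a rough illustration for the pool above (nominal sizes; labels and internal 
accounting explain the small differences):

  raw pool space     : 2 vdevs x 5 x 1 GB       = 10 GB  (zpool list: ~9.94G)
  space for datasets : 2 vdevs x (5 - 2) x 1 GB =  6 GB  (df -h: ~5.9G)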

 -- richard

  For a RAIDZ2 pool I find the information, the fact that it's 9.94G and not 
 5.9G, completely useless and misleading, why is parity part of the 
 calculation? Also ALLOC seems wrong, there's nothing in the pool except a 
 full copy of /usr (just to fill up with test data), it does however correctly 
 display that I've used about 50% of the pool.  This is a build 131 machine 
 btw.
 
 r...@vmstor01:/# df -h /dataPool
 FilesystemSize  Used Avail Use% Mounted on
 dataPool  5.9G  3.0G  3.0G  51% /dataPool
 
 Cheers,
 
 - Lasse
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] NFS access by OSX clients (was Cores vs. Speed?)

2010-02-08 Thread Edward Ned Harvey
 There are also questions of case sensitivity, locking, being mounted at
 boot time rather than login time, and accommodating more than one user.
 I've also heard SMB is far slower.
 
 The Macs I've switched to automounted NFS are causing me less trouble.
 
 If you are in a ``share almost everything'' situation, just add
 
  umask 000
 
 to /etc/launchd.conf and reboot.

How are you managing UID's on the NFS server?  If user eharvey connects to
server from client Mac A, or Mac B, or Windows 1, or Windows 2, or any of
the linux machines ... the server has to know it's eharvey, and assign the
correct UID's etc.  When I did this in the past, I maintained a list of
users in AD, and duplicate list of users in OD, so the mac clients could
resolve names to UID's via OD.  And a third duplicate list in NIS so the
linux clients could resolve.  It was terrible.  You must be doing something
better?

How do you manage your NFS exports?  Do all the clients have static assigned
IP's, or do you simply export to the whole subnet, or do you do something
else?  I would consider it a security risk, if any schmo could take any
unused IP address, connect to the server, and claim to be eharvey without
any problem.

Also, I had a umask problem, which presumably you've got solved by the
launchd.conf edit.  Presumably this umask applies, whether you create a
folder in Finder, or create a file in MS Word, or save a new text file from
TextEdit ... The umask is applied to every file and every folder creation,
regardless of which app is doing the creation, right?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] L2ARC in Cluster is picked up althought not part of the pool

2010-02-08 Thread Daniel Carosone
On Mon, Feb 01, 2010 at 12:22:55PM -0800, Lutz Schumann wrote:
   Created a pool on head1 containing just the cache device (c0t0d0). 
  
  This is not possible, unless there is a bug. You cannot create a pool
  with only a cache device.  I have verified this on b131:
  # zpool create norealpool cache /dev/ramdisk/rc1
  invalid vdev specification: at least one toplevel vdev must be specified
  
  This is also consistent with the notion that cache devices are auxiliary
  devices and do not have pool configuration information in the label.
 
 Sorry for the confusion ... a little misunderstanding. I created a pool 
 whose only data disk is the disk formerly used as a cache device in the pool 
 that switched. Then I exported this pool made from just a single disk (data 
 disk), and switched back. The exported pool was picked up as a cache device ... 
 this seems really problematic.

This is exactly the scenario I was concerned about earlier in the
thread.  Thanks for confirming that it occurs.  Please verify that the
pool had autoreplace=off (just to avoid that distraction), and file a
bug.  
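
A quick way to check, with an example pool name:

  # zpool get autoreplace tank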

Cache devices should not automatically destroy disk contents based
solely on device path, especially where that device path came along
with a pool import.  Cache devices need labels to confirm their
identity. This is irrespective of whether the cache contents after the
label are persistent or volatile, i.e. it should be fixed without waiting
for the CR about persistent l2arc.

--
Dan.

pgpjdt4tg1JNp.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool list size

2010-02-08 Thread Lasse Osterild
On 08/02/2010, at 22.50, Richard Elling wrote:

 
 r...@vmstor01:/# zpool list
 NAME   SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
 dataPool  9.94G  4.89G  5.04G49%  1.00x  ONLINE  -
 
 Now here's what I don't get, why does it say the poo sizel is 9.94G when 
 it's made up of 2 x raidz2 consisting of 1G volumes, it should only be 6G 
 which df -h also reports correctly.
 
 No, zpool displays the available pool space. df -h displays something else 
 entirely.
 If you have 10 1GB vdevs, then the total available pool space is 10GB. From 
 the 
 zpool(1m) man page:
 ...
 size
 Total size of the storage pool.
 
 These space usage properties report  actual  physical  space
 available  to  the  storage  pool. The physical space can be
 different from the total amount of space that any  contained
 datasets  can  actually  use.  The amount of space used in a
 raidz configuration depends on the  characteristics  of  the
 data being written. In addition, ZFS reserves some space for
 internal accounting that  the  zfs(1M)  command  takes  into
 account,  but the zpool command does not. For non-full pools
 of a reasonable size, these effects should be invisible. For
 small  pools,  or  pools  that are close to being completely
 full, these discrepancies may become more noticeable.
 ...
 
 -- richard

OK, thanks. I know that the amount of used space will vary, but what's the 
usefulness of the total size when, as in my pool above, 4 x 1G (roughly, 
depending on recordsize) are reserved for parity? It's not like it's usable 
for anything else :)  I just don't see the point when it's a raidz or raidz2 
pool, but I guess I am missing something here.

Cheers,

 - Lasse 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS access by OSX clients

2010-02-08 Thread Miles Nordin
 enh == Edward Ned Harvey macenterpr...@nedharvey.com writes:

   enh How are you managing UID's on the NFS server?

All the macs are installed from the same image using asr.  And for the
most part, there's just one user, except where there isn't, and then I
manage uid's by hand.

   enh When I did this in the past, I maintained a list of users in
   enh AD, and duplicate list of users in OD, so the mac clients
   enh could resolve names to UID's via OD.  And a third duplicate
   enh list in NIS so the linux clients could resolve.  It was
   enh terrible.

Why is that terrible?  Is it impossible to automate because of the AD
piece?  OD/NIS should be dumpable from SQL easily, right?  If AD is
the unscriptable piece, it just seems kind of sad to throw the whole
thing out and standardize on the one piece that's the most convoluted
and brittle and least automatable, instead of the other way around.

   enh How do you manage your NFS exports?  [...] export to the whole
   enh subnet

yeah, that.  r...@1.2.3.0/24

there is a highly stupid bug that would crash mountd for NFSv4 or get
incorrect refusal for NFSv3 if the IP was not lookupable in reverse
DNS or /etc/hosts.  but it may be fixed now because someone from
nfs-discuss was unable to reproduce.

   enh I would consider it a security risk, if any schmo could take
   enh any unused IP address, connect to the server, and claim to be
   enh eharvey

yeah there is zero security, none at all.  I don't really think adding
exports restrictions at a finer granularity than subnet would help
much.  Only Kerberos would help.  

but most of the security we care about comes from taking snapshots:
that's the attack that's relevant here, disgruntled or confused
employees deleting everything.  This is a robust kind of security, not
the M&M model.  also every desktop has a read-only copy of yesterday's
shared filesystem, from another nfs server populated with rsync,
pre-mounted, in case of problems with the writeable one.

At least it is not crap security like SMB, with five or ten wildly
different variants and password formats operating on different ports
some with MAC session-binding some without.  I admit SMB has some
security rather than none, but it's a slow crashy clumsy caveat-laden
protocol.  You might also look at it this way: if there's going to be
a panic/DoS or exploitable buffer overflow security problem, it's far
more likely to be in the SMB stack than the NFS stack.

(that said, 'mknod file b 14 n' seems to panic a Solaris NFS
server, at least b71.)

   enh solved by the launchd.conf edit.  Presumably this umask
   enh applies, whether you create a folder in Finder, or create a
   enh file in MS Word, or save a new text file from TextEdit ... The
   enh umask is applied to every file and every folder creation,
   enh regardless of which app is doing the creation, right?

right.  This much works perfectly AFAICT.

I suppose if you have a user database and want private user folders,
you just make them owned by that user and chmod 700.  At least that
much works everywhere and survives backup, unlike this complete
disaster that is ACL's.

I get it, the NFSv3 featureset with no text usernames and no Kerberos
unchanged in two decades is not a reasonable answer to modern
expectations, and NIS is no longer the unifying directory service it
once was now that Mac is a credible client.  AD can go fuck itself:
buy a windows server and another sysadmin to manage it, or suffer the
polluting effect it has on your mind and your entire operation.  but,
yeah, NFSv3 is not enough.  Its zero-security simplicity turns out to
be exactly what we need here though, and the Mac client with
automounter, 10.5 or later, is extremely solid, more than the other Mac
filesystems or GlobalSAN.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intrusion Detection - powered by ZFS Checksumming ?

2010-02-08 Thread Daniel Carosone
On Mon, Feb 08, 2010 at 11:24:56AM -0800, Lutz Schumann wrote:
  Only with the zdb(1M) tool but note that the
  checksums are NOT of files 
  but of the ZFS blocks.
 
 Thanks - blocks, right (doh) - that's what I was missing. Damn, it would be so 
 nice :(

If you're comparing the current data to a snapshot baseline on the
same pool, it just means you need to compare more checksums (several
per file), it doesn't invalidate the idea.  
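
As a rough sketch of what's available today (zdb poking, not an
official interface; the object number argument is a placeholder), you
can already dump a file's block pointer tree, and at high enough
verbosity the per-block checksums, IIRC:

  $ ls -i /tank/fs/some/file        # object number == inode number on zfs
  $ zdb -ddddd tank/fs <object#>    # add d's for more blkptr detail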

There may also be other ways of checking quickly that the file data is
unmodified since snapshot X, but again it will require looking at zfs
internals. This is far from the first use case for an official
interface to get at this kind of data.  It's quite similar to the
question of how to verify send|recv integrity from yesterday, for
example. As yet I don't know of a concrete proposal of what such an
interface should look like (since there's nothing to borrow from
POSIX), let alone an implementation.

It's more complicated if you're comparing checksums against an external
baseline reference (such as from a build) because block sizes and
checksum algorithms may vary between pools.  However, as you note,
that's already catered for by existing tools, so they could work
together.

--
Dan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool list size

2010-02-08 Thread Daniel Carosone
On Mon, Feb 08, 2010 at 11:28:11PM +0100, Lasse Osterild wrote:
 Ok thanks I know that the amount of used space will vary, but what's
 the usefulness of the total size when ie in my pool above 4 x 1G
 (roughly, depending on recordsize) are reserved for parity, it's not
 like it's useable for anything else :)  I just don't see the point
 when it's a raidz or raidz2 pool, but I guess I am missing something
 here.  

The basis of raidz is that each block is its own raid stripe, with its
own layout.  At present, this only matters for the size of the stripe.
For example, if I write a single 512-byte block, to a dual-parity
raidz2, I will write three blocks, to three disks.  With a larger
block, I will have more data over more disks, until the block is big
enough to stripe evenly over all of them. As the block gets bigger
yet, more is written to each disk as part of the stripe, and the
parity units get bigger to match the size of the largest data unit.
This rounding can very often mean that different disks have
different amounts of data for each stripe.  

Crucially, it also means the ratio of parity-to-data is not fixed.
This tends to average out on a pool with lots of data and mixed 
block sizes, but not always; consider an extreme case of a pool
containing only datasets with blocksize=512. That's what the comments
in the documentation are referring to, and the major reason for the
zpool output you see.
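
A rough worked example, assuming 512-byte sectors on a 5-disk raidz2
(3 data + 2 parity columns, as in the original post), and ignoring
raidz's small rounding/padding:

  512-byte block : 1 data sector + 2 parity sectors      -> ~200% parity overhead
  128 KiB block  : 256 data sectors, ~86 per data disk,
                   plus 2 parity columns of ~86 sectors  -> ~67% parity overhead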

In future, it may go further and be more important.

Just as the data count per stripe can vary, there's nothing
fundamental in the raidz layout that says that the same parity count
and method has to be used for the entire pool, either.  Raidz already
degrades to simple mirroring in some of the same small-stripe cases
discussed above.

There's no particular reason, in theory, why they could not also have
different amounts of parity on a per-block basis.  I imagine that when
bp-rewrite and the ability to reshape pools comes along, this will
indeed be the case, at least during transition.  As a simple example,
when reshaping a raidz1 to a raidz2 by adding a disk, there will be
blocks with single parity and other blocks with dual for a time until
the operation is finished. 

Maybe one day in the future, there will just be a basic raidz vdev
type, and we can set dataset properties for the number of additional
parity blocks each should get.  This might be a little like we can
currently set copies, including that it would only affect new writes
and lead to very mixed redundancy states.  

No one has actually said this is a real goal, and the reasons it's not
presently allowed include administrative and operational simplicity as
well as implementation and testing constraints, but I think it would
be handy and cool.  

--
Dan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-08 Thread Felix Buenemann

Am 08.02.10 22:23, schrieb Bob Friesenhahn:

On Mon, 8 Feb 2010, Richard Elling wrote:



If there is insufficient controller bandwidth capacity, then the
controller becomes the bottleneck.


We don't tend to see this for HDDs, but SSDs can crush a controller and
channel.


It is definitely seen with older PCI hardware.


Well, to make things short: using JBOD + ZFS striped mirrors vs. the 
controller's RAID10 dropped the max sequential read I/O from over 400 
MByte/s to below 300 MByte/s. However, random I/O and sequential writes 
seemed to perform equally well.
One thing, however, was much better using ZFS mirrors: random seek 
performance was about 4 times higher, so I guess for random I/O on a 
busy system the JBOD would win.


The controller can deliver 800 MByte/s on cache hits and is connected 
with PCIe x8, so theoretically it should have enough PCI bandwidth. Its 
CPU is the older 500 MHz IOP333, so it has less power than the newer 
IOP348 controllers with 1.2 GHz CPUs.


Too bad I have no choice but to use HW RAID: the mainboard BIOS only 
supports 7 boot devices, so it can't boot from the right disk if the 
Areca is in JBOD, and I found no way to disable the controller's BIOS.

Well maybe I could flash the EFI BIOS to work around this...
(I've done my tests by reconfiguring the controller at runtime.)



Bob


- Felix


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool list size

2010-02-08 Thread Cindy Swearingen

Hi Richard,

I last updated this FAQ on 1/19.

Which part is not well-maintained?

:-)

Cindy

On 02/08/10 14:50, Richard Elling wrote:

This is a FAQ, but the FAQ is not well maintained :-(
http://hub.opensolaris.org/bin/view/Community+Group+zfs/faq

On Feb 8, 2010, at 1:35 PM, Lasse Osterild wrote:

Hi,

This may well have been covered before but I've not been able to find an answer 
to this particular question.

I've setup a raidz2 test env using files like this:

# mkfile 1g t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 s1 s2
# zpool create dataPool raidz2 /xvm/t1 /xvm/t2 /xvm/t3 /xvm/t4 /xvm/t5
# zpool add dataPool raidz2 /xvm/t6 /xvm/t7 /xvm/t8 /xvm/t9 /xvm/t10
# zpool add dataPool spare /xvm/s1 /xvm/s2

# zpool status dataPool
 pool: dataPool
state: ONLINE
scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
dataPool  ONLINE   0 0 0
  raidz2-0ONLINE   0 0 0
/xvm/t1   ONLINE   0 0 0
/xvm/t2   ONLINE   0 0 0
/xvm/t3   ONLINE   0 0 0
/xvm/t4   ONLINE   0 0 0
/xvm/t5   ONLINE   0 0 0
  raidz2-1ONLINE   0 0 0
/xvm/t6   ONLINE   0 0 0
/xvm/t7   ONLINE   0 0 0
/xvm/t8   ONLINE   0 0 0
/xvm/t9   ONLINE   0 0 0
/xvm/t10  ONLINE   0 0 0
spares
	  /xvm/s1 AVAIL   
	  /xvm/s2 AVAIL   


All is good and it works, I then copied a few gigs of data onto the pool and 
checked with zpool list
r...@vmstor01:/# zpool list
NAME   SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
dataPool  9.94G  4.89G  5.04G49%  1.00x  ONLINE  -

Now here's what I don't get: why does it say the pool size is 9.94G when it's 
made up of 2 x raidz2 consisting of 1G volumes? It should only be 6G, which df 
-h also reports correctly.


No, zpool displays the available pool space. df -h displays something else 
entirely.
If you have 10 1GB vdevs, then the total available pool space is 10GB. From the 
zpool(1m) man page:

...
 size
 Total size of the storage pool.

 These space usage properties report  actual  physical  space
 available  to  the  storage  pool. The physical space can be
 different from the total amount of space that any  contained
 datasets  can  actually  use.  The amount of space used in a
 raidz configuration depends on the  characteristics  of  the
 data being written. In addition, ZFS reserves some space for
 internal accounting that  the  zfs(1M)  command  takes  into
 account,  but the zpool command does not. For non-full pools
 of a reasonable size, these effects should be invisible. For
 small  pools,  or  pools  that are close to being completely
 full, these discrepancies may become more noticeable.
...
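
For illustration only (numbers loosely reconstructed from the zpool list
and df output elsewhere in this thread, so treat them as approximate):

# zpool list dataPool       <- raw pool space, parity included
NAME       SIZE  ALLOC   FREE
dataPool  9.94G  4.89G  5.04G

# zfs list dataPool         <- usable space, parity excluded
NAME      USED  AVAIL  REFER  MOUNTPOINT
dataPool  3.0G   2.9G    ...  /dataPool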

 -- richard


 For a RAIDZ2 pool I find this information (the fact that it's 9.94G and not 
5.9G) completely useless and misleading; why is parity part of the calculation? 
Also ALLOC seems wrong: there's nothing in the pool except a full copy of /usr 
(just to fill up with test data), though it does correctly display that I've 
used about 50% of the pool.  This is a build 131 machine, btw.

r...@vmstor01:/# df -h /dataPool
FilesystemSize  Used Avail Use% Mounted on
dataPool  5.9G  3.0G  3.0G  51% /dataPool

Cheers,

- Lasse

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool list size

2010-02-08 Thread Lasse Osterild

On 09/02/2010, at 00.23, Daniel Carosone wrote:

 On Mon, Feb 08, 2010 at 11:28:11PM +0100, Lasse Osterild wrote:
 Ok thanks I know that the amount of used space will vary, but what's
 the usefulness of the total size when ie in my pool above 4 x 1G
 (roughly, depending on recordsize) are reserved for parity, it's not
 like it's useable for anything else :)  I just don't see the point
 when it's a raidz or raidz2 pool, but I guess I am missing something
 here.  
 
 The basis of raidz is that each block is its own raid stripe, with its
 own layout.  At present, this only matters for the size of the stripe.
 For example, if I write a single 512-byte block, to a dual-parity
 raidz2, I will write three blocks, to three disks.  With a larger
 block, I will have more data over more disks, until the block is big
 enough to stripe evenly over all of them. As the block gets bigger
 yet, more is written to each disk as part of the stripe, and the
 parity units get bigger to match the size of the largest data unit.
 This rounding can very often mean that different disks have
 different amounts of data for each stripe.  
 
 Crucially, it also means the ratio of parity-to-data is not fixed.
 This tends to average out on a pool with lots of data and mixed 
 block sizes, but not always; consider an extreme case of a pool
 containing only datasets with blocksize=512. That's what the comments
 in the documentation are referring to, and the major reason for the
 zpool output you see.
 
 In future, it may go further and be more important.
 
 Just as the data count per stripe can vary, there's nothing
 fundamental in the raidz layout that says that the same parity count
 and method has to be used for the entire pool, either.  Raidz already
 degrades to simple mirroring in some of the same small-stripe cases
 discussed above.
 
 There's no particular reason, in theory, why they could not also have
 different amounts of parity on a per-block basis.  I imagine that when
 bp-rewrite and the ability to reshape pools comes along, this will
 indeed be the case, at least during transition.  As a simple example,
 when reshaping a raidz1 to a raidz2 by adding a disk, there will be
 blocks with single parity and other blocks with dual for a time until
 the operation is finished. 
 
 Maybe one day in the future, there will just be a basic raidz vdev
 type, and we can set dataset properties for the number of additional
 parity blocks each should get.  This might be a little like we can
 currently set copies, including that it would only affect new writes
 and lead to very mixed redundancy states.  
 
 Noone has actually said this is a real goal, and the reasons it's not
 presently allowed include administrative and operational simplicity as
 well as implementation and testing constraints, but I think it would
 be handy and cool.  
 
 --
 Dan.

Thanks Dan! :)

That explanation made perfect sense and I appreciate you taking the time to 
write this, perhaps parts of it could go into the FAQ ?  I realise that it's 
sort of in there already but it doesn't explain it very well.

Cheers,

 - Lasse
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool list size

2010-02-08 Thread Cindy Swearingen

Hi Lasse,

I expanded this entry to include more details of the zpool list and
zfs list reporting.

See if the new explanation provides enough details.

Thanks,

Cindy

On 02/08/10 16:51, Lasse Osterild wrote:

On 09/02/2010, at 00.23, Daniel Carosone wrote:


On Mon, Feb 08, 2010 at 11:28:11PM +0100, Lasse Osterild wrote:

Ok thanks I know that the amount of used space will vary, but what's
the usefulness of the total size when ie in my pool above 4 x 1G
(roughly, depending on recordsize) are reserved for parity, it's not
like it's useable for anything else :)  I just don't see the point
when it's a raidz or raidz2 pool, but I guess I am missing something
here.  

The basis of raidz is that each block is its own raid stripe, with its
own layout.  At present, this only matters for the size of the stripe.
For example, if I write a single 512-byte block, to a dual-parity
raidz2, I will write three blocks, to three disks.  With a larger
block, I will have more data over more disks, until the block is big
enough to stripe evenly over all of them. As the block gets bigger
yet, more is written to each disk as part of the stripe, and the
parity units get bigger to match the size of the largest data unit.
This rounding can very often mean that different disks have
different amounts of data for each stripe.  


Crucially, it also means the ratio of parity-to-data is not fixed.
This tends to average out on a pool with lots of data and mixed 
block sizes, but not always; consider an extreme case of a pool

containing only datasets with blocksize=512. That's what the comments
in the documentation are referring to, and the major reason for the
zpool output you see.

In future, it may go further and be more important.

Just as the data count per stripe can vary, there's nothing
fundamental in the raidz layout that says that the same parity count
and method has to be used for the entire pool, either.  Raidz already
degrades to simple mirroring in some of the same small-stripe cases
discussed above.

There's no particular reason, in theory, why they could not also have
different amounts of parity on a per-block basis.  I imagine that when
bp-rewrite and the ability to reshape pools comes along, this will
indeed be the case, at least during transition.  As a simple example,
when reshaping a raidz1 to a raidz2 by adding a disk, there will be
blocks with single parity and other blocks with dual for a time until
the operation is finished. 


Maybe one day in the future, there will just be a basic raidz vdev
type, and we can set dataset properties for the number of additional
parity blocks each should get.  This might be a little like we can
currently set copies, including that it would only affect new writes
and lead to very mixed redundancy states.  


Noone has actually said this is a real goal, and the reasons it's not
presently allowed include administrative and operational simplicity as
well as implementation and testing constraints, but I think it would
be handy and cool.  


--
Dan.


Thanks Dan! :)

That explanation made perfect sense and I appreciate you taking the time to 
write this, perhaps parts of it could go into the FAQ ?  I realise that it's 
sort of in there already but it doesn't explain it very well.

Cheers,

 - Lasse
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool/zfs history does not record version upgrade events

2010-02-08 Thread zfs ml
zpool/zfs history does not record version upgrade events; those seem like 
important events worth keeping in either the public or internal history.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-08 Thread Daniel Carosone

This is a long thread, with lots of interesting and valid observations
about the organisation of the industry,  the segmentation of the
market, getting what you pay for vs paying for what you want, etc.

I don't really find within, however, an answer to the original
question, at least the way I read it.  Perhaps that's the issue - that
the question was asked without enough specifics and context, and so
everyone has their own interpretation and their own answer to their
own question.

Remembering that a lot of this was branded and marketed as open
storage, the desire to mix and match components is not only natural;
a clear expectation has also been set that it should be possible and easy
and open.

That's not to say that you can expect to have your cake and eat it
too.  Certain combinations and permutations are more qualified,
tested, supported and therefore expensive than others; these
characteristics are part of what you should be able to mix and
match, understanding the full implications of each tradeoff choice.

Snorcle wants to sell hardware.  Sure, they want even more to sell a
complete hardware and annual maintenance package with annuity revenue
over multiple years with high markups.  Some people are simply not
customers for all of that, but might still be customers for the
hardware. Especially these days, it seems they still would want to
sell the hardware even when they can't sell the rest of the package.

I read the following context between the lines of the original
question:
 - I have or can source disk drives I'm comfortable using.  
 - I understand that I'm not paying for, and can't expect, commercial
   support for whatever final combination I wind up with.
 - I am comfortable relying on standards and specifications for
   interoperability, enough that it's unlikely I'll have to get into
   deep debugging for problems. At least, I'm unwilling or unable to
   pay high premiums ahead of time in the hope of avoiding potential
   high costs for later problems.
 - The J4500 seems like nice hardware, and I know that at least it
   isn't likely to change unexpectedly to some different chipset not
   recognised by opensolaris, just before purchase.  This would give
   me some comfort. 
 - I like Sun, and am thankful for ZFS, and since I have to buy
   hardware anyway I'll look at what Sun offers. Perhaps I would even
   prefer to buy the Sun offering, all else being approximately
   equal.  This would also give me some comfort.

In that context, I haven't seen an answer, just a conclusion: 

 - All else is not equal, so I give my money to some other hardware
   manufacturer, and get frustrated that Sun won't let me buy the
   parts I could use effectively and comfortably.  

--
Dan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Dedup Questions.

2010-02-08 Thread Tom Hall
Hi,

I am loving the new dedup feature.


Few questions:
If you enable it after data is on the filesystem, will it find the
dupes on read as well as write? Would a scrub therefore make sure the
DDT is fully populated?

Re the DDT, can someone outline its structure please? Some sort of
hash table? The blogs I have read so far don't specify.

Re DDT size, is (data in use)/(avg blocksize) * 256 bits right as a worst
case (i.e. all blocks non-identical)?
What are average block sizes?

Cheers,
Tom
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool list size

2010-02-08 Thread Daniel Carosone
On Mon, Feb 08, 2010 at 05:23:29PM -0700, Cindy Swearingen wrote:
 Hi Lasse,

 I expanded this entry to include more details of the zpool list and
 zfs list reporting.

 See if the new explanation provides enough details.

Cindy, feel free to crib from or refer to my text in whatever way might help. 

--
Dan.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS ZIL + L2ARC SSD Setup

2010-02-08 Thread Bob Friesenhahn

On Tue, 9 Feb 2010, Felix Buenemann wrote:


Well to make things short: Using JBOD + ZFS Striped Mirrors vs. controller's 
RAID10, dropped the max. sequential read I/O from over 400 MByte/s to below 
300 MByte/s. However random I/O and sequential writes seemed to perform


Much of the difference is likely that your controller implements true 
RAID10, whereas ZFS striped mirrors are actually load-shared mirrors. 
Since zfs does not use true striping across vdevs, it relies on 
sequential prefetch requests to get the sequential read rate up. 
Sometimes zfs's prefetch is not aggressive enough.


I have observed that there may still be considerably more read 
performance available (to another program/thread) even while a 
benchmark program is reading sequentially as fast as it can.


Try running two copies of your benchmark program at once and see what 
happens.
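
Something like this (file names are placeholders; use files large enough
that they are not already cached in the ARC):

  dd if=/tank/bench/big1 of=/dev/null bs=1M &
  dd if=/tank/bench/big2 of=/dev/null bs=1M &
  wait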


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Anyone with experience with a PCI-X SSD card?

2010-02-08 Thread Erik Trimble

I've a couple of older systems that are front-ending a large backup array.

I'd like to put in a large L2ARC cache device for them to use with 
dedup.  Right now, they only have Ultra320 SCA 3.5" hot-swap drive 
bays, and PCI-X slots.


I haven't found any SSDs (or adapters) which might work with the 
Ultra320 bays, so I'm hunting for something to stick in the PCI-X (NOT 
PCI-Express) slot.


Ideally, I'd love to find something that lets me hook a standard 2.5" 
SSD to, but I have space limitations. About the best I've found right 
now is a 32-bit PCI card which has Compact Flash slots on it.  /Really/ 
not what I want.


So, I've seen a bunch of PCI-E cards which have flash on them and act as 
a SSD, but is there any hope for a old PCI-X slot? Anyone seen such a beast?


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-08 Thread Kjetil Torgrim Homme
Daniel Carosone d...@geek.com.au writes:

 In that context, I haven't seen an answer, just a conclusion: 

  - All else is not equal, so I give my money to some other hardware
manufacturer, and get frustrated that Sun won't let me buy the
parts I could use effectively and comfortably.  

no one is selling disk brackets without disks.  not Dell, not EMC, not
NetApp, not IBM, not HP, not Fujitsu, ...

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-08 Thread Thomas Burgess
Just like I said way earlier, the entire idea is like asking to buy a
Ferrari without the aluminum wheels they sell because you think they are
charging too much for them; after all, aluminum is cheap.
It's just not done that way.  There are OTHER OPTIONS for people who can't
afford it.  You really can't have both.  You can either afford it or you
can't.

On Mon, Feb 8, 2010 at 8:36 PM, Kjetil Torgrim Homme kjeti...@linpro.nowrote:

 Daniel Carosone d...@geek.com.au writes:

  In that context, I haven't seen an answer, just a conclusion:
 
   - All else is not equal, so I give my money to some other hardware
 manufacturer, and get frustrated that Sun won't let me buy the
 parts I could use effectively and comfortably.

 no one is selling disk brackets without disks.  not Dell, not EMC, not
 NetApp, not IBM, not HP, not Fujitsu, ...

 --
 Kjetil T. Homme
 Redpill Linpro AS - Changing the game

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-08 Thread Tim Cook
On Monday, February 8, 2010, Kjetil Torgrim Homme kjeti...@linpro.no wrote:
 Daniel Carosone d...@geek.com.au writes:

 In that context, I haven't seen an answer, just a conclusion:

  - All else is not equal, so I give my money to some other hardware
    manufacturer, and get frustrated that Sun won't let me buy the
    parts I could use effectively and comfortably.

 no one is selling disk brackets without disks.  not Dell, not EMC, not
 NetApp, not IBM, not HP, not Fujitsu, ...

 --
 Kjetil T. Homme
 Redpill Linpro AS - Changing the game

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Although I am in full support of what sun is doing, to play devils
advocate: supermicro is.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-08 Thread Thomas Burgess
On Mon, Feb 8, 2010 at 9:13 PM, Tim Cook t...@cook.ms wrote:

 On Monday, February 8, 2010, Kjetil Torgrim Homme kjeti...@linpro.no
 wrote:
  Daniel Carosone d...@geek.com.au writes:
 
  In that context, I haven't seen an answer, just a conclusion:
 
   - All else is not equal, so I give my money to some other hardware
 manufacturer, and get frustrated that Sun won't let me buy the
 parts I could use effectively and comfortably.
 
  no one is selling disk brackets without disks.  not Dell, not EMC, not
  NetApp, not IBM, not HP, not Fujitsu, ...
 
  --
  Kjetil T. Homme
  Redpill Linpro AS - Changing the game
 
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

 Although I am in full support of what sun is doing, to play devils
 advocate: supermicro is.



This is a far cry from an apples to apples comparison though.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-08 Thread Erik Trimble

Tim Cook wrote:

On Monday, February 8, 2010, Kjetil Torgrim Homme kjeti...@linpro.no wrote:
  

Daniel Carosone d...@geek.com.au writes:



In that context, I haven't seen an answer, just a conclusion:

 - All else is not equal, so I give my money to some other hardware
   manufacturer, and get frustrated that Sun won't let me buy the
   parts I could use effectively and comfortably.
  

no one is selling disk brackets without disks.  not Dell, not EMC, not
NetApp, not IBM, not HP, not Fujitsu, ..

Although I am in full support of what sun is doing, to play devils
advocate: supermicro is.
  
True, but they're not a systems vendor. They're a parts OEM.   You might 
be able to get larger integrated solutions from them 
(motherboard/chassis together), but you'll have to buy the rest of the 
parts yourself (or go to a system integrator to build a system for you).


No brand-name system provider allows you to purchase empty disk 
sleds.  About the best I can come up with on that is that eBay often has 
a selection of various brackets, usually from 3rd-parties which copy the 
Brand design.


In the end, you pay for support and integration testing. Whether it is 
worth it depends solely on your situation. But don't expect vendors to 
service all (or even many) niches - they all pick their battles, and if 
you're not in their zone, it's a huge uphill struggle to get them to add 
your zone.  It's that simple.



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intrusion Detection - powered by ZFS Checksumming ?

2010-02-08 Thread Damon Atkins
Maybe look at the rsync and librsync (http://librsync.sourceforge.net/) code to 
see if a ZFS API could be designed to help rsync/librsync, as well as diff, in 
the future.

It might be a good idea for POSIX to have a single checksum and a 
multi-checksum interface.

One problem could be block sizes: if a file is re-written and ends up the same 
size, it may have different ZFS record sizes within if it was written over a 
long period of time (many txgs, ignoring compression), and therefore you could 
not use ZFS checksums to compare the two files.

Side note:
It would be nice if ZFS on every txg only wrote full record sizes unless it was 
short on memory or a file was closed. Maybe the txg could happen more often if 
it just scanned for full-recordsize writes and closed files, or for blocks which 
had not been altered for three scans.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] [OT] excess zfs-discuss mailman digests

2010-02-08 Thread grarpamp
Hi. As sometimes list-owners aren't monitored...
I signed up for digests.
On the mailman page it hints at once-daily service.
I'm getting maybe 12 per day; I didn't count them.
They're non-overlapping, with various message counts in each.
This is unexpected given the above hint.
Once a day would be nice :)
Thanks.

PS: Is there any way to get a copy of the list since inception
for local client perusal, not via some online web interface?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Big send/receive hangs on 2009.06

2010-02-08 Thread David Dyer-Bennet
Nobody has any ideas?  It's still hung after work.

I wonder what it will take to stop the backup and export the pool?  Well, 
that's nice; a straight kill terminated the processes, at least.

zpool status shows no errors. zfs list shows backup filesystems mounted.

zpool export -f is running...no disk I/O now...starting to look hung.

Ah, the zfs receive process is still in the process table.  kill -9 doesn't 
help.

Kill and kill -9 won't touch the zpool export process, either.

Pulling the USB cable on the drive doesn't seem to be helping any either.

zfs list now hangs, but giving it a little longer just in case.

Kill -9 doesn't touch any of the hung jobs.

Closing the ssh sessions doesn't touch any of them either.

zfs list on pools other than bup-wrack works. zpool list works, and shows 
bup-wrack.

Attempting to set failmode=continue gives an I/O error.

Plugging the USB back in and then setting failmode gives the same I/O error.

cfgadm -al lists known disk drives and usb3/9 as usb-storage connected. I 
think that's the USB disk that's stuck.

cfgadm -cremove usb3/9 failed configuration operation not supported.

cfgadm -cdisconnect usb3/9 queried if I wanted to suspend activity, then failed 
with cannot issue devctl to ap_id: /devices/p...@0,0/pci10de,c...@2,1:9

Still -al the same.

cfgadm -cunconfigure same error as disconnect.

I was able to list properties on bup-wrack:

bash-3.2$ zpool get all bup-wrack
NAME   PROPERTY   VALUE   SOURCE
bup-wrack  size   928G-
bup-wrack  used   438G-
bup-wrack  available  490G-
bup-wrack  capacity   47% -
bup-wrack  altroot/backups/bup-wrack  local
bup-wrack  health UNAVAIL -
bup-wrack  guid   2209605264342513453  default
bup-wrack  version14  default
bup-wrack  bootfs -   default
bup-wrack  delegation on  default
bup-wrack  autoreplaceoff default
bup-wrack  cachefile  nonelocal
bup-wrack  failmode   waitdefault
bup-wrack  listsnapshots  off default

It's not healthy, alright. And the attempt to set failmode really did fail.

I've been here before, and it has always required a reboot. 

Other than setting failmode=continue earlier, anybody have any ideas?
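
(For next time, I assume the idea is to set that while the pool is still
healthy, before starting the backup, i.e. something like

  zpool set failmode=continue bup-wrack

though I can't confirm it would have avoided this particular hang.)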
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [OT] excess zfs-discuss mailman digests

2010-02-08 Thread Mike Gerdts
On Mon, Feb 8, 2010 at 9:04 PM, grarpamp grarp...@gmail.com wrote:
 PS: Is there any way to get a copy of the list since inception
 for local client perusal, not via some online web interface?

You can get monthly .gz archives in mbox format from
http://mail.opensolaris.org/pipermail/zfs-discuss/.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-08 Thread Daniel Carosone
  Although I am in full support of what sun is doing, to play devils
  advocate: supermicro is.

They're not the only ones, although they're the most often discussed here.

Dell will generally sell hardware and warranty and service add-ons in
any combination, to anyone willing and capable of figuring out what to
order, although that effort might well be more than the result is
worth.  Many of the others have issues in being further from the
retail market, such as support divisions that are only set up to deal
with large enterprise full-service customers. Nothing wrong with that
if it suits them.  

Of the others listed, Sun is the one promoting change and the benefits
of ZFS and open storage, and which has the opportunity to make sales
to an interested community.  They, too, are entitled to exclude
themselves from sales they don't want, for whatever reason they or
their new masters choose. 

On Mon, Feb 08, 2010 at 09:33:12PM -0500, Thomas Burgess wrote:
 This is a far cry from an apples to apples comparison though.

As much as I'm no fan of Apple, it's a pity they dropped ZFS because
that would have brought considerable attention to the opportunity of
marketing and offering zfs-suitable hardware to the consumer arena.
Port-multiplier boxes already seem to be targeted most at the Apple
crowd, even if it's only in the hope of scoring a better margin.

Otherwise, bad analogies, whether about cars or fruit, don't help.

--
Dan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intrusion Detection - powered by ZFS Checksumming ?

2010-02-08 Thread Kjetil Torgrim Homme
Damon Atkins damon_atk...@yahoo.com.au writes:

 One problem could be block sizes, if a file is re-written and is the
 same size it may have different ZFS record sizes within, if it was
 written over a long period of time (txg's)(ignoring compression), and
 therefore you could not use ZFS checksum to compare two files.

the record size used for a file is chosen when that file is created.  it
can't change.  when the default record size for the dataset changes,
only new files will be affected.  ZFS *must* write a complete record
even if you change just one byte (unless it's the tail record of
course), since there isn't any better granularity for the block
pointers.
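
you can check what block size a given file actually got from zdb's
object dump (the dblk column); the object number is just the file's
inode number, e.g.:

  $ ls -i /tank/fs/somefile
  $ zdb -dddd tank/fs <object#>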

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-08 Thread Thomas Burgess
 On Mon, Feb 08, 2010 at 09:33:12PM -0500, Thomas Burgess wrote:
  This is a far cry from an apples to apples comparison though.

 As much as I'm no fan of Apple, it's a pity they dropped ZFS because
 that would have brought considerable attention to the opportunity of
 marketing and offering zfs-suitable hardware to the consumer arena.
 Port-multiplier boxes already seem to be targetted most at the Apple
 crowd, even it's only in hope of scoring a better margin.

 Otherwise, bad analogies, whether about cars or fruit, don't help.


It might help people to understand how ridiculous they sound going on and on
about buying a premium storage appliance without any storage.  I think the
car analogy was dead on.  You don't have to agree with a vendor's practices
to understand them.  If you have a more fitting analogy, then by all means
let's hear it.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [OT] excess zfs-discuss mailman digests

2010-02-08 Thread Kjetil Torgrim Homme
grarpamp grarp...@gmail.com writes:

 PS: Is there any way to get a copy of the list since inception for
 local client perusal, not via some online web interface?

I prefer to read mailing lists using a newsreader and the NNTP interface
at Gmane.  a newsreader tends to be better at threading etc. than a mail
client which is fed an mbox...  see http://gmane.org/about.php for more
information.

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Anyone with experience with a PCI-X SSD card?

2010-02-08 Thread Thomas Burgess
On Mon, Feb 8, 2010 at 10:33 PM, Erik Trimble erik.trim...@sun.com wrote:

 Erik Trimble wrote:

 I've a couple of older systems that are front-ending a large backup array.

 I'd like to put in a large L2ARC cache device for them to use with dedup.
Right now, they only have Ultra320 SCA 3.5 hot-swap drive bays, and
 PCI-X slots.

 I haven't found any SSDs (or adapters) which might work with the Ultra320
 bays, so I'm hunting for something to stick in the PCI-X (NOT PCI-Express)
 slot.

 Ideally, I'd love to find something that lets me hook a standard 2.5 SSD
 to, but I have space limitations. About the best I've found right now is a
 32-bit PCI card which has Compact Flash slots on it.  /Really/ not what I
 want.

 So, I've seen a bunch of PCI-E cards which have flash on them and act as a
 SSD, but is there any hope for a old PCI-X slot? Anyone seen such a beast?

  To reply to myself, the best I can do is this:

   http://www.apricorn.com/product_detail.php?type=family&id=59

 (it uses a sil3124 controller, so it /might/ work with OpenSolaris )


 and an award for the That-is-ALMOST-What-I-Want goes to:

   http://www.sonnettech.com/PRODUCT/tempohd.html





The first one is really cool.  I was wondering this as well btw.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Anyone with experience with a PCI-X SSD card?

2010-02-08 Thread Daniel Carosone
On Mon, Feb 08, 2010 at 07:33:56PM -0800, Erik Trimble wrote:
 To reply to myself, the best I can do is this:

http://www.apricorn.com/product_detail.php?type=family&id=59

 (it uses a sil3124 controller, so it /might/ work with OpenSolaris )

Nice.  I'd certainly like to know if you try it and have success.
Note that the pci-x version also has a pci-e to pci-x bridge (Tsi384)
that would need to work.  I expect ppb's are handled generically by
the framework and spec.

I didn't find anything to indicate either way whether there was
bootable bios on board; again this might be a potential hurdle with
the ppb if you intend to boot from it.

For me, I'd be looking at the pci-e version, and as you note there are
other options for pure ssd.  This seems the most modular (choosing my
own brand/type/mix of ssd and hdd, for example).

--
Dan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Anyone with experience with a PCI-X SSD card?

2010-02-08 Thread Daniel Carosone
On Tue, Feb 09, 2010 at 03:11:38PM +1100, Daniel Carosone wrote:
 I didn't find anything to indicate either way whether there was
 bootable bios on board

Ah - in the install guide there's a mention about pressing F4 or
Ctrl-S when prompted at boot to configure the raid format, so
there evidently is some BIOS.

 again this might be a potential hurdle with
 the ppb if you intend to boot from it.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intrusion Detection - powered by ZFS Checksumming ?

2010-02-08 Thread Damon Atkins
I would have thought that if I write 1k and the ZFS txg times out in 30 secs, 
the 1k will be written to disk in a 1k record block; then if I write 4k, 30 
secs later another txg happens and a 4k record block will be written; and then 
if I write 130k, a 128k and a 2k record block will be written.

Making the file have record sizes of
1k+4k+128k+2k
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup Questions.

2010-02-08 Thread Richard Elling
On Feb 8, 2010, at 6:04 PM, Kjetil Torgrim Homme wrote:

 Tom Hall thattommyh...@gmail.com writes:
 
 If you enable it after data is on the filesystem, it will find the
 dupes on read as well as write? Would a scrub therefore make sure the
 DDT is fully populated.
 
 no.  only written data is added to the DDT, so you need to copy the data
 somehow.  zfs send/recv is the most convenient, but you could even do a
 loop of commands like
 
  cp -p $file $file.tmp && mv $file.tmp $file
 
 Re the DDT, can someone outline it's structure please? Some sort of
 hash table? The blogs I have read so far dont specify.
 
 I can't help here.

UTSL

 Re DDT size, is (data in use)/(av blocksize) * 256bit right as a worst
 case (ie all blocks non identical)
 
 the size of an entry is much larger:
 
 | From: Mertol Ozyoney mertol.ozyo...@sun.com
 | Subject: Re: Dedup memory overhead
 | Message-ID: 00cb01caa580$a3d6f110$eb84d330$%ozyo...@sun.com
 | Date: Thu, 04 Feb 2010 11:58:44 +0200
 | 
 | Approximately it's 150 bytes per individual block.
 
 What are average block sizes?
 
 as a start, look at your own data.  divide the used size in df with
 used inodes in df -i.  example from my home directory:
 
  $ /usr/gnu/bin/df -i ~
  FilesystemInodes IUsed IFree  IUse%Mounted on
  tank/home  223349423   3412777 219936646 2%/volumes/home
 
  $ df -k ~
  Filesystemkbytes  used avail capacity  Mounted on
  tank/home  573898752 257644703 10996825471%/volumes/home
 
 so the average file size is 75 KiB, smaller than the recordsize of 128
 KiB.  extrapolating to a full filesystem, we'd get 4.9M files.
 unfortunately, it's more complicated than that, since a file can consist
 of many records even if the *average* is smaller than a single record.
 
 a pessimistic estimate, then, is one record for each of those 4.9M
 files, plus one record for each 128 KiB of diskspace (2.8M), for a total
 of 7.7M records.  the size of the DDT for this (quite small!) filesystem
 would be something like 1.2 GB.  perhaps a reasonable rule of thumb is 1
 GB DDT per TB of storage.

zdb -D poolname will provide details on the DDT size.  FWIW, I have a
pool with 52M DDT entries and the DDT is around 26GB.

$ pfexec zdb -D tank
   
DDT-sha256-zap-duplicate: 19725 entries, size 270 on disk, 153 in core
DDT-sha256-zap-unique: 52284055 entries, size 284 on disk, 159 in core

dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies 
= 1.00

(you can tell by the stats that I'm not expecting much dedup :-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intrusion Detection - powered by ZFS Checksumming ?

2010-02-08 Thread Richard Elling
On Feb 8, 2010, at 9:10 PM, Damon Atkins wrote:

 I would have thought that if I write 1k then ZFS txg times out in 30secs, 
 then the 1k will be written to disk in a 1k record block, and then if I write 
 4k then 30secs latter txg happen another 4k record size block will be 
 written, and then if I write 130k a 128k and 2k record block will be written.
 
 Making the file have record sizes of
 1k+4k+128k+2k

Close. Once the max record size is achieved, it is not reduced.  So the
allocation is:
1KB + 4KB + 128KB + 128KB

Physical writes tend to be coalesced, which is one reason why a transactional
system performs well.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss