Re: [zfs-discuss] Trying to understand zfs RAID-Z

2007-05-18 Thread Victor Latushkin


If I understand correctly, then the parity blocks for RAID-Z are also 
written in two different atomic operations, as per RAID-5 (the only 
difference being that each can be of a different stripe size).


HL As with RAID-5 on a four disk stripe, there are four independent
HL writes, and they don't need to be atomic, as copy-on-write implies
HL that the new blocks are written elsewhere on disk, while maintaining
HL the original data.  Only after all four writes return and are flushed
HL to disk can you proceed and update the metadata.

And to clarify: metadata is also updated in the spirit of COW, so
metadata is written to new locations and then the uberblock is
atomically updated to point to the new metadata.
Well, to add to this, uberblocks are also updated in COW fashion -
there is a circular array of 128 uberblocks, and the new uberblock is
written to the slot next to the current one.
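
(As a hedged aside, zdb can show the currently active uberblock and its
transaction group on a live pool - 'tank' below is just a placeholder pool
name, and zdb output varies between releases:)

    # print the active uberblock (txg, timestamp, etc.) of pool 'tank'
    zdb -u tank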



victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reading a ZFS Snapshot

2007-05-18 Thread Darren J Moffat

mnh wrote:

Hi,

I was wondering if there is any way to read a ZFS snapshot using the
system/zfs lib (i.e. refer to it as a block device).
I dug through the libzfs source but could not find anything that could
enable me to 'read' the contents of a snapshot/filesystem.


Why ?  What problem are you trying to solve ?

Given that you can't read the filesystem as a block device in the first 
place, why would it make sense to do so for a snapshot of the filesystem?


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Trying to understand zfs RAID-Z

2007-05-18 Thread Henk Langeveld

HL And to clarify: metadata is also updated in the spirit of COW, so
HL metadata is written to new locations and then the uberblock is
HL atomically updated to point to the new metadata.

Victor Latushkin wrote:
Well, to add to this, uberblocks are also updated in COW fashion -
there is a circular array of 128 uberblocks, and the new uberblock is
written to the slot next to the current one.


Correct, I left it out because there's more detail involved with the uberblock.

We can deal with it when we get there.

Cheers,
Henk
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 3320 JBOD setup

2007-05-18 Thread Dale Sears

See inline near the end...

Tomas Ögren wrote:

On 14 May, 2007 - Dale Sears sent me these 0,9K bytes:


I was wondering if this was a good setup for a 3320 single-bus,
single-host attached JBOD.  There are 12 146G disks in this array:

I used:

zpool create pool1 \
    raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t8d0 \
           c2t9d0 c2t10d0 \
    spare c2t11d0 c2t12d0

(or something very similar)

This yields a 1TB file system with dual parity and two spare disks.


So first any two disks can fail at the same time; then, after rebuilding
onto the hot spares, two more disks can fail - at least until you've
replaced the failed disks..


The customer is happy, but I wonder if there are any other suggestions for
making this array faster, more reliable, or just better in your opinion.  I
know that 'better' has different meanings under different application
conditions, so I'm just looking for folks to recommend a setup and
perhaps explain why they would do it that way.


That RAID set will give you the same random I/O performance as a single
disk. Sequential I/O will be better than a single disk.

For instance, splitting it into two raidz2 groups without spares can
survive any two disk failures within each group (so 2 to 4 disks can fail
without data loss). Random I/O performance will be twice that of the
single raidz2 / single disk.


What would that command look like?   Is this what you're saying?:

 zpool create pool1 \
 raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0  c2t4d0  c2t5d0  \
 raidz2 c2t6d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0 c2t12d0

Thanks!

Dale


/Tomas

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reading a ZFS Snapshot

2007-05-18 Thread mnh

Darren J Moffat wrote:


mnh wrote:


Hi,

I was wondering if there is any way to read a ZFS snapshot using the
system/zfs lib (i.e. refer to it as a block device).
I dug through the libzfs source but could not find anything that could
enable me to 'read' the contents of a snapshot/filesystem.



Why ?  What problem are you trying to solve ?


We are trying to implement a third-party backup/restore system for ZFS
(including bare metal recovery). Essentially this requires the snapshot
to be read and stored in a proprietary format.



Given that you can't read the filesystem as a block device in the 
first place, why would it make sense to do so for a snapshot of the 
filesystem?


I know it doesn't make much sense, I was just hoping that zfs's 
snapshots could be used by a different product/vendor.


Thanks,
mnh

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reading a ZFS Snapshot

2007-05-18 Thread Darren J Moffat

mnh wrote:

Darren J Moffat wrote:


mnh wrote:


Hi,

I was wondering if there is any way to read a ZFS snapshot using the
system/zfs lib (i.e. refer to it as a block device).
I dug through the libzfs source but could not find anything that could
enable me to 'read' the contents of a snapshot/filesystem.



Why ?  What problem are you trying to solve ?


We are trying to implement a third-party backup/restore system for ZFS
(including bare metal recovery). Essentially this requires the snapshot
to be read and stored in a proprietary format.


Is there a reason why you can't just walk through the snapshot using 
POSIX APIs?  The snapshot is mounted at 
<root-of-dataset>/.zfs/snapshot/<name-of-snapshot>
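
(For example - pool, dataset and snapshot names below are placeholders:)

    # make the .zfs directory visible if it is hidden (optional)
    zfs set snapdir=visible tank/home
    # browse the read-only snapshot contents through the normal POSIX namespace
    ls /tank/home/.zfs/snapshot/mysnap
    # restore a single file with ordinary tools
    cp /tank/home/.zfs/snapshot/mysnap/somefile /tank/home/somefile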


Or maybe zfs send/recv is what you need.
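
(Roughly along these lines, with placeholder names:)

    # snapshot the filesystem and serialize it to a file...
    zfs snapshot tank/home@backup1
    zfs send tank/home@backup1 > /backup/home.backup1.zfs
    # ...or stream it straight into another pool or host
    zfs send tank/home@backup1 | ssh backuphost zfs receive backuppool/home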



Given that you can't read the filesystem as a block device in the 
first place, why would it make sense to do so for a snapshot of the 
filesystem?


I know it doesn't make much sense, I was just hoping that zfs's 
snapshots could be used by a different product/vendor.


Sure they can but I'm not sure you are approaching the problem from a 
view that ZFS can give you on the data.


It might help if you described how this backup software works for other 
filesystems, eg UFS.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reading a ZFS Snapshot

2007-05-18 Thread mnh

Darren J Moffat wrote:

  Is there a reason why you can't just walk through the snapshot using 
POSIX APIs?  The snapshot is mounted at 
<root-of-dataset>/.zfs/snapshot/<name-of-snapshot>


We cannot walk through the mounted snapshot as it's not just the data 
that we are concerned about. We need to read the complete snapshot 
(data+metadata).




Or maybe zfs send/recv is what you need.


I looked at zfs send/recv and was pleasantly surprised by its 
capabilities. Unfortunately it does not fit into our backup agent's 
design (NDMP based). The agent reads directly off a snapshot - using 
zfs send would require additional space to create the backup file 
(and we cannot do a zfs send to the destination, even though it would 
have been nice :) ).





Given that you can't read the filesystem as a block device in the 
first place, why would it make sense to do so for a snapshot of the 
filesystem?


I know it doesn't make much sense, I was just hoping that zfs's 
snapshots could be used by a different product/vendor.



Sure they can but I'm not sure you are approaching the problem from a 
view that ZFS can give you on the data.


It might help if you described how this backup software works for 
other filesystems, eg UFS.


For UFS (as with other filesystems, e.g. NTFS) we use the filesystem's 
native snapshot mechanism (fssnap/VSS). In all cases you can refer to the 
snapshot as a block device.


Thanks,
mnh

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 3320 JBOD setup

2007-05-18 Thread Tomas Ögren
On 18 May, 2007 - Dale Sears sent me these 1,5K bytes:

 Tomas Ögren wrote:
 On 14 May, 2007 - Dale Sears sent me these 0,9K bytes:
 
 I was wondering if this was a good setup for a 3320 single-bus,
 single-host attached JBOD.  There are 12 146G disks in this array:
 
 I used:
 
 zpool create pool1 \
     raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t8d0 \
            c2t9d0 c2t10d0 \
     spare c2t11d0 c2t12d0
[..]
 That RAID set will give you the same random I/O performance as a single
 disk. Sequential I/O will be better than a single disk.
 
 For instance, splitting it into two raidz2 groups without spares can
 survive any two disk failures within each group (so 2 to 4 disks can fail
 without data loss). Random I/O performance will be twice that of the
 single raidz2 / single disk.
 
 What would that command look like?   Is this what you're saying?:
 
  zpool create pool1 \
  raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0  c2t4d0  c2t5d0  \
  raidz2 c2t6d0 c2t8d0 c2t9d0 c2t10d0 c2t11d0 c2t12d0
 
 Thanks!

Yep. Verify the performance difference between the two methods in your
usage case.

Its reliability against failures is a bit more of a gamble than one big
group with two hot spares. If you're lucky, 4 disks can blow up at the
same time without problems (vs. 2 in your version). If you're unlucky,
2 disks from the same set blow up and then another one goes before you
had the chance to replace them with cold spare(s) - say, first 2 and then
another one over a weekend. A hot spare could have saved you then.

If you have a cold spare lying around and replace failed disks as soon as
one breaks, this shouldn't be a problem, but it can make a difference;
it's up to you to decide (or attach a single additional hot spare outside
the 3320).

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Reading a ZFS Snapshot

2007-05-18 Thread William D. Hathaway
I think it would be handy if a utility could read a full zfs snapshot and 
restore subsets of files or directories, using something like tar -xf or 
ufsrestore -i.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] DBMS on zpool

2007-05-18 Thread homerun
Hi

Just playing around with ZFS, trying to place DBMS data files on a zpool.
By DBMS I mean Oracle and Informix here.
I've noticed that read performance is excellent, but write performance is 
not, and write performance also varies a lot.
My guess for the not-so-good write performance and its variation is double 
buffering: DBMS buffers and ZFS caching together.
Has anyone seen or tested best practices for how a DBMS setup should be 
implemented using a zpool - zfs or zvol?

Thanks
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Trying to understand zfs RAID-Z

2007-05-18 Thread David Bustos
Quoth Steven Sim on Thu, May 17, 2007 at 09:55:37AM +0800:
Gurus;
I am exceedingly impressed by ZFS, although it is my humble opinion
that Sun is not doing enough evangelizing for it.

What else do you think we should be doing?


David
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Reading a ZFS Snapshot

2007-05-18 Thread Chris Gerhard
I'm not sure what you want that the file system does not already provide.

You can use cp to copy files out, or find(1) to find them based on time or any 
other attribute, and then cpio to copy them out.
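
(For instance, something like this - the paths are placeholders:)

    # copy everything changed in the last day out of a snapshot
    cd /tank/home/.zfs/snapshot/mysnap
    find . -mtime -1 | cpio -pdm /restore/target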
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


RE: [zfs-discuss] DBMS on zpool

2007-05-18 Thread Ellis, Mike

This is probably a good place to start.

http://blogs.sun.com/realneel/entry/zfs_and_databases

Please post back to the group with your results, I'm sure many of us are
interested.

Thanks,

 -- MikeE

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of homerun
Sent: Friday, May 18, 2007 8:42 AM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] DBMS on zpool


Hi

Just playing around with ZFS, trying to place DBMS data files on a zpool.
By DBMS I mean Oracle and Informix here.
I've noticed that read performance is excellent, but write performance is
not, and write performance also varies a lot.
My guess for the not-so-good write performance and its variation is double
buffering: DBMS buffers and ZFS caching together.
Has anyone seen or tested best practices for how a DBMS setup should be
implemented using a zpool - zfs or zvol?

Thanks
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Reading a ZFS Snapshot

2007-05-18 Thread William D. Hathaway
An example would be if you had a raw snapshot on tape.  A single file or subset 
of files could be restored from it without needing the space to load the full 
snapshot into a zpool.  This would be handy if you have a zpool with 500GB of 
space and 300GB used.  If you had a snapshot that was 250GB and wanted to load 
it back up to restore a file, you wouldn't have sufficient space.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] AVS replication vs ZFS send recieve for odd sized volume pairs

2007-05-18 Thread a habman

Hello all,  I am interested in setting up an HA NFS server with zfs as
the storage filesystem on Solaris 10 + Sun Cluster 3.2. This is an HPC
environment with a 70 node cluster attached. File sizes are 1-200meg
or so, with an average around 10meg.

I have two servers, and due to changing specs over time I have
ended up with heterogeneous storage.  They are physically close to each
other, so no offsite replication needs.

Server A has an areca 12 port raid card attached to 12x400 gig drives.
Server B has an onboard raid with 6 available slots which I plan on
populating with either 750 gig or 1tb drives.

With AVS 4.0 (which I have running on a test volume pair) I am able to
mirror the zpools at the block level, but I am forced to have an equal
number of LUNs for it to work on (AVS mirrors the block devices that zfs
sits on top of).  If I carve up each raid set into 4 volumes, AVS
those (plus bitmap volumes) and then ZFS stripe over that,
theoretically I am in business, although this has a couple of
downsides.

If I want to maximize my performance first, while keeping a margin of
safety in this replicated environment, how can I best use my storage?

Option one:

  AVS + hardware RAID-5 on each side.  Make 4 LUNs and zfs stripe on
top.  Hardware RAID takes care of drive failure; AVS ensures that the
whole storage pool is replicated at all times to Server B. This method
does not take advantage of the disk caching zfs can do, nor of the
additional I/O scheduling zfs would like to manage at the drive level.
Also unknown is how the SC3.2 HA ZFS module will work on an AVS-replicated
zfs filesystem, as I believe it was designed for a Fibre Channel shared
set of disks. On the plus side, with this method we have block-level
replication, so close to instantaneous sync between filesystems.

Option two:
  Full zfs pools on both sides, using zfs send + zfs receive for the
replication.  This has benefits because my pools can be differently
sized and can grow, and that's OK. The copy could also be mounted on
server B as well (most of the time).  The downside is that I have to hack
together a zfs send + receive script and cron job, which is likely not as
bombproof as the tried and tested AVS?
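
(A minimal sketch of what such a cron job might look like - pool, dataset
and host names are placeholders, and a real script would need locking and
error checking:)

    # ship the delta between the previous replicated snapshot and a new one
    PREV=$(cat /var/run/last-repl-snap)
    NOW=repl-$(date +%Y%m%d%H%M)
    zfs snapshot tank/export@$NOW
    zfs send -i tank/export@$PREV tank/export@$NOW | \
        ssh serverB zfs receive -F backup/export
    zfs destroy tank/export@$PREV        # optionally prune the old snapshot
    echo $NOW > /var/run/last-repl-snap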

So... basically, how are you all doing replication between two
different disk topologies using zfs?

I am a solaris newbie, attracted by the smell of the zfs, so please
pardon my lack of in-depth knowledge of these issues.

Thank you in advance.

Ahab
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Reading a ZFS Snapshot

2007-05-18 Thread Toby Thain


On 18-May-07, at 1:57 PM, William D. Hathaway wrote:


An example would be if you had a raw snapshot on tape.


Unless I misunderstand ZFS, you can archive the contents of a  
snapshot, but there's no concept of a 'raw snapshot' divorced from a  
filesystem.


A single file or subset of files could be restored from it without  
needing the space to load the full snapshot into a zpool.  This  
would be handy if you have a zpool with 500GB of space and 300GB  
used.  If you had a snapshot that was 250GB and wanted to load it  
back up to restore a file, you wouldn't have sufficient space.



This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




Re: [zfs-discuss] Trying to understand zfs RAID-Z

2007-05-18 Thread Ian Collins
David Bustos wrote:
 Quoth Steven Sim on Thu, May 17, 2007 at 09:55:37AM +0800:
   
Gurus;
I am exceedingly impressed by ZFS, although it is my humble opinion
that Sun is not doing enough evangelizing for it.
 

 What else do you think we should be doing?

   
Send Thumpers to every respectable journal for a review!  That's
probably a problem for marketing: how to target the publications that
the people with the check books read, to broaden the awareness of ZFS.

Just about every x86 server manufacturer provides and promotes the
features of hardware RAID solutions; maybe Sun should make more of the
cost savings in storage ZFS offers to gain a cost advantage over the
competition, or even save $ on HP servers by running Solaris and
removing the RAID.

How about some JBOD-only storage products?  Or at least make hardware
RAID an add-on option, to cater for a broader market.

Trying to break (especially Windows) administrators and CIOs out of the
'hardware RAID is best' or even 'hardware RAID is essential' mindset is
a tough ask.  As hardware RAID drops in price and moves into consumer
grade products, ZFS will lose the cost advantage (just try to get a
JBOD-only SATA card; I only know of one).

Ian
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Making 'zfs destroy' safer

2007-05-18 Thread Peter Schuller
Hello,

with the advent of clones and snapshots, one will of course start
creating them. Which also means destroying them.

Am I the only one who is *extremely* nervous about doing zfs destroy
some/filesystem@snapshot?

This goes both for manual use and for automated use in a script. I am very
paranoid about this, especially because the @ sign might conceivably be
incorrectly interpreted by some layer of scripting, being a
non-alphanumeric character and highly atypical for filenames/paths.

What about having dedicated commands 'destroysnapshot', 'destroyclone',
or 'remove' (a less dangerous variant of 'destroy') that will never do
anything but remove snapshots or clones? Alternatively, having something
along the lines of 'zfs destroy --nofs' or 'zfs destroy --safe'.

I realize this is borderline in the same territory as special-casing
'rm -rf /' and similar, which is generally not considered a good
idea.

But somehow the snapshot situation feels a lot more risky.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org




signature.asc
Description: OpenPGP digital signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Making 'zfs destroy' safer

2007-05-18 Thread Peter Schuller
 What about having dedicated commands 'destroysnapshot', 'destroyclone',
 or 'remove' (a less dangerous variant of 'destroy') that will never do
 anything but remove snapshots or clones? Alternatively, having something
 along the lines of 'zfs destroy --nofs' or 'zfs destroy --safe'.

Another option is to allow something along the lines of:

zfs destroy snapshot:/path/to/filesystem@snapshot

Where the use of the snapshot: prefix would guarantee that non-snapshots
are not affected.
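
(In the meantime, a thin wrapper gives much the same safety net - a
minimal sketch only, with no claim that this is how zfs itself should
behave:)

    #!/bin/sh
    # destroysnap: refuse to destroy anything that does not look like a snapshot
    case "$1" in
        *@*) exec zfs destroy "$1" ;;
        *)   echo "refusing: '$1' is not a snapshot name" >&2; exit 1 ;;
    esac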

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller [EMAIL PROTECTED]'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org




signature.asc
Description: OpenPGP digital signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Trying to understand zfs RAID-Z

2007-05-18 Thread Toby Thain


On 18-May-07, at 4:39 PM, Ian Collins wrote:


David Bustos wrote:
... maybe Sun should make more of the
cost savings in storage ZFS offers to gain a cost advantage over the
competition,


Cheaper AND more robust+featureful is hard to beat.

--T

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] DBMS on zpool

2007-05-18 Thread Richard Elling

homerun wrote:

Hi

Just playing around with ZFS, trying to place DBMS data files on a zpool.
By DBMS I mean Oracle and Informix here.
I've noticed that read performance is excellent, but write performance is 
not, and write performance also varies a lot.
My guess for the not-so-good write performance and its variation is double 
buffering: DBMS buffers and ZFS caching together.
Has anyone seen or tested best practices for how a DBMS setup should be 
implemented using a zpool - zfs or zvol?


Neel has spent some time on this topic.  I'd start with his blog.
http://blogs.sun.com/realneel

Additional blogs to check are Roch's and Bob Sneed, for discussions
on caching and direct I/O.
http://blogs.sun.com/roch
http://blogs.sun.com/bobs

We've been trying to collect the wisdom onto one site, but it is
getting a little crowded and therefore tends to be terse.
The blogs explain the concepts in more detail, and more conversationally.
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
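
(One frequently mentioned knob from those writeups, as a hedged example -
the dataset names are placeholders and the right value depends on your
database block size:)

    # create a dataset for data files and match recordsize to the DB block size
    # (this only affects files written after the property is set)
    zfs create tank/oradata
    zfs set recordsize=8k tank/oradata
    # keep redo/transaction logs on a separate dataset at the default recordsize
    zfs create tank/oralogs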

 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Lots of overhead with ZFS - what am I doing wrong?

2007-05-18 Thread Richard Elling

Queuing theory should explain this rather nicely.  iostat measures
%busy by counting whether there is an entry in the queue at each clock
tick.  There are two queues, one in the controller and one on the
disk.  As you can clearly see, the way ZFS pushes the load is very
different than dd or UFS.
 -- richard

Marko Milisavljevic wrote:
I am very grateful to everyone who took the time to run a few tests to 
help me figure out what is going on. As per j's suggestions, I tried some 
simultaneous reads, and a few other things, and I am getting interesting 
and confusing results.


All tests are done using two Seagate 320G drives on sil3114. In each 
test I am using dd if= of=/dev/null bs=128k count=1. Each drive 
is freshly formatted with one 2G file copied to it. That way dd from raw 
disk and from file are using roughly the same area of the disk. I tried using 
raw, zfs and ufs, single drives and two simultaneously (just executing 
dd commands in separate terminal windows). These are snapshots of iostat 
-xnczpm 3 captured somewhere in the middle of the operation. I am not 
bothering to report CPU% as it never rose over 50%, and was uniformly 
proportional to reported throughput.


single drive raw:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1378.4    0.0 77190.7    0.0  0.0  1.7    0.0    1.2   0  98 c0d1

single drive, ufs file
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1255.1    0.0 69949.6    0.0  0.0  1.8    0.0    1.4   0 100 c0d0

Small slowdown, but pretty good.

single drive, zfs file
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  258.3    0.0 33066.6    0.0 33.0  2.0  127.7    7.7 100 100 c0d1

Now that is odd. Why so much waiting? Also, unlike with raw or UFS, kr/s 
/ r/s gives 128K, as I would imagine it should.


simultaneous raw:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  797.0    0.0 44632.0    0.0  0.0  1.8    0.0    2.3   0 100 c0d0
  795.7    0.0 44557.4    0.0  0.0  1.8    0.0    2.3   0 100 c0d1

This PCI interface seems to be saturated at 90MB/s. Adequate if the goal 
is to serve files on a gigabit SOHO network.


sumultaneous raw on c0d1 and ufs on c0d0:
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  722.4    0.0 40246.8    0.0  0.0  1.8    0.0    2.5   0 100 c0d0
  717.1    0.0 40156.2    0.0  0.0  1.8    0.0    2.5   0  99 c0d1

hmm, can no longer get the 90MB/sec.

simultaneous zfs on c0d1 and raw on c0d0:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.7    0.0    1.8  0.0  0.0    0.0    0.1   0   0 c1d0
  334.9    0.0 18756.0    0.0  0.0  1.9    0.0    5.5   0  97 c0d0
  172.5    0.0 22074.6    0.0 33.0  2.0  191.3   11.6 100 100 c0d1

Everything is slow.

What happens if we throw onboard IDE interface into the mix?
simultaneous raw SATA and raw PATA:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1036.3    0.3 58033.9    0.3  0.0  1.6    0.0    1.6   0  99 c1d0
 1422.6    0.0 79668.3    0.0  0.0  1.6    0.0    1.1   1  98 c0d0

Both at maximum throughput.

Read ZFS on SATA drive and raw disk on PATA interface:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
 1018.9    0.3 57056.1    4.0  0.0  1.7    0.0    1.7   0  99 c1d0
  268.4    0.0 34353.1    0.0 33.0  2.0  122.9    7.5 100 100 c0d0

SATA is slower with ZFS as expected by now, but ATA remains at full 
speed. So they are operating quite independently. Except...


What if we read a UFS file from the PATA disk and ZFS from SATA:
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  792.8    0.0 44092.9    0.0  0.0  1.8    0.0    2.2   1  98 c1d0
  224.0    0.0 28675.2    0.0 33.0  2.0  147.3    8.9 100 100 c0d0
 
Now that is confusing! Why did SATA/ZFS slow down too? I've retried this 
a number of times, not a fluke.


Finally, after reviewing all this, I've noticed another interesting 
bit... whenever I read from raw disks or UFS files, SATA or PATA, kr/s 
over r/s is 56k, suggesting that the underlying IO system is using that as 
some kind of native block size (even though dd is requesting 128k). 
But when reading ZFS files, this always comes to 128k, which is 
expected, since that is the ZFS default (and the same thing happens 
regardless of bs= in dd). On the theory that my system just doesn't like 
128k reads (I'm desperate!), and that this would explain the whole slowdown 
and the wait/wsvc_t column, I tried changing recsize to 32k and rewriting 
the test file. However, accessing ZFS files continues to show 128k reads, 
and it is just as slow. Is there a way to either confirm that the ZFS 
file in question is indeed written with 32k records or, even better, to 
force ZFS to use 56k when accessing the disk? Or perhaps I just 
misunderstand the implications of the iostat output.
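
(A hedged aside: 'zfs get recordsize' only shows the property, which applies
to newly written blocks, while zdb can dump the actual block sizes of an
existing file. The dataset name and object number below are placeholders,
and zdb output is not a stable interface:)

    # confirm the property on the dataset
    zfs get recordsize tank/test
    # find the file's object number (on ZFS the inode number should match
    # the object number), then dump that object's block layout
    ls -i /tank/test/testfile
    zdb -dddddd tank/test <object-number>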


I've repeated each of these tests a few times and doublechecked, and the 
numbers, although 

[zfs-discuss] Re: ZFS over a layered driver interface

2007-05-18 Thread Shweta Krishnan
I explored this a bit and found that the ldi_ioctl in my layered driver does 
fail, but fails because of an 'inappropriate ioctl for device' error, which the 
underlying ramdisk driver's ioctl returns. So it doesn't seem like that's an issue 
at all (since I know the storage pool creation is successful when I give the 
ramdisk directly as the target device).

However, as I mentioned, even though reads and writes are getting invoked on 
the ramdisk through my layered driver, the storage pool creation still fails.

Surprisingly, the layered driver's routines show no sign of error - as in, the 
layered device gets closed successfully when the pool creation command returns.

It is unclear to me what would be a good way to go about debugging this, since 
I'm not familiar with dtrace. I shall try and familiarize myself with dtrace, 
but even then, it seems like there are a large number of functions returning 
non-zero values, and it is confusing to me where to look for the error.
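
(Not a recipe, just a hedged sketch of one way to narrow it down with
DTrace - the fbt function names below come from the OpenSolaris zfs module
of that era and may differ on your build, and the device path is a
placeholder:)

    # print the return value of every zfs ioctl handler while the pool
    # creation runs; a nonzero value points at the failing operation
    dtrace -n 'fbt:zfs:zfs_ioc_*:return { printf("%s => %d", probefunc, arg1); }' \
        -c 'zpool create testpool /dev/layered-dev'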

Any pointers would be most welcome!!

Thanks,
Swetha.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS - Use h/w raid or not? Thoughts. Considerations.

2007-05-18 Thread Chad Mynhier

On 5/17/07, Robert Milkowski [EMAIL PROTECTED] wrote:

Hello Phillip,

Thursday, May 17, 2007, 6:30:38 PM, you wrote:

PF Given:  A Solaris 10 u3 server with an externally attached
PF disk array with RAID controller(s)

PF Question:  Is it better to create a zpool from a
PF single external LUN on an external disk array, or is it
PF better to use no RAID on the disk array and just present
PF individual disks to the server and let ZFS take care of the RAID?


Then the other thing - do you use SATA disks? How much of an issue is
data loss or corruption for you? Doing software RAID in ZFS can detect
AND correct such problems. HW RAID can too, but to a much lesser extent.


I think this point needs to be emphasized.  If reliability is a prime
concern, you absolutely want to let ZFS handle redundancy in one way
or another, either as mirroring or as raidz.

You can think of redundancy in ZFS as much the same thing as packet
retransmission in TCP.  If the data comes through bad the first time,
checksum verification will catch it, and you get a second chance to
get the correct data.  A single-LUN zpool is the moral equivalent of
disabling retransmission in TCP.
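
(Concretely - the device names are placeholders - the difference is between
handing ZFS one opaque LUN and giving it something it can self-heal from:)

    # one big array LUN: ZFS detects corruption but has no second copy to repair from
    zpool create tank c3t0d0
    # two array LUNs mirrored by ZFS: a block that fails its checksum is
    # rewritten from the good side automatically
    zpool create tank mirror c3t0d0 c3t1d0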

Chad Mynhier
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: AVS replication vs ZFS send recieve for odd sized volume pairs

2007-05-18 Thread John-Paul Drawneek
Yes, I am also interested in this.

We can't afford two super-fast setups, so we are looking at having a huge pile 
of SATA to act as a real-time backup for all our streams.

So what can AVS do, and what are its limitations?

Would just using zfs send and receive do, or does AVS make it all seamless?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Making 'zfs destroy' safer

2007-05-18 Thread Darren Dunham
 with the advent of clones and snapshots, one will of course start
 creating them. Which also means destroying them.
 
 Am I the only one who is *extremely* nervous about doing zfs destroy
 some/filesystem@snapshot?
 
 This goes both for manual use and for automated use in a script. I am very
 paranoid about this, especially because the @ sign might conceivably be
 incorrectly interpreted by some layer of scripting, being a
 non-alphanumeric character and highly atypical for filenames/paths.
 
 What about having dedicated commands 'destroysnapshot', 'destroyclone',
 or 'remove' (a less dangerous variant of 'destroy') that will never do
 anything but remove snapshots or clones? Alternatively, having something
 along the lines of 'zfs destroy --nofs' or 'zfs destroy --safe'.

Apparently (and I'm not sure where this is documented), you can 'rmdir'
a snapshot to remove it (in some cases).
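
(i.e., something along the lines of the following, assuming the snapshot
directory is visible - treat this as a curiosity rather than a documented
interface, and the names are placeholders:)

    rmdir /tank/home/.zfs/snapshot/mysnap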

A normal (populated) directory wouldn't be removable with a single
rmdir, so in some sense it's safer.

Personally, I would prefer that file operations (like mv and rmdir)
couldn't affect snapshots.

 I realize this is borderline in the same territory as special-casing
 'rm -rf /' and similar, which is generally not considered a good
 idea.
 
 But somehow the snapshot situation feels a lot more risky.

Agreed.  I'm somewhat used to the VxVM command set, which requires the
type of object to be passed in in some cases (even though the name is
necessarily unique and would be enough to identify the object).

  vxassist -g diskgroup remove volume volumename
  vxassist -g diskgroup remove mirror mirrorname

It doesn't feel unnatural to me to specify things this way.
-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Making 'zfs destroy' safer

2007-05-18 Thread Richard Elling

Rather than rehash this, again, from scratch, refer to a previous rehashing:
http://www.opensolaris.org/jive/thread.jspa?messageID=15363

 -- richard

Peter Schuller wrote:

Hello,

with the advent of clones and snapshots, one will of course start
creating them. Which also means destroying them.

Am I the only one who is *extremely* nervous about doing zfs destroy
some/filesystem@snapshot?

This goes both for manual use and for automated use in a script. I am very
paranoid about this, especially because the @ sign might conceivably be
incorrectly interpreted by some layer of scripting, being a
non-alphanumeric character and highly atypical for filenames/paths.

What about having dedicated commands 'destroysnapshot', 'destroyclone',
or 'remove' (a less dangerous variant of 'destroy') that will never do
anything but remove snapshots or clones? Alternatively, having something
along the lines of 'zfs destroy --nofs' or 'zfs destroy --safe'.

I realize this is borderline in the same territory as special-casing
'rm -rf /' and similar, which is generally not considered a good
idea.

But somehow the snapshot situation feels a lot more risky.





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Re: [zfs-discuss] Making 'zfs destroy' safer

2007-05-18 Thread Darren Dunham
 Rather than rehash this, again, from scratch, refer to a previous rehashing:
   http://www.opensolaris.org/jive/thread.jspa?messageID=15363

That thread really did quickly move to arguments about confirmations and
their usefulness or annoyance.

I think the idea presented of adding something like a filter is slightly
different.  It wouldn't require confirmation or modification of the
existing behavior (and it wouldn't be relevant to the original issue in
that other thread).

   destroy obj   # destroys any existing obj if possible
   destroy snapshot obj  # destroys obj only if it is a snapshot

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
  This line left intentionally blank to confuse you. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Making 'zfs destroy' safer

2007-05-18 Thread Krzys


Hey, that's nothing. I had one zfs file system, then I cloned it, so I
thought that I had two separate file systems. Then I was making snaps
of both of them. Later on I decided I did not need the original file
system with its snaps, so I recursively removed it. All of a sudden
I got a message that the clone file system is mounted and cannot be
removed; my heart did stop for a second, as that clone was a file
system that I was using. I suspect that I had not promoted the clone
to be a completely stand-alone zfs file system, so ehh, I had no idea
that was the case... but it did scare me how easily I could lose a file
system by removing the wrong thing.
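
(For reference, 'zfs promote' is what reverses that dependency - the names
below are placeholders:)

    zfs clone tank/fs@split tank/newfs
    # reverse the clone/origin relationship; the snapshots the clone depended
    # on move to tank/newfs, and the original tank/fs can then be destroyed
    # without taking the clone with it
    zfs promote tank/newfs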

Chris

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss