Re: [zfs-discuss] x4500 vs AVS ?

2008-09-17 Thread Ralf Ramge
Jorgen Lundman wrote:

 If we were interested in finding a method to replicate data to a 2nd 
 x4500, what other options are there for us? 

If you already have an X4500, I think the best option for you is a cron 
job with incremental 'zfs send'. Or rsync.
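
For what it's worth, a rough, untested sketch of such a cron job is below. The 
hostnames, pool name and state file are placeholders, the very first run needs a 
full (non-incremental) send, and 'zfs send -R' / 'zfs receive -d' assume a 
reasonably recent Nevada build:

  #!/bin/sh
  # Hypothetical incremental replication job - adjust the names to your setup.
  POOL=tank
  REMOTE=backup-x4500              # the second X4500
  STATE=/var/run/zfs-last-sent     # remembers the last snapshot we shipped

  PREV=`cat $STATE`                # e.g. tank@20080917-0300
  NEW=$POOL@`date +%Y%m%d-%H%M`

  zfs snapshot -r $NEW
  # Ship only the blocks that changed since the previous snapshot.
  zfs send -R -i $PREV $NEW | ssh $REMOTE zfs receive -Fd $POOL
  echo $NEW > $STATE

The first run would be a plain 'zfs send -R $NEW' without the '-i $PREV', and if 
this kind of snapshot bookkeeping is more trouble than it's worth, rsync stays 
the simpler (if slower and less exact) option.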

-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas 
Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss, 
Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] A few questions

2008-09-17 Thread gm_sjo
Am I right in thinking though that for every raidz1/2 vdev, you're
effectively losing the storage of one/two disks in that vdev?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] A few questions

2008-09-17 Thread Peter Tribble
On Wed, Sep 17, 2008 at 8:40 AM, gm_sjo [EMAIL PROTECTED] wrote:
 Am I right in thinking though that for every raidz1/2 vdev, you're
 effectively losing the storage of one/two disks in that vdev?

Well yeah - you've got to have some allowance for redundancy.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] A few questions

2008-09-17 Thread gm_sjo
2008/9/17 Peter Tribble:
 On Wed, Sep 17, 2008 at 8:40 AM, gm_sjo [EMAIL PROTECTED] wrote:
 Am I right in thinking though that for every raidz1/2 vdev, you're
 effectively losing the storage of one/two disks in that vdev?

 Well yeah - you've got to have some allowance for redundancy.

This is what I'm struggling to get my head around - the chances of
losing two disks at the same time are pretty darn remote (within a
reasonable time-to-replace delta), so what advantage is there (other
than potentially pointless uber-redundancy) in running multiple
raidz1/2 vdevs? Are you not in fact losing performance by reducing the
number of spindles used for a given pool?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool with multiple mirrors question

2008-09-17 Thread Francois
If 2 disks of the same mirror fail, will the pool be faulted?

  NAME        STATE     READ WRITE CKSUM
  homez       ONLINE       0     0     0
    mirror    ONLINE       0     0     0
      c0t2d0  ONLINE       0     0     0
      c0t3d0  ONLINE       0     0     0
    mirror    ONLINE       0     0     0
      c0t4d0  ONLINE       0     0     0
      c0t5d0  ONLINE       0     0     0
    mirror    ONLINE       0     0     0
      c0t6d0  ONLINE       0     0     0
      c0t7d0  ONLINE       0     0     0

Thanks.


--
Francois
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool with multiple mirrors question

2008-09-17 Thread Darren J Moffat
Francois wrote:
 If 2 disks of the same mirror fail, will the pool be faulted?
 
   NAME        STATE     READ WRITE CKSUM
   homez       ONLINE       0     0     0
     mirror    ONLINE       0     0     0
       c0t2d0  ONLINE       0     0     0
       c0t3d0  ONLINE       0     0     0
     mirror    ONLINE       0     0     0
       c0t4d0  ONLINE       0     0     0
       c0t5d0  ONLINE       0     0     0
     mirror    ONLINE       0     0     0
       c0t6d0  ONLINE       0     0     0
       c0t7d0  ONLINE       0     0     0

If c0t6d0 and c0t7d0 both fail (i.e. both sides of the same mirror vdev) 
then the pool will be unable to retrieve all the data stored in it.  If 
c0t6d0 and c0t3d0 both fail, there are still sufficient replicas of the 
data, because the failed disks belong to different mirrors.

This applies to SVM as well: if you have a stripe of mirrors with a UFS 
filesystem on top of it, you will have the same availability issue.
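
For anyone who wants to see this without risking real disks, a throwaway pool 
built from scratch files shows the behaviour - the paths and sizes below are 
arbitrary:

  # Two-mirror pool out of files (needs ~512 MB in /var/tmp).
  mkfile 128m /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4
  zpool create demo mirror /var/tmp/d1 /var/tmp/d2 mirror /var/tmp/d3 /var/tmp/d4

  # Losing one side of a mirror is fine; the pool keeps working
  # (it just shows up as DEGRADED).
  zpool offline demo /var/tmp/d3
  zpool status demo

  # Taking out the other side of the *same* mirror is the c0t6d0+c0t7d0 case;
  # zpool refuses, because no valid replica of that vdev would be left.
  zpool offline demo /var/tmp/d4

  # Clean up.
  zpool destroy demo
  rm /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/d4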

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] A few questions

2008-09-17 Thread Peter Tribble
On Wed, Sep 17, 2008 at 10:11 AM, gm_sjo [EMAIL PROTECTED] wrote:
 2008/9/17 Peter Tribble:
 On Wed, Sep 17, 2008 at 8:40 AM, gm_sjo [EMAIL PROTECTED] wrote:
 Am I right in thinking though that for every raidz1/2 vdev, you're
 effectively losing the storage of one/two disks in that vdev?

 Well yeah - you've got to have some allowance for redundancy.

 This is what I'm struggling to get my head around - the chances of
 losing two disks at the same time are pretty darn remote (within a
 reasonable time-to-replace delta), so what advantage is there (other
 than potentially pointless uber-redundancy) in running multiple
 raidz1/2 vdevs? Are you not in fact losing performance by reducing the
 number of spindles used for a given pool?

No. The number of spindles is constant. The snag is that for random reads,
the performance of a raidz1/2 vdev is essentially that of a single disk.
(Writes are fast because they're always full-stripe; but reads are full-stripe
too, so every random read touches every disk in the vdev.) So your effective
random-read performance is that of a single disk times the number of raidz
vdevs.

It's a tradeoff, as in all things. Fewer, wider vdevs mean less space spent on
parity, but lower random-read performance.
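
To make the tradeoff concrete, here is a rough comparison of two ways to lay
out the same twelve disks (the device names are made up):

  # One wide raidz2 vdev: usable capacity of 10 disks, but random-read IOPS
  # of roughly 1 disk, since every read touches the whole stripe.
  zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
                           c1t6d0 c1t7d0 c2t0d0 c2t1d0 c2t2d0 c2t3d0

  # Three 4-disk raidz1 vdevs: usable capacity of 9 disks, random-read IOPS
  # of roughly 3 disks, because each vdev can service a different read.
  zpool create tank raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
                    raidz1 c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
                    raidz1 c2t0d0 c2t1d0 c2t2d0 c2t3d0

Six 2-way mirrors would push it further still: only 6 disks of usable space,
but roughly six vdevs' worth of random reads, since either side of a mirror
can satisfy a read.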

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] A few questions

2008-09-17 Thread Ralf Ramge
gm_sjo wrote:

 Are you not infact losing performance by reducing the
 amount of spindles used for a given pool?

This depends. Usually RAIDZ1/2 isn't a good performer when it comes 
to random-access read I/O, for instance. If I wanted to scale 
performance by adding spindles, I would use mirrors (RAID 10). If you 
want to scale filesystem sizes, RAIDZ is your friend.

I once had the problem that I needed high random I/O performance and 
at least an 11 TB filesystem on an X4500. Mirroring was out of the 
question (not enough disk space left), and RAIDZ gave me only about 25% 
of the performance of the existing Linux ext2 boxes I had to compete 
with. In the end, striping 13 RAIDZ sets of 3 drives each plus 1 hot 
spare delivered acceptable results in both categories - but it took me a 
lot of benchmarking to get there.
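
For the curious, such a layout would be built roughly as follows. This is a
reconstruction, not the original command, and the c#t#d# names are placeholders
for 40 of the X4500's 48 drives (the boot disks stay out of it):

  zpool create tank \
    raidz1 c0t0d0 c1t0d0 c2t0d0 \
    raidz1 c3t0d0 c4t0d0 c5t0d0 \
    raidz1 c0t1d0 c1t1d0 c2t1d0 \
    raidz1 c3t1d0 c4t1d0 c5t1d0 \
    raidz1 c0t2d0 c1t2d0 c2t2d0 \
    raidz1 c3t2d0 c4t2d0 c5t2d0 \
    raidz1 c0t3d0 c1t3d0 c2t3d0 \
    raidz1 c3t3d0 c4t3d0 c5t3d0 \
    raidz1 c0t4d0 c1t4d0 c2t4d0 \
    raidz1 c3t4d0 c4t4d0 c5t4d0 \
    raidz1 c0t5d0 c1t5d0 c2t5d0 \
    raidz1 c3t5d0 c4t5d0 c5t5d0 \
    raidz1 c0t6d0 c1t6d0 c2t6d0 \
    spare c3t6d0

Spreading each 3-disk set across three different controllers (as the
placeholder names suggest) also means a dead controller only degrades vdevs
instead of killing one outright.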


-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas 
Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss, 
Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool with multiple mirrors question

2008-09-17 Thread Francois
Darren J Moffat wrote:

 If c0t6d0 and c0t7d0 both fail (i.e. both sides of the same mirror vdev) 
 then the pool will be unable to retrieve all the data stored in it.  If 
 c0t6d0 and c0t3d0 both fail, there are still sufficient replicas of the 
 data, because the failed disks belong to different mirrors.
 
 This applies to SVM as well: if you have a stripe of mirrors with a UFS 
 filesystem on top of it, you will have the same availability issue.
 

Thanks for the clarification :)


--
Francois
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] iscsi target problems on snv_97

2008-09-17 Thread Moore, Joe
 I believe the problem you're seeing might be related to a deadlock
 condition (CR 6745310). If you run pstack on the iscsi target daemon
 you might find a bunch of zombie threads.  The fix is putback to
 snv-99; give snv-99 a try.

Yes, a pstack of the core I've generated from iscsitgtd does have a number of 
zombie threads.

I'm afraid I can't make heads nor tails of the bug report at 
http://bugs.opensolaris.org/view_bug.do?bug_id=6658836 nor its duplicate-of 
6745310, nor any of the related bugs (all are unavailable except for 6676298, 
and the stack trace reported in that bug doesn't look anything like mine).

As far as I can tell snv-98 is the latest build, from Sep 10 according to 
http://dlc.sun.com/osol/on/downloads/.  So snv-99 should be out next week, 
correct?

Anything I can do in the meantime?  Do I need to BFU to the latest nightly 
build?  Or would just taking the iscsitgtd from that build suffice?

--Joe
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] iscsi target problems on snv_97

2008-09-17 Thread tim szeto
Moore, Joe wrote:
 I believe the problem you're seeing might be related to a deadlock
 condition (CR 6745310). If you run pstack on the iscsi target daemon
 you might find a bunch of zombie threads.  The fix is putback to
 snv-99; give snv-99 a try.
 

 Yes, a pstack of the core I've generated from iscsitgtd does have a number of 
 zombie threads.

 I'm afraid I can't make heads nor tails of the bug report at 
 http://bugs.opensolaris.org/view_bug.do?bug_id=6658836 nor its duplicate-of 
 6745310, nor any of the related bugs (all are unavailable except for 
 6676298, and the stack trace reported in that bug doesn't look anything like 
 mine).

 As far as I can tell snv-98 is the latest build, from Sep 10 according to 
 http://dlc.sun.com/osol/on/downloads/.  So snv-99 should be out next week, 
 correct?
   
snv-99 should be out next week.
 Anything I can do in the mean time?  Do I need to BFU to the latest nightly 
 build?  Or would just taking the iscsitgtd from that build suffice?
   
You could try snv-98.  You don't need to bfu, just get the latest iscsitgtd.

-Tim

 --Joe
 ___
 storage-discuss mailing list
 [EMAIL PROTECTED]
 http://mail.opensolaris.org/mailman/listinfo/storage-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool with multiple mirrors question

2008-09-17 Thread Miles Nordin
 djm == Darren J Moffat [EMAIL PROTECTED] writes:

   djm If c0t6d0 and c0t7d0 both fail (ie both sides of the same
   djm mirror vdev) then the pool will be unable to retrieve all the
   djm data stored in it.

won't be able to retrieve ANY of the data stored on it.  It's correct
as you wrote it, but you almost make it sound like you could get some
data off the pool, and one might reasonably hope to, but you can't.

for example,

1. zpool create pool mirror disk1 disk2

2. pax -rwpe /somewhere/else /pool

3. zpool add pool mirror disk3 disk4

   [don't write anything to the pool]

4. [disk3 and disk4 both die]

You've now lost everything you copied onto the pool in step 2.

so, if you type 'zpool add pool disk3 disk4' and forget the 'mirror',
your mistake isn't such a small one.  You have to quickly find disk5
and disk6 to attach.  (happened to me, with 30-day-old
disk3/disk4, at home with no backup.)
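
For the record, a rough sketch of the recovery Miles describes, reusing his
placeholder disk names: the two accidental single-disk top-level vdevs can't be
removed, but each can be turned back into a mirror with 'zpool attach'.

  # disk3 and disk4 went in as unreplicated top-level vdevs; give each a partner.
  zpool attach pool disk3 disk5    # disk5 resilvers as a mirror of disk3
  zpool attach pool disk4 disk6    # disk6 resilvers as a mirror of disk4
  zpool status pool                # wait until both resilvers complete

(zpool should also complain about the mismatched replication level unless you
force it with -f, which is a useful speed bump on the way to that mistake.)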


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] resilver keeps starting over? snv_95

2008-09-17 Thread Neal Pollack
Running Nevada build 95 on an ultra 40.
Had to replace a drive.
Resilver in progress, but it looks like each
time I do a zpool status, the resilver starts over.
Is this a known issue?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver keeps starting over? snv_95

2008-09-17 Thread Tomas Ögren
On 17 September, 2008 - Neal Pollack sent me these 0,3K bytes:

 Running Nevada build 95 on an ultra 40.
 Had to replace a drive.
 Resilver in progress, but it looks like each
 time I do a zpool status, the resilver starts over.
 Is this a known issue?

I recall some issue with 'zpool status' as root restarting the resilver.
Doing it as a regular user will not.

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver keeps starting over? snv_95

2008-09-17 Thread Miles Nordin
 t == Tomas Ögren [EMAIL PROTECTED] writes:

 t I recall some issue with 'zpool status' as root restarting
 t resilvering..  Doing it as a regular user will not..

is there an mdb command similar to zpool status?  maybe it's safer.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS system requirements

2008-09-17 Thread Erik Trimble
Cyril Plisko wrote:
 On Wed, Sep 17, 2008 at 6:06 AM, Erik Trimble [EMAIL PROTECTED] wrote:
   
 Just one more things on this:

 Run with a 64-bit processor. Don't even think of using a 32-bit one -
 there are known issues with ZFS not quite properly using 32-bit only
 structures.  That is, ZFS is really 64-bit clean, but not 32-bit clean.

 

 Wow! That's a statement. Can you provide more info on these 32-bit issues?
 I am not aware of any. In fact, besides being sluggish (presumably due
 to limited address space), I never noticed any issues with ZFS, which I
 used on a 32-bit machine for 2 years.

   

http://www.opensolaris.org/jive/thread.jspa?messageID=212508#212508



Looking through the Bug database, it seems that a good chunk of 
32-bit-related problems have been resolved. However, there hasn't been a 
general fix for the overall issue noted in the above discussion.

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver keeps starting over? snv_95

2008-09-17 Thread Wade . Stuart
Are you doing snaps?  If so, unless you have the new bits to handle the
issue, each snapshot restarts a scrub or resilver.


Thanks!
Wade Stuart

we are fallon
P: 612.758.2660
C: 612.877.0385

** Fallon has moved.  Effective May 19, 2008 our address is 901 Marquette
Ave, Suite 2400, Minneapolis, MN 55402.

[EMAIL PROTECTED] wrote on 09/17/2008 01:07:53 PM:

 Running Nevada build 95 on an ultra 40.
 Had to replace a drive.
 Resilver in progress, but it looks like each
 time I do a zpool status, the resilver starts over.
 Is this a known issue?

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver keeps starting over? snv_95

2008-09-17 Thread Neal Pollack

On 09/17/08 02:29 PM, [EMAIL PROTECTED] wrote:
 Are you doing snaps?

No, no snapshots ever.
Logged in as root to do:
  zpool replace poolname deaddisk
and then did a few 'zpool status' runs, also as root.  It restarted each time.



 If so unless you have the new bits to handle the
issue,  each snap restarts a scrub or resilver.


Thanks!
Wade Stuart

we are fallon
P: 612.758.2660
C: 612.877.0385

** Fallon has moved.  Effective May 19, 2008 our address is 901 Marquette
Ave, Suite 2400, Minneapolis, MN 55402.

[EMAIL PROTECTED] wrote on 09/17/2008 01:07:53 PM:

  

Running Nevada build 95 on an ultra 40.
Had to replace a drive.
Resilver in progress, but it looks like each
time I do a zpool status, the resilver starts over.
Is this a known issue?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZPOOL Import Problem

2008-09-17 Thread Jim Dunham

On Sep 16, 2008, at 5:39 PM, Miles Nordin wrote:

 jd == Jim Dunham [EMAIL PROTECTED] writes:

jd If at the time the SNDR replica is deleted the set was
jd actively replicating, along with ZFS actively writing to the
jd ZFS storage pool, I/O consistency will be lost, leaving ZFS
jd storage pool in an indeterministic state on the remote node.

jd To address this issue, prior to deleting the replicas, the
jd replica should be placed into logging mode first.

 What if you stop the replication by breaking the network connection
 between primary and replica?  consistent or inconsistent?

Consistent.

 it sounds fishy, like ``we're always-consistent-on-disk with ZFS, but
 please use 'zpool offline' to avoid disastrous pool corruption.''

This is not the case at all.

Maintaining I/O consistency of all volumes in a single I/O consistency  
group is an attribute of replication. The instant an SNDR replica is  
deleted, that volume is no longer being replicated, and it becomes  
inconsistent with all other write-ordered volumes. By placing all  
volumes in the I/O consistency group into logging mode, not 'zpool  
offline', and then deleting the replica, there is no way for any of  
the remote volumes to become I/O inconsistent.

Yes, one will note that there is a group disable command, sndradm -g  
group-name -d, but it was implemented for ease of administration,  
not for performing a write-order coordinated disable.
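
A rough sketch of that ordering, with a consistency group named after the
pool (as suggested below) - check the sndradm man page, this is from memory:

  # Drop every set in the group into logging mode first; write ordering on
  # the remote side is preserved from the instant logging starts.
  sndradm -n -g tank -l

  # Only then delete the replicas; the disable can no longer tear the
  # remote copy of the pool.
  sndradm -n -g tank -d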

jd ndr_ii. This is an automatic snapshot taken before
jd resynchronization starts,

 yeah that sounds fine, possibly better than DRBD in one way because it
 might allow the resync to go faster.

 From the PDF's it sounds like async replication isn't done the same
 way as the resync, it's done safely, and that it's even possible for
 async replication to accumulate hours of backlog in a ``disk queue''
 without losing write ordering so long as you use the ``blocking mode''
 variant of async.

Correct reading of the documentation.

 ii might also be good for debugging a corrupt ZFS, so you can tinker
 with it but still roll back to the original corrupt copy.  I'll read
 about it---I'm guessing I will need to prepare ahead of time if I want
 ii available in the toolbox after a disaster.

jd AVS has the concept of I/O consistency groups, where all disks
jd of a multi-volume filesystem (ZFS, QFS) or database (Oracle,
jd Sybase) are kept write-order consistent when using either sync
jd or async replication.

 Awesome, so long as people know to use it.  so I guess that's the
 answer for the OP: use consistency groups!

I use the name of the ZFS storage pool, as the name of the SNDR I/O  
consistency group.

 The one thing I worry about is, before, AVS was used between RAID and
 filesystem, which is impossible now because that inter-layer area no
 longer exists.  If you put the individual device members of a
 redundant zpool vdev into an AVS consistency group, what will AVS do
 when one of the devices fails?

Nothing, as it is ZFS that reacts to the failed device.

 Does it continue replicating the working devices and ignore the  
 failed one?

In this scenario ZFS knows the device failed, which means ZFS will stop  
writing to the disk, and thus to the replica.


 This would sacrifice redundancy at the DR site.  UFS-AVS-RAID
 would not do that in the same situation.

 Or hide the failed device from ZFS and slow things down by sending all
 read/writes of the failed device to the remote mirror?  This would
 slow down the primary site.  UFS-AVS-RAID would not do that in the
 same situation.

 The latter ZFS-AVS behavior might be rescueable, if ZFS had the
 statistical read-preference feature.  but writes would still be
 massively slowed with this scenario, while in UFS-AVS-RAID they would
 not be.  To get back the level of control one used to have for writes,
 you'd need a different zpool-level way to achieve the intent of the
 AVS sync/async option.  Maybe just a slog which is not AVS-replicated
 would be enough, modulo other ZFS fixes for hiding slow devices.

ZFS-AVS is not UFS-AVS-RAID, and although one can foresee some  
downside to replicating ZFS with AVS, there are some big wins:

- Place SNDR in logging mode and 'zpool scrub' the secondary volumes for  
  consistency, then resume replication (a sketch of this follows below).
- Compressed ZFS storage pools result in compressed replication.
- Encrypted ZFS storage pools result in encrypted replication.
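
A rough sketch of that verify-then-resume cycle, again with placeholder names
and assuming the secondary host may import the pool while the group is logging:

  # On the primary: stop replicating, start logging.
  sndradm -n -g tank -l

  # On the secondary: the volumes are now stable, so the pool can be
  # imported, scrubbed, and handed back.  (-f because the pool was last
  # in use on the primary host.)
  zpool import -f tank
  zpool scrub tank
  zpool status tank        # look for checksum errors once the scrub finishes
  zpool export tank

  # Back on the primary: resume replication with an update (incremental) resync.
  sndradm -n -g tank -u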



 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Engineering Manager
Storage Platform Software Group
Sun Microsystems, Inc.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss