Re: [zfs-discuss] x4500 vs AVS ?

2008-09-22 Thread Ralf Ramge
Jim Dunham wrote:

 It is the mixture of both resilvering writes and new ZFS filesystem 
 writes that makes it impossible for AVS to make replication 'smarter'.

Jim is right here. I just want to add that I don't see an obvious way 
to make AVS as smart as Brent may wish it to be.
Sometimes I describe AVS as a low-level service with some proxy 
functionality. That's not really correct, but it's good enough for a 
single PowerPoint slide. AVS receives the writes from the file system 
and replicates them. It does not care about the contents of the 
transactions, much as IP can't take over the responsibilities of 
higher-layer protocols like TCP, let alone layer 7 data (a bad 
comparison, I know, but it may help to illustrate what I mean).

What AVS does is copy the contents of devices. A file system writes 
some data to a sector on a hard disk - AVS is aware of this transaction 
- AVS replicates the sector to the second host - on the secondary 
host, AVS makes sure that *exactly* the same data is written to 
*exactly* the same position on the secondary host's storage device. Your 
secondary storage is a 100% copy. And if you write a bazillion zeroed 
sectors to the disk with `dd`, AVS will make sure that the secondary 
does it, too. And it does this in near real time (if you ignore the 
network bottlenecks). The downside: it's easy to do something wrong, 
and you may run into network bottlenecks due to the higher amount of 
traffic.

What AVS can't offer: file-based replication. In many cases, you don't 
have to care about having an exact copy of a device. For example, if you 
want a standby solution for your NFS file server, you want to keep the 
contents of the files and directories in sync. You don't care whether a 
newly written file uses the same inode number. You only care that the 
file is copied to your backup host while the file system on the backup 
host is *mounted*. The best-known service for this functionality is 
`rsync`. And if you know rsync, you know the downside of these services, 
too: don't even think about replicating your data in real time and/or 
to multiple servers.

The challenge is to find out which kind of replication suits your 
concept better.
For instance, if you want to replicate html pages, graphics or other 
documents, perhaps even with a copy button on an intranet page, 
file-based replication is your friend.
If you need real time copying or device replication, for instance on a 
database server with its own file system, or for keeping configuration 
files in sync across a cluster, then AVS is your best bet.

But let's face it: everybody wants the best of both worlds, and so 
people ask whether AVS couldn't just get smarter. The answer: no, not 
really. It can't check whether the file system's write operations make 
sense or whether the data really needs to be replicated. AVS is a truck 
which guarantees fast and accurate delivery of whatever you throw into 
it. Taking care of the content itself is the job of the person who 
prepares the freight. And, in our case, this person is called UFS. Or 
ZFS. And ZFS could do a much better job here.

Sun's marketing sells ZFS as offering data integrity at *all times* 
(http://www.sun.com/2004-0914/feature/). Well, that's true, at least as 
long as there is no problem on the lower layers. And I have often 
wondered whether ZFS doesn't offer something fsck-like for faulted 
pools because it's technically impossible, or because the marketing 
guys forbade it. I also wonder why people are enthusiastic about 
gimmicks like ditto blocks, but don't want data protection in case an 
X4540 suffers a power outage and lots of gigabytes of ZFS cache go 
down the drain.

Proposal: ZFS should offer some kind of IsReplicated flag in the zpool 
metadata. During a `zpool import`, this flag should be checked, and if 
it is set, a corresponding error message should be printed on stdout. 
Or the ability to set dummy zpool parameters, something like a `zpool 
set storage:cluster:avs=true tank`. This would only be a kind of first 
aid, but that's better than nothing.

This has nothing to do with AVS only; it also applies to other 
replication services. It would allow us to write simple wrapper scripts 
to switch the replication mechanism into logging mode, thus allowing us 
to safely force the import of the zpool in case of a disaster.
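
A minimal sketch of what I mean (pool name, SNDR I/O group and cache 
file are just examples; the flag has to be cached from the primary 
beforehand, because a user property can't be read while the pool is 
still exported on the secondary):

   #!/bin/sh
   # hypothetical import wrapper for the secondary node
   POOL=tank
   GROUP=tank-repl                 # SNDR I/O group holding the pool's volumes
   FLAGFILE=/var/run/${POOL}.avs   # written by the cron job described below

   if [ "`cat $FLAGFILE 2>/dev/null`" = "true" ]; then
           # put the replica into logging mode first, so a restarting
           # primary cannot resume replication underneath the imported pool
           sndradm -n -l -g $GROUP || exit 1
   fi
   zpool import -f $POOL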

Of course, it would be even better to integrate AVS into ZFS itself. 
`zfs set replication=hostname1[,hostname2...hostnameN]` would be 
the coolest thing on earth, because it would combine the benefits of AVS 
and rsync-like replication into one perfect product. And it would allow 
the marketing people to use the high-availability and full-data-redundancy 
buzzwords in their flyers.

But until then, I'll have to continue using cron jobs on the secondary 
node which log in to the primary via ssh, run a `zfs get 
storage:cluster:avs filesystem` on all mounted file systems, and save 
the output locally for my zpool import wrapper script.  This 

Re: [zfs-discuss] x4500 vs AVS ?

2008-09-19 Thread Jim Dunham
Brent,

 On Tue, Sep 16, 2008 at 11:51 PM, Ralf Ramge [EMAIL PROTECTED]  
 wrote:
 Jorgen Lundman wrote:

 If we were interested in finding a method to replicate data to a 2nd
 x4500, what other options are there for us?

 If you already have an X4500, I think the best option for you is a  
 cron
 job with incremental 'zfs send'. Or rsync.

 --

 Ralf Ramge
 Senior Solaris Administrator, SCNA, SCSA


 We had some Sun reps come out the other day to talk to us about
 storage options, and part of the discussion was AVS replication with
 ZFS.
 I brought up the question of replicating the resilvering process, and
 the reps said it does not replicate. They may be mistaken, but I'm
 hopeful they are correct.

The resilvering process is replicated, as AVS cannot differentiate 
between ZFS resilvering writes and ZFS filesystem writes.

 Could this behavior have been changed recently on AVS to make
 replication 'smarter' with ZFS as the underlying filesystem?

No 'smarter' changes have been made to AVS.

The issue at hand is that as soon as ZFS makes the decision not to 
write to one of its configured vdevs, that vdev and its replica will 
now contain stale data. When ZFS is told to use the vdev again (a 
zpool replace), ZFS starts resilvering all in-use data, plus any new 
ZFS filesystem writes to the local vdev, both of which will be 
replicated by AVS.

It is the mixture of both resilvering writes and new ZFS filesystem 
writes that makes it impossible for AVS to make replication 'smarter'.

 -- 
 Brent Jones
 [EMAIL PROTECTED]
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Engineering Manager
Storage Platform Software Group
Sun Microsystems, Inc.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-18 Thread Brent Jones
On Tue, Sep 16, 2008 at 11:51 PM, Ralf Ramge [EMAIL PROTECTED] wrote:
 Jorgen Lundman wrote:

 If we were interested in finding a method to replicate data to a 2nd
 x4500, what other options are there for us?

 If you already have an X4500, I think the best option for you is a cron
 job with incremental 'zfs send'. Or rsync.

 --

 Ralf Ramge
 Senior Solaris Administrator, SCNA, SCSA


We had some Sun reps come out the other day to talk to us about
storage options, and part of the discussion was AVS replication with
ZFS.
I brought up the question of replicating the resilvering process, and
the reps said it does not replicate. They may be mistaken, but I'm
hopeful they are correct.
Could this behavior have been changed recently on AVS to make
replication 'smarter' with ZFS as the underlying filesystem?

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-17 Thread Ralf Ramge
Jorgen Lundman wrote:

 If we were interested in finding a method to replicate data to a 2nd 
 x4500, what other options are there for us? 

If you already have an X4500, I think the best option for you is a cron 
job with incremental 'zfs send'. Or rsync.
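
Roughly like this, with placeholder names for pool, target host and 
state file (an untested sketch, not production code):

   #!/bin/sh
   # hypothetical incremental replication job for a single pool
   POOL=tank
   DEST=backuphost
   STATE=/var/run/last_sent
   NEW=`date +%Y%m%d%H%M`

   zfs snapshot $POOL@$NEW
   PREV=`cat $STATE 2>/dev/null`
   if [ -n "$PREV" ]; then
           zfs send -i $POOL@$PREV $POOL@$NEW | ssh $DEST zfs receive -F $POOL
   else
           zfs send $POOL@$NEW | ssh $DEST zfs receive -F $POOL
   fi && echo $NEW > $STATE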

-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas 
Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss, 
Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-16 Thread Jorgen Lundman

Sorry, I popped up to Hokkaido for a holiday. I want to thank you all 
for the replies.

I mentioned AVS as I thought it to be the only product close to 
enabling us to do a (makeshift) fail-over setup.

We have 5-6 ZFS filesystems, and 5-6 zvols with UFS (for quotas). Doing 
`zfs send` snapshots every minute might perhaps be possible (just not 
very attractive), but if the script dies at any time, you need to resend 
the full volumes, which currently takes 5 days (even using nc).

Since we are forced by vendor to run Sol10, it sounds like AVS is not an 
option for us.

If we were interested in finding a method to replicate data to a 2nd 
x4500, what other options are there for us? We do not need instant 
updates, just someplace to fail over to when the x4500 panics or a HDD 
dies (which equals a panic). It currently takes 2 hours to fsck the UFS 
volumes after a panic (and yes, they are logging; it is actually just 
the one UFS volume that always needs fsck).

Our vendor has mentioned Veritas Volume Replicator, but I was under the 
impression that Veritas is a whole different toolset from zfs/zpool.

Lund




Jim Dunham wrote:
 On Sep 11, 2008, at 5:16 PM, A Darren Dunham wrote:
 On Thu, Sep 11, 2008 at 04:28:03PM -0400, Jim Dunham wrote:
 On Sep 11, 2008, at 11:19 AM, A Darren Dunham wrote:

 On Thu, Sep 11, 2008 at 10:33:00AM -0400, Jim Dunham wrote:
 The issue with any form of RAID 1, is that the instant a disk  
 fails
 out of the RAID set, with the next write I/O to the remaining  
 members
 of the RAID set, the failed disk (and its replica) are instantly  
 out
 of sync.
 Does raidz fall into that category?
 Yes. The key reason is that as soon as ZFS (or other mirroring  
 software)
 detects a disk failure in a RAID 1 set, it will stop writing to the
 failed disk, which also means it will also stop writing to the  
 replica of
 the failed disk. From the point of view of the remote node, the  
 replica
 of the failed disk is no longer being updated.

 Now if replication was stopped, or the primary node powered off or
 panicked, during the import of the ZFS storage pool on the secondary
 node, the replica of the failed disk must not be part of the ZFS  
 storage
 pool as its data is stale. This happens automatically, since the ZFS
 metadata on the remaining disks have already given up on this  
 member of
 the RAID set.
 Then I misunderstood what you were talking about.  Why the restriction
 on RAID 1 for your statement?
 
 No restriction. I meant to say, RAID 1 or greater.
 
 Even for a mirror, the data is stale and
 it's removed from the active set.  I thought you were talking about
 block parity run across columns...

 -- 
 Darren
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 Jim Dunham
 Engineering Manager
 Storage Platform Software Group
 Sun Microsystems, Inc.
 work: 781-442-4042
 cell: 603.724.2972
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

-- 
Jorgen Lundman   | [EMAIL PROTECTED]
Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
Japan| +81 (0)3 -3375-1767  (home)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-12 Thread Jim Dunham
On Sep 11, 2008, at 5:16 PM, A Darren Dunham wrote:
 On Thu, Sep 11, 2008 at 04:28:03PM -0400, Jim Dunham wrote:

 On Sep 11, 2008, at 11:19 AM, A Darren Dunham wrote:

 On Thu, Sep 11, 2008 at 10:33:00AM -0400, Jim Dunham wrote:
 The issue with any form of RAID 1, is that the instant a disk  
 fails
 out of the RAID set, with the next write I/O to the remaining  
 members
 of the RAID set, the failed disk (and its replica) are instantly  
 out
 of sync.

 Does raidz fall into that category?

 Yes. The key reason is that as soon as ZFS (or other mirroring  
 software)
 detects a disk failure in a RAID 1 set, it will stop writing to the
 failed disk, which also means it will also stop writing to the  
 replica of
 the failed disk. From the point of view of the remote node, the  
 replica
 of the failed disk is no longer being updated.

 Now if replication was stopped, or the primary node powered off or
 panicked, during the import of the ZFS storage pool on the secondary
 node, the replica of the failed disk must not be part of the ZFS  
 storage
 pool as its data is stale. This happens automatically, since the ZFS
 metadata on the remaining disks have already given up on this  
 member of
 the RAID set.

 Then I misunderstood what you were talking about.  Why the restriction
 on RAID 1 for your statement?

No restriction. I meant to say, RAID 1 or greater.

 Even for a mirror, the data is stale and
 it's removed from the active set.  I thought you were talking about
 block parity run across columns...

 -- 
 Darren
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Engineering Manager
Storage Platform Software Group
Sun Microsystems, Inc.
work: 781-442-4042
cell: 603.724.2972

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-11 Thread Jim Dunham
Ralf,

 Jim, at first: I never said that AVS is a bad product. And I never  
 will.  I wonder why you act as if you were attacked personally.
 To be honest, if I were a customer with the original question, such  
 a reaction wouldn't make me feel safer.

I am sorry that my response came across that way, it was not  
intentional.


 - ZFS is not aware of AVS. On the secondary node, you'll always  
 have to
 force the `zfs import` due to the unnoticed changes of metadata  
 (zpool
 in use).
 This is not true. If on the primary node invokes zpool export  
 while replication is still active, then a forced zpool import is  
 not required. This behavior is the same as with a zpool on dual- 
 ported or SAN storage, and is NOT specific to AVS.

 Jim. A graceful shutdown of the primary node may be a valid disaster 
 scenario in the laboratory, but it never will be in real life.

I agree with your assessment that in real life a 'zpool export' will  
never be done in a real disaster, but unconditionally doing a forced  
'zpool import' is problematic. Prior to performing the forced import,  
one needs to assure that the primary node is actually down and is not  
in the process of booting up, or that replication is stopped and will  
not automatically resume.

Failure to make these checks prior to a forced 'zpool import' could 
lead to scenarios where two or more instances of ZFS are accessing the 
same ZFS storage pool, each attempting to write their own metadata, 
and thus their own CRCs. In time this will result in CRC 
checksum failures on reads, followed by a ZFS-induced panic.
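
For example (the set and group names below are only placeholders, and 
the exact sndradm output format varies between AVS releases), the 
pre-import checks could look something like:

   # on the secondary node, before forcing the import
   ping primary-host 2 >/dev/null && exit 1    # primary still answers: abort
   sndradm -P | grep -w logging >/dev/null ||  # replica not in logging mode?
           sndradm -n -l -g tank-repl          # then drop it into logging
   zpool import -f tank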


 No mechanism to prevent data loss exists, e.g. zpools can be
 imported when the replicator is *not* in logging mode.
 This behavior is the same as with a zpool on dual-ported or SAN  
 storage, and is NOT specific to AVS.

 And what makes you think that I said that AVS is the problem here?

 And by the way, the customer doesn't care *why* there's a problem.  
 He only wants to know *if* there's a problem.

There is a mechanism to prevent data loss here: it's AVS! This is the 
reasoning behind questioning the association made above of replication 
being part of the problem, when in fact, given how replication is 
implemented in AVS, it is actually part of the solution.

If one does not follow the guidance suggested above before invoking 
a forced 'zpool import', the action will likely result in on-disk CRC 
checksum inconsistencies within the ZFS storage pool, resulting in 
secondary-node data loss - the initial point above. Since AVS 
replication is unidirectional, there is no data loss on the primary 
node, and when replication is resumed, AVS will undo the faulty 
secondary-node writes, correcting the actual data loss and in time 
restoring 100% synchronization of the ZFS storage pool between the 
primary and secondary nodes.


 - AVS is not ZFS aware.
 AVS is not UFS, QFS, Oracle, Sybase aware either. This makes AVS,  
 and other host based and controller based replication services  
 multi-functional. If you desire ZFS aware functionality, use ZFS  
 send and recv.

 Yes, exactly. And that's the problem, since `zfs send` and `zfs 
 receive` are not a working solution in a fail-safe two-node 
 environment. Again: the customer doesn't care *why* there's a 
 problem. He only wants to know *if* there's a problem.

My takeaway from this is that both AVS and ZFS are data path services,  
but collectively they are not on their own a complete disaster  
recovery solution. Since AVS is not aware of ZFS, and vice-versa,  
additional software in the form of Solaris Cluster, GeoCluster or  
other developed software needs to provide the awareness, so that  
viable disaster recovery solutions can be possible, and supportable.


 For instance, if ZFS resilves a mirrored disk,
 e.g. after replacing a drive, the complete disk is sent over the  
 network
 to the secondary node, even though the replicated data on the  
 secondary
 is intact.

The problem with this statement is that one cannot guarantee that the 
replicated data on the secondary is intact, specifically that the data 
is 100% identical to the non-failing side of the mirror on the primary 
node. Of course, if this guarantee could be assured, then an 
`sndradm -E ...` (equal enable) could be done, and the full disk copy 
could be avoided. But all is not lost...

A failure in writing to a mirrored volume almost assures that the data 
will be different, by at least one I/O - the one that triggered the 
initial failure of the mirror. The momentary upside is that AVS is 
interposed above the failing volume, so that the I/O will get 
replicated, even if it failed to make it to the disk. The downside is 
that with ZFS (or any other mirroring software), once a failure is 
detected by the mirroring software, it will stop writing to the side 
of the mirror containing the failed disk (and thus the configured AVS 
replica), but will still continue to write to the remaining side of 
the mirror.

Re: [zfs-discuss] x4500 vs AVS ?

2008-09-11 Thread Jim Dunham
Matt,

 Just to clarify a few items... consider a setup where we desire to  
 use AVS to replicate the ZFS pool on a 4 drive server to like  
 hardware.  The 4 drives are setup as RaidZ.

 If we lose a drive (say #2) in the primary server, RaidZ will take  
 over, and our data will still be available but the array is at a  
 degraded state.

 But what happens to the secondary server?  Specifically to its bit- 
 for-bit copy of Drive #2... presumably it is still good, but ZFS  
 will offline that disk on the primary server, replicate the  
 metadata, and when/if I promote the seconday server, it will also  
 be running in a degraded state (ie: 3 out of 4 drives).  correct?

The issue with any form of RAID 1, is that the instant a disk fails  
out of the RAID set, with the next write I/O to the remaining members  
of the RAID set, the failed disk (and its replica) are instantly out  
of sync.

 In this scenario, my replication hasn't really bought me any  
 increased availablity... or am I missing something?

In testing with ZFS in this scenario, first of all, the secondary 
node's zpool is not in the imported state. So if one stops replication, 
or there is a primary node failure, a zpool import operation will need 
to be done on the secondary node. In all my testing to date, ZFS does 
the correct thing, realizing that one disk had failed out of the RAID 
set on the primary, and thus not using it on the secondary. In short, 
ZFS knows that the RAID set is degraded, that it was being maintained 
in a degraded state, and this fact was replicated correctly to the 
secondary node.

 Also, if I do choose to fail over to the secondary, can I just do a 
 scrub on the broken drive (which isn't really broken, but the zpool 
 would be inconsistent at some level with the other online drives) 
 and get back to full speed quickly? Or will I always have to wait 
 until one of the servers resilvers itself (from scratch?) and 
 re-replicates itself??

 thanks in advance.

 -Matt
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Engineering Manager
Storage Platform Software Group
Sun Microsystems, Inc.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-11 Thread A Darren Dunham
On Thu, Sep 11, 2008 at 10:33:00AM -0400, Jim Dunham wrote:
 The issue with any form of RAID 1, is that the instant a disk fails  
 out of the RAID set, with the next write I/O to the remaining members  
 of the RAID set, the failed disk (and its replica) are instantly out  
 of sync.

Does raidz fall into that category?  Since the parity is maintained only
on written blocks rather than all disk blocks on all columns, it seems
to be resistant to this issue.

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-11 Thread Jim Dunham

On Sep 11, 2008, at 11:19 AM, A Darren Dunham wrote:

 On Thu, Sep 11, 2008 at 10:33:00AM -0400, Jim Dunham wrote:
 The issue with any form of RAID 1, is that the instant a disk fails
 out of the RAID set, with the next write I/O to the remaining members
 of the RAID set, the failed disk (and its replica) are instantly out
 of sync.

 Does raidz fall into that category?

Yes. The key reason is that as soon as ZFS (or other mirroring 
software) detects a disk failure in a RAID 1 set, it will stop 
writing to the failed disk, which means it will also stop writing 
to the replica of the failed disk. From the point of view of the 
remote node, the replica of the failed disk is no longer being updated.

Now, if replication is stopped, or the primary node is powered off or 
panics, then during the import of the ZFS storage pool on the secondary 
node the replica of the failed disk must not be part of the ZFS 
storage pool, as its data is stale. This happens automatically, since 
the ZFS metadata on the remaining disks has already given up on this 
member of the RAID set.


 Since the parity is maintained only
 on written blocks rather than all disk blocks on all columns, it seems
 to be resistant to this issue.

 -- 
 Darren
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Engineering Manager
Storage Platform Software Group
Sun Microsystems, Inc.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-11 Thread A Darren Dunham
On Thu, Sep 11, 2008 at 04:28:03PM -0400, Jim Dunham wrote:

 On Sep 11, 2008, at 11:19 AM, A Darren Dunham wrote:

 On Thu, Sep 11, 2008 at 10:33:00AM -0400, Jim Dunham wrote:
 The issue with any form of RAID 1, is that the instant a disk fails
 out of the RAID set, with the next write I/O to the remaining members
 of the RAID set, the failed disk (and its replica) are instantly out
 of sync.

 Does raidz fall into that category?

 Yes. The key reason is that as soon as ZFS (or other mirroring software) 
 detects a disk failure in a RAID 1 set, it will stop writing to the 
 failed disk, which also means it will also stop writing to the replica of 
 the failed disk. From the point of view of the remote node, the replica 
 of the failed disk is no longer being updated.

 Now if replication was stopped, or the primary node powered off or  
 panicked, during the import of the ZFS storage pool on the secondary  
 node, the replica of the failed disk must not be part of the ZFS storage 
 pool as its data is stale. This happens automatically, since the ZFS 
 metadata on the remaining disks have already given up on this member of 
 the RAID set.

Then I misunderstood what you were talking about.  Why the restriction
on RAID 1 for your statement?  Even for a mirror, the data is stale and
it's removed from the active set.  I thought you were talking about
block parity run across columns...

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-10 Thread Victor Latushkin
On 09.09.08 19:32, Richard Elling wrote:
 Ralf Ramge wrote:
 Richard Elling wrote:

 Yes, you're right. But sadly, in the mentioned scenario of having 
 replaced an entire drive, the entire disk is rewritten by ZFS.
 No, this is not true.  ZFS only resilvers data.
 Okay, I see we have a communication problem here. Probably my fault, I 
 should have written the entire data and metadata.
 I made the assumption that a 1 TB drive in a X4500 may have up to 1 TB 
 of data on it. Simply because nobody buys the 1 TB X4500 just to use 
 10% of the disk space, he would have bought the 250 GB, 500 GB or 750 
 GB model then.
 
 Actually, they do :-)  Some storage vendors insist on it, to keep
 performance up -- short-stroking.
 
 I've done several large-scale surveys of this and the average usage
 is 50%.  This is still a large difference in resilver times between
 ZFS and SVM.

There is RFE 6722786, "resilver on mirror could reduce window of 
vulnerability", which is aimed at reducing this difference for mirrors.

See here: http://bugs.opensolaris.org/view_bug.do?bug_id=6722786

Wbr,
Victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-10 Thread Matt Beebe
Just to clarify a few items... consider a setup where we desire to use AVS to 
replicate the ZFS pool on a 4-drive server to like hardware.  The 4 drives are 
set up as raidz.

If we lose a drive (say #2) in the primary server, RaidZ will take over, and 
our data will still be available but the array is at a degraded state.

But what happens to the secondary server?  Specifically to its bit-for-bit copy 
of Drive #2... presumably it is still good, but ZFS will offline that disk on 
the primary server, replicate the metadata, and when/if I promote the 
secondary server, it will also be running in a degraded state (i.e. 3 out of 4 
drives).  Correct?

In this scenario, my replication hasn't really bought me any increased 
availability... or am I missing something?  

Also, if I do choose to fail over to the secondary, can I just do a scrub on the 
broken drive (which isn't really broken, but the zpool would be inconsistent 
at some level with the other online drives) and get back to full speed 
quickly? Or will I always have to wait until one of the servers resilvers 
itself (from scratch?) and re-replicates itself??

thanks in advance.

-Matt
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-10 Thread Ralf Ramge
Matt Beebe wrote:

 But what happens to the secondary server?  Specifically to its bit-for-bit 
 copy of Drive #2... presumably it is still good, but ZFS will offline that 
 disk on the primary server, replicate the metadata, and when/if I promote 
 the secondary server, it will also be running in a degraded state (ie: 3 out 
 of 4 drives).  correct?



Correct.

 In this scenario, my replication hasn't really bought me any increased 
 availability... or am I missing something?  



No. You have an increase of availability when the entire primary node 
goes down, but you're not particularly safer when it comes to degraded 
zpools.


 Also, if I do choose to fail over to the secondary, can I just do a scrub on the 
 broken drive (which isn't really broken, but the zpool would be 
 inconsistent at some level with the other online drives) and get back to 
 full speed quickly? or will I always have to wait until one of the servers 
 resilvers itself (from scratch?), and re-replicates itself??


I have not tested this scenario, so I can't say anything about this.

-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas 
Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss, 
Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-09 Thread Ralf Ramge
Richard Elling wrote:

 Yes, you're right. But sadly, in the mentioned scenario of having 
 replaced an entire drive, the entire disk is rewritten by ZFS.
 
 No, this is not true.  ZFS only resilvers data.

Okay, I see we have a communication problem here. Probably my fault, I
should have written "the entire data and metadata".
I made the assumption that a 1 TB drive in an X4500 may have up to 1 TB
of data on it. Simply because nobody buys the 1 TB X4500 just to use 10%
of the disk space; he would have bought the 250 GB, 500 GB or 750 GB
model then.
In any case and any disk size scenario, that's something you don't want
to have on your network if there's a chance to avoid it.

-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas
Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss,
Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-09 Thread Richard Elling
Ralf Ramge wrote:
 Richard Elling wrote:

 Yes, you're right. But sadly, in the mentioned scenario of having 
 replaced an entire drive, the entire disk is rewritten by ZFS.

 No, this is not true.  ZFS only resilvers data.

 Okay, I see we have a communication problem here. Probably my fault, I 
 should have written the entire data and metadata.
 I made the assumption that a 1 TB drive in a X4500 may have up to 1 TB 
 of data on it. Simply because nobody buys the 1 TB X4500 just to use 
 10% of the disk space, he would have bought the 250 GB, 500 GB or 750 
 GB model then.

Actually, they do :-)  Some storage vendors insist on it, to keep
performance up -- short-stroking.

I've done several large-scale surveys of this and the average usage
is 50%.  This is still a large difference in resilver times between
ZFS and SVM.

 In any case and any disk size scenario, that's something you don't 
 want to have on your network if there's a chance to avoid this.

Agree 100%.
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-08 Thread Ralf Ramge
Jim Dunham wrote:
[...]


Jim, first of all: I never said that AVS is a bad product. And I never 
will. I wonder why you act as if you were attacked personally.

To be honest, if I were a customer with the original question, such a 
reaction wouldn't make me feel safer.


 - ZFS is not aware of AVS. On the secondary node, you'll always have to
 force the `zfs import` due to the unnoticed changes of metadata (zpool
 in use).
 
 This is not true. If on the primary node invokes zpool export while 
 replication is still active, then a forced zpool import is not 
 required. This behavior is the same as with a zpool on dual-ported or 
 SAN storage, and is NOT specific to AVS.

Jim. A graceful shutdown of the primary node may be a valid disaster 
scenario in the laboratory, but it never will be in real life.

 
 No mechanism to prevent data loss exists, e.g. zpools can be
 imported when the replicator is *not* in logging mode.
 
 This behavior is the same as with a zpool on dual-ported or SAN storage, 
 and is NOT specific to AVS.

And what makes you think that I said that AVS is the problem here?

And by the way, the customer doesn't care *why* there's a problem. He 
only wants to know *if* there's a problem.

 - AVS is not ZFS aware.
 
 AVS is not UFS, QFS, Oracle, Sybase aware either. This makes AVS, and 
 other host based and controller based replication services 
 multi-functional. If you desire ZFS aware functionality, use ZFS send 
 and recv.

Yes, exactly. And that's the problem, since `zfs send` and `zfs receive` 
are not a working solution in a fail-safe two-node environment. Again: 
the customer doesn't care *why* there's a problem. He only wants to know 
*if* there's a problem.

 For instance, if ZFS resilves a mirrored disk,
 e.g. after replacing a drive, the complete disk is sent over the network
 to the secondary node, even though the replicated data on the secondary
 is intact.
 
 The complete disk IS NOT sent over the network to the secondary 
 node, only those disk blocks that are re-written by ZFS. 

Yes, you're right. But sadly, in the mentioned scenario of having 
replaced an entire drive, the entire disk is rewritten by ZFS.

Again: And what makes you think that I said that AVS is the problem here?

 - ZFS & AVS & X4500 leads to bad error handling. The zpool may not be
 imported on the secondary node during the replication.
 
 This behavior is the same as with a zpool on dual-ported or SAN storage, 
 and is NOT specific to AVS.

Again: And what makes you think that I said that AVS is the problem 
here? We are not on avs-discuss, Jim.

 I don't understand the relevance to AVS in the prior three paragraphs?

We are not on avs-discuss, Jim. The customer wanted to know what 
drawbacks exist in his *scenario*. Not AVS.

 - I gave AVS a set of 6 drives just for the bitmaps (using SVM soft
 partitions). Weren't enough, the replication was still very slow,
 probably because of an insane amount of head movements, and scales
 badly. Putting the bitmap of a drive on the drive itself (if I remember
 correctly, this is recommended in one of the most referenced howto blog
 articles) is a bad idea. Always use ZFS on whole disks, if performance
 and caching matters to you.
 
 When you have the time, can you replace the probably because of ...  
 with some real performance numbers?

No problem. If you would please organize a Try & Buy of two X4500 
servers to be sent to my address, thank you.


 - AVS seems to require an additional shared storage when building
 failover clusters with 48 TB of internal storage. That may be hard to
 explain to the customer. But I'm not 100% sure about this, because I
 just didn't find a way, I didn't ask on a mailing list for help.
 
 When you have them time, can you replace the AVS seems to ...  with 
 some specific references to what you are referring to?

The installation and configuration process and the location where AVS 
wants to store the shared database. I can tell you details about it the 
next time I give it a try. Until then, please read the last sentence you 
quoted once more, thank you.

 If you want a fail-over solution for important data, use the external
 JBODs. Use AVS only to mirror complete clusters, don't use it to
 replicate single boxes with local drives. And, in case OpenSolaris is
 not an option for you due to your company policies or support contracts,
 building a real cluster also A LOT cheaper.
 
 You are offering up these position statements based on what?

My outline agreements, my support contracts, partner web desk and 
finally my experience with projects in high availability scenarios with 
tens of thousands of servers.


Jim, it's okay. I know that you're a project leader at Sun Microsystems 
and that AVS is your main concern. But if there's one thing I cannot 
stand, it's getting stroppy replies from someone who should know 
better and should have realized that he's acting publicly and in front 
of the people who finance his income instead of 

Re: [zfs-discuss] x4500 vs AVS ?

2008-09-07 Thread Jim Dunham
Jorgen,


 If we get two x4500s, and look at AVS, would it be possible to:

 1) Setup AVS to replicate zfs, and zvol (ufs) from 01 - 02 ?  
 Supported
 by Sol 10 5/08 ?

For Solaris 10, one will need to purchase AVS. It was not until 
OpenSolaris that AVS became bundled. Also, the OpenSolaris version 
will not run on Solaris 10.

 Assuming 1, if we setup a home-made IP fail-over so that; should 01 go
 down, all clients are redirected to 02.


 2) Fail-back, are there methods in AVS to handle fail-back?

Yes, it's called SNDR reverse synchronization, and it is a key feature 
of SNDR and its ability to create a DR site.


 Since 02 has
 been used, it will have newer/modified files, and will need to  
 replicate
 backwards until synchronised, before fail-back can occur.

SNDR supports on-demand pull, which means that once reverse 
synchronization has been started, the SNDR primary volumes can be 
accessed. In addition to the background resilvering of differences, 
those blocks requested on demand will be included in the reverse 
synchronization.
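
As a rough example (the group name is a placeholder; see sndradm(1M) 
for the exact options), a fail-back could look like this:

   # run on the original primary node once it is healthy again
   sndradm -n -l -g tank-repl      # make sure the set is in logging mode
   sndradm -n -u -r -g tank-repl   # reverse update sync: pull back only
                                   # the blocks changed on the secondary
   # thanks to on-demand pull, the primary volumes (and a zpool on top of
   # them) can be put back into service while the reverse sync still runs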

 We did ask our vendor, but we were just told that AVS does not  
 support  x4500.

AVS works with any Solaris block storage device, independent of 
platform. Period.




 Lund

 -- 
 Jorgen Lundman   | [EMAIL PROTECTED]
 Unix Administrator   | +81 (0)3 -5456-2687 ext 1017 (work)
 Shibuya-ku, Tokyo| +81 (0)90-5578-8500  (cell)
 Japan| +81 (0)3 -3375-1767  (home)
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Jim Dunham
Engineering Manager
Storage Platform Software Group
Sun Microsystems, Inc.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-07 Thread Jim Dunham
Ralf,

 [EMAIL PROTECTED] wrote:

  War wounds?  Could you please expand on the why a bit more?

 - ZFS is not aware of AVS. On the secondary node, you'll always have  
 to
 force the `zfs import` due to the unnoticed changes of metadata (zpool
 in use).

This is not true. If one invokes 'zpool export' on the primary node 
while replication is still active, then a forced 'zpool import' is not 
required. This behavior is the same as with a zpool on dual-ported or 
SAN storage, and is NOT specific to AVS.

 No mechanism to prevent data loss exists, e.g. zpools can be
 imported when the replicator is *not* in logging mode.

This behavior is the same as with a zpool on dual-ported or SAN  
storage, and is NOT specific to AVS.

 - AVS is not ZFS aware.

AVS is not UFS, QFS, Oracle, Sybase aware either. This makes AVS, and  
other host based and controller based replication services multi- 
functional. If you desire ZFS aware functionality, use ZFS send and  
recv.

 For instance, if ZFS resilves a mirrored disk,
 e.g. after replacing a drive, the complete disk is sent over the  
 network
 to the secondary node, even though the replicated data on the  
 secondary
 is intact.

The complete disk IS NOT sent over the network to the secondary 
node, only those disk blocks that are re-written by ZFS. This has to be 
this way, since ZFS does not differentiate between writes caused by 
resilvering and writes caused by new ZFS filesystem operations. 
Furthermore, only those portions of the ZFS storage pool are 
replicated in this scenario, not every block in the entire storage pool.

 That's a lot of fun with today's disk sizes of 750 GB and 1 TB drives,
 resulting in usually 10+ hours without real redundancy (customers who
 use Thumpers to store important data usually don't have the budget to
 connect their data centers with 10 Gbit/s, so expect 10+ hours *per  
 disk*).

If one creates a ZFS storage pool whose size is 1 TB, then enables 
AVS after the fact, AVS cannot differentiate blocks that are 
in use by ZFS from those that are not; therefore AVS needs to 
replicate the entire TB of storage.

If one enables AVS first, before the volumes are placed in a ZFS 
storage pool, then the `sndradm -E ...` (equal enable) option can be 
used. Then, when the ZFS storage pool is created, only those I/Os 
needed to initialize the pool need be replicated.
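
For example (hostnames, devices and the choice of async are 
placeholders only), the order of operations would be roughly:

   # enable the set with -E before the pool exists (a matching enable is
   # needed on the secondary as well), then build the pool on the primary;
   # only the pool-creation I/O and later writes get replicated
   sndradm -n -E thumper1 /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t2d0s0 \
                 thumper2 /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t2d0s0 ip async
   zpool create tank c1t1d0s0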

If one has a ZFS storage pool that is quite large, but in actuality 
little of the storage pool is in use, then by enabling SNDR first on 
a replacement volume and then invoking 'zpool replace ...' on multiple 
vdevs in the storage pool, an optimal replication of the ZFS 
storage pool can be done.


 - ZFS & AVS & X4500 leads to bad error handling. The zpool may not  
 be
 imported on the secondary node during the replication.

This behavior is the same as with a zpool on dual-ported or SAN  
storage, and is NOT specific to AVS.

 The X4500 does
 not have a RAID controller which signals (and handles) drive faults.
 Drive failures on the secondary node may happen unnoticed until the
 primary nodes goes down and you want to import the zpool on the
 secondary node with the broken drive. Since ZFS doesn't offer a  
 recovery
 mechanism like fsck, data loss of up to 20 TB may occur.
 If you use AVS with ZFS, make sure that you have a storage which  
 handles
 drive failures without OS interaction.

 - 5 hours for scrubbing a 1 TB drive. If you're lucky. Up to 48 drives
 in total.

 - An X4500 has no battery buffered write cache. ZFS uses the server's
 RAM as a cache, 15 GB+. I don't want to find out how much time a
 resilver over the network after a power outage may take (a full  
 reverse
 replication would take up to 2 weeks and is no valid option in a  
 serious
 production environment). But the underlying question I asked myself is
 why I should I want to replicate data in such an expensive way, when I
 think the 48 TB data itself are not important enough to be protected  
 by
 a battery?

I don't understand the relevance to AVS in the prior three paragraphs?

 - I gave AVS a set of 6 drives just for the bitmaps (using SVM soft
 partitions). Weren't enough, the replication was still very slow,
 probably because of an insane amount of head movements, and scales
 badly. Putting the bitmap of a drive on the drive itself (if I  
 remember
 correctly, this is recommended in one of the most referenced howto  
 blog
 articles) is a bad idea. Always use ZFS on whole disks, if performance
 and caching matters to you.

When you have the time, can you replace the "probably because of ..." 
with some real performance numbers?

 - AVS seems to require an additional shared storage when building
 failover clusters with 48 TB of internal storage. That may be hard to
 explain to the customer. But I'm not 100% sure about this, because I
 just didn't find a way, I didn't ask on a mailing list for help.

When you have the time, can you replace the "AVS seems to ..." with 
some specific references to what you are referring to?

Re: [zfs-discuss] x4500 vs AVS ?

2008-09-05 Thread Ralf Ramge
[EMAIL PROTECTED] wrote:

   War wounds?  Could you please expand on the why a bit more?



- ZFS is not aware of AVS. On the secondary node, you'll always have to 
force the `zpool import` due to the unnoticed changes of metadata (zpool 
in use). No mechanism to prevent data loss exists, e.g. zpools can be 
imported when the replicator is *not* in logging mode.

- AVS is not ZFS aware. For instance, if ZFS resilvers a mirrored disk, 
e.g. after replacing a drive, the complete disk is sent over the network 
to the secondary node, even though the replicated data on the secondary 
is intact.
That's a lot of fun with today's disk sizes of 750 GB and 1 TB drives, 
resulting in usually 10+ hours without real redundancy (customers who 
use Thumpers to store important data usually don't have the budget to
connect their data centers with 10 Gbit/s, so expect 10+ hours *per disk*).

- ZFS & AVS & X4500 leads to bad error handling. The zpool may not be 
imported on the secondary node during the replication. The X4500 does 
not have a RAID controller which signals (and handles) drive faults. 
Drive failures on the secondary node may happen unnoticed until the 
primary node goes down and you want to import the zpool on the 
secondary node with the broken drive. Since ZFS doesn't offer a recovery 
mechanism like fsck, data loss of up to 20 TB may occur.
If you use AVS with ZFS, make sure that you have a storage which handles 
drive failures without OS interaction.

- 5 hours for scrubbing a 1 TB drive. If you're lucky. Up to 48 drives 
in total.

- An X4500 has no battery buffered write cache. ZFS uses the server's 
RAM as a cache, 15 GB+. I don't want to find out how much time a 
resilver over the network after a power outage may take (a full reverse 
replication would take up to 2 weeks and is no valid option in a serious 
production environment). But the underlying question I asked myself is 
why should I want to replicate data in such an expensive way, when I 
think the 48 TB of data itself is not important enough to be protected by 
a battery?


- I gave AVS a set of 6 drives just for the bitmaps (using SVM soft 
partitions). They weren't enough; the replication was still very slow, 
probably because of an insane amount of head movements, and it scales
badly. Putting the bitmap of a drive on the drive itself (if I remember 
correctly, this is recommended in one of the most referenced howto blog 
articles) is a bad idea. Always use ZFS on whole disks, if performance 
and caching matters to you.

- AVS seems to require an additional shared storage when building 
failover clusters with 48 TB of internal storage. That may be hard to 
explain to the customer. But I'm not 100% sure about this, because I 
just didn't find a way, I didn't ask on a mailing list for help.


If you want a fail-over solution for important data, use the external 
JBODs. Use AVS only to mirror complete clusters, don't use it to 
replicate single boxes with local drives. And, in case OpenSolaris is 
not an option for you due to your company policies or support contracts, 
building a real cluster is also A LOT cheaper.


-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas 
Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss, 
Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-05 Thread Richard Elling
[jumping ahead and quoting myself]
AVS is not a mirroring technology, it is a remote replication technology.
So, yes, I agree 100% that people should not expect AVS to be a mirror.


Ralf Ramge wrote:
 [EMAIL PROTECTED] wrote:

   
   War wounds?  Could you please expand on the why a bit more?
 



 - ZFS is not aware of AVS. On the secondary node, you'll always have to 
 force the `zfs import` due to the unnoticed changes of metadata (zpool 
 in use). No mechanism to prevent data loss exists, e.g. zpools can be 
 imported when the replicator is *not* in logging mode.
   

ZFS isn't special in this regard, AFAIK all file systems, databases and
other data stores suffer from the same issue with remote replication.

 - AVS is not ZFS aware. For instance, if ZFS resilves a mirrored disk, 
 e.g. after replacing a drive, the complete disk is sent over the network 
 to the secondary node, even though the replicated data on the secondary 
 is intact.
 That's a lot of fun with today's disk sizes of 750 GB and 1 TB drives, 
 resulting in usually 10+ hours without real redundancy (customers who 
 use Thumpers to store important data usually don't have the budget to
 connect their data centers with 10 Gbit/s, so expect 10+ hours *per disk*).
   

ZFS only resilvers data.  Other LVMs, like SVM, will resilver the entire 
disk,
though.

 - ZFS & AVS & X4500 leads to bad error handling. The zpool may not be 
 imported on the secondary node during the replication. The X4500 does 
 not have a RAID controller which signals (and handles) drive faults. 
 Drive failures on the secondary node may happen unnoticed until the 
 primary nodes goes down and you want to import the zpool on the 
 secondary node with the broken drive. Since ZFS doesn't offer a recovery 
 mechanism like fsck, data loss of up to 20 TB may occur.
 If you use AVS with ZFS, make sure that you have a storage which handles 
 drive failures without OS interaction.
   

If this is the case, then array-based replication would also be similarly
affected by this architectural problem.  In other words, if you say that
a software RAID system cannot be replicated by a software replicator,
then TrueCopy, SRDF, and other RAID array-based (also software)
replicators also do not work.  I think there is enough empirical evidence
that they do work.  I can see where there might be a best practice here,
but I see no fundamental issue.

fsck does not recover data, it only recovers metadata.

 - 5 hours for scrubbing a 1 TB drive. If you're lucky. Up to 48 drives 
 in total.
   

ZFS only scrubs data.  But it is not unusual for a lot of data scrubbing to
take a long time.  ZFS only performs read scrubs, so there is no replication
required during a ZFS scrub, unless data is repaired.

 - An X4500 has no battery buffered write cache. ZFS uses the server's 
 RAM as a cache, 15 GB+. I don't want to find out how much time a 
 resilver over the network after a power outage may take (a full reverse 
 replication would take up to 2 weeks and is no valid option in a serious 
 production environment). But the underlying question I asked myself is 
 why I should I want to replicate data in such an expensive way, when I 
 think the 48 TB data itself are not important enough to be protected by 
 a battery?
   

ZFS will not be storing 15 GBytes of unflushed data on any system I can
imagine today.  While we can all agree that 48 TBytes will be painful to
replicate, that is not caused by ZFS -- though it is enabled by ZFS, because
some other file systems (UFS) cannot be as large as 48 TBytes.

 - I gave AVS a set of 6 drives just for the bitmaps (using SVM soft 
 partitions). Weren't enough, the replication was still very slow, 
 probably because of an insane amount of head movements, and scales
 badly. Putting the bitmap of a drive on the drive itself (if I remember 
 correctly, this is recommended in one of the most referenced howto blog 
 articles) is a bad idea. Always use ZFS on whole disks, if performance 
 and caching matters to you.
   

I think there are opportunities for performance improvement, but don't
know who is currently actively working on this.

Actually, the cases where ZFS for whole disks is a big win are small.
And, of course, you can enable disk write caches by hand.

 - AVS seems to require an additional shared storage when building 
 failover clusters with 48 TB of internal storage. That may be hard to 
 explain to the customer. But I'm not 100% sure about this, because I 
 just didn't find a way, I didn't ask on a mailing list for help.


 If you want a fail-over solution for important data, use the external 
 JBODs. Use AVS only to mirror complete clusters, don't use it to 
 replicate single boxes with local drives. And, in case OpenSolaris is 
 not an option for you due to your company policies or support contracts, 
 building a real cluster also A LOT cheaper.
   

AVS is not a mirroring technology, it is a remote replication technology.
So, yes, I agree 100% that people should not expect AVS to be a mirror.

Re: [zfs-discuss] x4500 vs AVS ?

2008-09-04 Thread Ralf Ramge
Jorgen Lundman wrote:

 We did ask our vendor, but we were just told that AVS does not support 
 x4500.


The officially supported AVS has worked on the X4500 since the X4500 came 
out. But, although Jim Dunham and others will tell you otherwise, I 
absolutely can *not* recommend using it on this hardware with ZFS, 
especially with the larger disk sizes. At least not for important, or 
even business critical data - in such a case, using X41x0 servers with
J4500 JBODs and a HAStoragePlus Cluster instead of AVS may be a much 
better and more reliable option, for basically the same price.




-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas 
Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss, 
Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-04 Thread Brent Jones
On Thu, Sep 4, 2008 at 12:19 AM, Ralf Ramge [EMAIL PROTECTED] wrote:

 Jorgen Lundman wrote:

  We did ask our vendor, but we were just told that AVS does not support
  x4500.


 The officially supported AVS works on the X4500 since the X4500 came
 out. But, although Jim Dunham and others will tell you otherwise, I
 absolutely can *not* recommend using it on this hardware with ZFS,
 especially with the larger disk sizes. At least not for important, or
 even business critical data - in such a case, using X41x0 servers with
 J4500 JBODs and a HAStoragePlus Cluster instead of AVS may be a much
 better and more reliable option, for basically the same price.




 --

 Ralf Ramge
 Senior Solaris Administrator, SCNA, SCSA

I did some Googling, but I saw some limitations sharing your ZFS pool
via NFS while using the HAStorage Cluster product as well.
Do similar limitations exist for sharing via the built-in CIFS in
OpenSolaris as well?

Here:
http://docs.sun.com/app/docs/doc/820-2565/z4000275997776?a=view


Zettabyte File System (ZFS) Restrictions

If you are using the zettabyte file system (ZFS) as the exported file
system, you must set the sharenfs property to off.

To set the sharenfs property to off, run the following command.

$ zfs set sharenfs=off file_system/volume

To verify if the sharenfs property is set to off, run the following command.

$ zfs get sharenfs file_system/volume




--
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-04 Thread Ralf Ramge
Brent Jones wrote:

 I did some Googling, but I saw some limitations sharing your ZFS pool
 via NFS while using HAStorage Cluster product as well.
[...]
   If you are using the zettabyte file system (ZFS) as the exported file
 system, you must set the sharenfs property to off.

That's not a limitation, it just looks like one. The cluster's resource 
type, SUNW.nfs, decides whether a file system is shared or not, and it 
does this with the usual share and unshare commands in a separate 
dfstab file. The ZFS sharenfs flag is set to off to avoid conflicts.
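
In practice that means something like this (paths and resource names 
are only examples from a hypothetical setup):

   # let the cluster agent do the sharing, not ZFS
   zfs set sharenfs=off tank/export

   # the SUNW.nfs resource reads its own dfstab, e.g.
   # <Pathprefix>/SUNW.nfs/dfstab.nfs-res, containing plain share(1M) lines:
   share -F nfs -o rw /global/nfs/export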

-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas 
Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss, 
Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-04 Thread Wade . Stuart
[EMAIL PROTECTED] wrote on 09/04/2008 02:19:23 AM:

 Jorgen Lundman wrote:

  We did ask our vendor, but we were just told that AVS does not support
  x4500.


 The officially supported AVS works on the X4500 since the X4500 came
 out. But, although Jim Dunham and others will tell you otherwise, I
 absolutely can *not* recommend using it on this hardware with ZFS,
 especially with the larger disk sizes. At least not for important, or
 even business critical data - in such a case, using X41x0 servers with
 J4500 JBODs and a HAStoragePlus Cluster instead of AVS may be a much
 better and more reliable option, for basically the same price.


Ralf,

  War wounds?  Could you please expand on the why a bit more?

-Wade

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-04 Thread Al Hopper
On Thu, Sep 4, 2008 at 10:09 AM,  [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] wrote on 09/04/2008 02:19:23 AM:

 Jorgen Lundman wrote:

  We did ask our vendor, but we were just told that AVS does not support
  x4500.


 The officially supported AVS works on the X4500 since the X4500 came
 out. But, although Jim Dunham and others will tell you otherwise, I
 absolutely can *not* recommend using it on this hardware with ZFS,
 especially with the larger disk sizes. At least not for important, or
 even business critical data - in such a case, using X41x0 servers with
 J4500 JBODs and a HAStoragePlus Cluster instead of AVS may be a much
 better and more reliable option, for basically the same price.


 Ralf,

  War wounds?  Could you please expand on the why a bit more?

+1   I'd also be interested in more details.

Thanks,

-- 
Al Hopper Logical Approach Inc,Plano,TX [EMAIL PROTECTED]
 Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-04 Thread Brent Jones
On Thu, Sep 4, 2008 at 7:38 PM, Al Hopper [EMAIL PROTECTED] wrote:
 On Thu, Sep 4, 2008 at 10:09 AM,  [EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] wrote on 09/04/2008 02:19:23 AM:

 Jorgen Lundman wrote:

  We did ask our vendor, but we were just told that AVS does not support
  x4500.


 The officially supported AVS works on the X4500 since the X4500 came
 out. But, although Jim Dunham and others will tell you otherwise, I
 absolutely can *not* recommend using it on this hardware with ZFS,
 especially with the larger disk sizes. At least not for important, or
 even business critical data - in such a case, using X41x0 servers with
 J4500 JBODs and a HAStoragePlus Cluster instead of AVS may be a much
 better and more reliable option, for basically the same price.


 Ralf,

  War wounds?  Could you please expand on the why a bit more?

 +1   I'd also be interested in more details.

 Thanks,

 --
 Al Hopper Logical Approach Inc,Plano,TX [EMAIL PROTECTED]
  Voice: 972.379.2133 Timezone: US CDT
 OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Story time!

-- 
Brent Jones
[EMAIL PROTECTED]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-03 Thread Marion Hakanson
[EMAIL PROTECTED] said:
 We did ask our vendor, but we were just told that AVS does not support
 x4500. 

You might have to use the open-source version of AVS, but it's not
clear if that requires OpenSolaris or if it will run on Solaris-10.
Here's a description of how to set it up between two X4500's:

  http://blogs.sun.com/AVS/entry/avs_and_zfs_seamless

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss