Re: [zfs-discuss] ZFS on Hitachi SAN, pool recovery

2008-09-25 Thread alan baldwin
Good question.
Well, the hosts are NetBackup media servers. The idea behind the design is that
we stream the RMAN backups to disk, via NFS mounts, and then write them to tape
during the day. With the SAN-attached disks sitting on these hosts, and with
disk storage units configured in NBU, the data stream only hits the network
once, at a quiet time.
In this instance the need for high availability is not really there.
The real driver behind this, of course, is probably the same as for most
companies: cost.

BTW, another question if I may.
I noticed that when mounting the ZFS datasets on the individual nodes I have to
change the permissions to 777 to allow the oracle user to write to them.
It was my understanding that sharenfs=on allows read/write by default.
Am I doing something wrong here?
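For reference, what I am doing looks roughly like this (the pool, dataset, and
user/group names are just placeholders, not my real layout):

   # on the NFS server: create and share the dataset
   zfs create tank/rman
   zfs set sharenfs=on tank/rman

   # what I do today to let oracle write over NFS
   chmod 777 /tank/rman

   # what I suspect is the cleaner fix: hand the directory to the oracle
   # user/group instead of opening the mode, since sharenfs=on only grants
   # access at the NFS protocol level and the normal POSIX owner/permissions
   # still apply (the uid/gid would of course have to match on the clients)
   chown oracle:dba /tank/rman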
Again, all help much appreciated.
Max


Re: [zfs-discuss] ZFS on Hitachi SAN, pool recovery

2008-09-25 Thread Richard Elling
Richard Elling wrote:
> Tim Haley wrote:
>> Vincent Fox wrote:
>>> Just make SURE the other host is actually, truly DEAD!
>>>
>>> If for some reason it's simply wedged, or you have lost console
>>> access but hostA is still live, then you can end up with two
>>> systems having access to the same ZFS pool.
>>>
>>> I have done this in testing, with two hosts accessing the same pool,
>>> and the result is catastrophic pool corruption.
>>>
>>> I use a simple method: if I think hostA is dead, I call the
>>> operators and get them to pull the power cords out of it, just to
>>> be certain.  Then I force the import on hostB with certainty.

>> This is a common cluster scenario: you need to make sure the other
>> node is dead, so you force that result.  In Lustre setups they
>> recommend a STONITH (Shoot The Other Node In the Head) approach.
>> They use a combination of a heartbeat setup, as described here:
>>
>> http://www.linux-ha.org/Heartbeat
>>
>> and then something like the powerman framework to 'kill' the offline
>> node.
>>
>> Perhaps those things could be made to run on Solaris if they don't
>> already.

> Of course, Solaris Cluster (and the corresponding open source effort,
> Open HA Cluster) manages cluster membership and data access.  We
> also use SCSI reservations, so that a rogue node cannot even see the
> data.  IMHO, if you do this without reservations, then you are dancing
> with the devil in the details.

No sooner had I mentioned this than the optional fencing project
was integrated into Open HA Cluster.  So you will be able to dance
with the devil, even with Solaris Cluster, if you want.
http://blogs.sun.com/sc/

 -- richard



Re: [zfs-discuss] ZFS on Hitachi SAN, pool recovery

2008-09-24 Thread alan baldwin
Thanks to all for your comments and sharing your experiences.

In my setup the pools are split and then NFS-mounted to other nodes, mostly
Oracle DB boxes. These mounts will provide areas for RMAN flash backups to be
written.
If I lose connectivity to any host I will swing the LUNs over to the alternate
host, and the NFS mount will be repointed on the Oracle node, so hopefully we
should be safe with regard to pool corruption.
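The sequence I have in mind for that swing looks roughly like this (pool,
dataset, mount point, and host names are placeholders rather than my real
config):

   # on the failing host, if it is still reachable, release the pool cleanly
   zpool export rmanpool

   # on the alternate host, once the LUNs are visible there
   zpool import rmanpool        # add -f only if the old host is known to be down
   zfs set sharenfs=on rmanpool/rman

   # on the Oracle node, repoint the NFS mount at the new server
   umount /backup/rman
   mount -F nfs althost:/rmanpool/rman /backup/rman

The forced import is the dangerous step, which is why I want to be sure the
original host is out of the picture first.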

Thanks again.
Max


Re: [zfs-discuss] ZFS on Hitachi SAN, pool recovery

2008-09-24 Thread Marcelo Leal
Just out of curiosity, why don't you use SC?

 Leal.


Re: [zfs-discuss] ZFS on Hitachi SAN, pool recovery

2008-09-23 Thread Vincent Fox
Just make SURE the other host is actually, truly DEAD!

If for some reason it's simply wedged, or you have lost console access but
hostA is still live, then you can end up with two systems having access to the
same ZFS pool.

I have done this in testing, with two hosts accessing the same pool, and the
result is catastrophic pool corruption.

I use a simple method: if I think hostA is dead, I call the operators and get
them to pull the power cords out of it, just to be certain.  Then I force the
import on hostB with certainty.
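If the boxes have lights-out management, the same idea can be scripted instead
of phoning the operators; a rough sketch, where the service processor address,
credentials, and pool name are all made up:

   # fence hostA first: power it off out-of-band so it cannot touch the disks
   ipmitool -I lanplus -H hosta-sp -U admin -P secret chassis power off

   # sanity-check that it is really gone (the ping should fail)
   ping hosta || echo "hostA is not answering"

   # only then force the import on hostB
   zpool import -f tank

Without some kind of fencing (power pull, STONITH, or SCSI reservations), the
forced import is exactly the two-hosts-one-pool scenario above.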