Re: [Veritas-ha] Cluster Interconnect cables: Direct connect orVLANs?
My only addition to the comments by the esteemed gentleman from Virginia is to make sure you have a solid practice in place to manage cluster ID when you go VLAN, as there may be cases when your network people "cross the streams".

From: veritas-ha-boun...@mailman.eng.auburn.edu [mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Eric Hennessey
Sent: Wednesday, September 16, 2009 11:59 AM
To: Jon Price; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Cluster Interconnect cables: Direct connect or VLANs?

The configuration you're considering – running your cluster interconnects over two separate VLANs – is actually our preferred and recommended method, even when deploying a simple 2-node cluster. While using direct connections between cluster nodes is simple and convenient, it becomes problematic if you decide to add a node to the cluster. Rest easy with your design. :-)

Eric

From: veritas-ha-boun...@mailman.eng.auburn.edu [mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Jon Price
Sent: Tuesday, September 15, 2009 3:58 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Cluster Interconnect cables: Direct connect or VLANs?

Hi,

This is for Veritas Cluster 5.0; we also have Storage Foundation for Oracle. Currently we use direct connect cables between the two nodes in our Veritas Cluster for the heartbeat. However, we are switching to new systems, and running the direct connect cables is more difficult than it used to be. So we are considering the use of two VLANs for this purpose. I believe that traffic on these two VLANs is limited to cluster heartbeat connections only (though not just ours). What is the downside of using VLANs for the heartbeat? In what scenarios could problems develop? I'm concerned that if our network has a serious problem and goes down, each node in the cluster might be isolated, and both nodes would import the disk groups, mount volumes, etc., and thus cause data corruption.
Is data corruption a possibility if the entire network goes down, or in other scenarios? Does Veritas also use quorum or any other methods to protect against split-brain induced damage?

Thanks

___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
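For readers implementing the VLAN approach, the cluster ID the first reply refers to lives in each node's LLT configuration file. A minimal sketch, where the node name, cluster ID value, and interface devices are illustrative, not taken from the poster's environment:

```
# /etc/llttab -- LLT configuration (illustrative values)
set-node    node1
set-cluster 47      # unique cluster ID; must differ from every other
                    # cluster whose heartbeat VLANs are reachable from
                    # the same switches, or nodes can cross-join
link        llt1 /dev/ce:1 - ether - -
link        llt2 /dev/ce:2 - ether - -
```

Keeping a registry of assigned cluster IDs per switch fabric is one simple way to implement the "solid practice" the reply recommends.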
Re: [Veritas-ha] LLT heartbeat redundancy
This is not a limitation, as you had two independent failures. Bonding would remove the ability to discriminate between a link failure and a node failure. My feeling is that in the scenario you describe, VCS is operating properly, and it is not a limitation. If you have issues with port or cable failures, add a low-pri connection on a third network.

-Original Message-
From: Imri Zvik [mailto:im...@inter.net.il]
Sent: Sunday, May 03, 2009 11:57 AM
To: Jim Senicka
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] LLT heartbeat redundancy

On Sunday 03 May 2009 18:25:08 Jim Senicka wrote:
You had 2 failures. No real way to design around that. GAB visible would prevent bad things from occurring.

Thank you for the fast response :) Well, in Linux I can use the bonding module to aggregate the interfaces and work around this limitation. I've read in this discussion:
http://www.mail-archive.com/veritas-ha@mailman.eng.auburn.edu/msg01016.html
that since 5.0MP3 there is a cross-platform solution (I need this for Solaris 10). Do you happen to know more about this feature? Thanks!

P.S. Does anyone know if Sun Cluster has the same limitation?
Re: [Veritas-ha] LLT heartbeat redundancy
LLT is designed to use jeopardy to detect the difference between a single-link failure and a dual-link failure in most situations. Having a single mesh may remove this capability. Let me check on this with engineering and see if we have any more up-to-date recommendations.

-Original Message-
From: veritas-ha-boun...@mailman.eng.auburn.edu [mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Imri Zvik
Sent: Sunday, May 03, 2009 12:18 PM
To: Jim Senicka
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] LLT heartbeat redundancy

On Sunday 03 May 2009 19:03:16 Jim Senicka wrote:
This is not a limitation, as you had two independent failures. Bonding would remove the ability to discriminate between a link and a node failure.

I didn't understand this one: with bonding I can maintain a full mesh topology. No matter which one of the links fails, if a node still has at least one active link, LLT will still be able to see all the other nodes. This achieves greater HA than without the bonding.

My feeling is in the scenario you describe, VCS is operating properly, and it is not a limitation.

Of course it is operating properly - that's how it was designed to work :) I'm just saying that the cluster could be more redundant if it wasn't designed that way :)

If you have issues with port or cable failures, add a low pri connection on a third network.
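The low-pri connection Jim suggests is a single extra directive in /etc/llttab, typically pointed at the public network; LLT uses it for heartbeats only (at a reduced rate), not for cluster traffic, so losing both private links still leaves membership intact. A sketch with illustrative device names:

```
# /etc/llttab -- two high-priority heartbeat links plus a
# low-priority heartbeat on the public LAN (illustrative devices)
set-node    node1
set-cluster 10
link        llt1 /dev/ce:1 - ether - -
link        llt2 /dev/ce:2 - ether - -
link-lowpri pub1 /dev/ce:0 - ether - -   # heartbeat only
```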
Re: [Veritas-ha] SUMMARY: filesystem corruption after the cluster nodereboot
Running a non-journaled file system in a cluster is always a bad idea, as your recovery time is always affected by file system start-up tasks. Running UFS in logging mode was usually a pretty big performance hit. Why not VxFS?

-Original Message-
From: veritas-ha-boun...@mailman.eng.auburn.edu [mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Aleksandr Nepomnyashchiy
Sent: Tuesday, March 31, 2009 6:07 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] SUMMARY: filesystem corruption after the cluster node reboot

Many thanks to Tom Stephens for his help in troubleshooting.

What happened: Both fs1 and fs2 became corrupted after the node crash. Most probably VCS tried to fsck both; it was successful with fs1 (size ~4G) but didn't complete within the timeout period on fs2 (size ~100G). So the fsck of fs2 was killed and didn't leave anything in engine_A.log.

Suggested actions:
A) Implement UFS logging on both fs1 and fs2 - should eliminate the file system corruption and the need for fsck (I will definitely implement this).
B) Increase the OnlineTimeout value for the Mount type from the default of 300 seconds (this should be considered carefully; it can cause trouble).

PS: I was considering adding -y in FsckOpt, but it doesn't make any difference - the online script adds the -y option to fsck regardless of whether you specify it or not in FsckOpt. This is the case for online script version 2.9 from 02/13/01 18:15:47.

=== Please see the original post below ===

Dear VCS gurus,

Please help me understand why only 1 out of 2 mount points came up after the crash. I can see in the log that fs1 was fsck-ed by VCS and brought online. Was fsck even attempted on fs2? And if not, why? VCS is 2.0, both fs1 and fs2 are UFS, nothing in FsckOpt.
== engine_A.log from the healthy node ==
TAG_E 2009/03/26 18:25:55 (node_d) VCS:13001:Resource(mnt_fs1): Output of the completed operation (online)
mount: the state of /dev/vx/dsk/mydg/fs1 is not okay and it was attempted to be mounted read/write
mount: Please run fsck and try again
** /dev/vx/rdsk/mydg/fs1
** Last Mounted on /mount/fs1
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3a - Check Connectivity
** Phase 3b - Verify Shadows/ACLs
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cylinder Groups
FILE SYSTEM STATE IN SUPERBLOCK IS WRONG; FIX? yes
7324 files, 2158506 used, 1773622 free (4910 frags, 221089 blocks, 0.1% fragmentation)
TAG_E 2009/03/26 18:25:55 VCS:10298:Resource mnt_fs1 (Owner: unknown, Group: srvgrA) is online on node_d (VCS initiated)
TAG_E 2009/03/26 18:30:07 (node_d) VCS:13003:Resource(mnt_fs2): Output of the timedout operation (online)
mount: the state of /dev/vx/dsk/mydg/fs2 is not okay and it was attempted to be mounted read/write
mount: Please run fsck and try again
TAG_B 2009/03/26 18:30:07 (node_d) VCS:13012:Resource(mnt_fs2): online procedure did not complete within the expected time.
TAG_D 2009/03/26 18:30:07 (node_d) VCS:13065:Agent is calling clean for resource(mnt_fs2) because online did not complete within the expected time.

Thank you,
Aleksandr
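For anyone applying suggestions (A) and (B) above, the changes look roughly like this. The resource name mnt_fs2 comes from the thread; the 900-second value is an arbitrary example, not a recommendation:

```
# (A) enable UFS logging: add "logging" to the mount options
#     (the last field of the file system's /etc/vfstab entry,
#     or the MountOpt attribute of the VCS Mount resource)

# (B) give only the large file system a longer online timeout,
#     instead of raising it for the whole Mount type
haconf -makerw
hares -override mnt_fs2 OnlineTimeout       # localize the static attribute
hares -modify  mnt_fs2 OnlineTimeout 900    # example: allow 15 min for fsck
haconf -dump -makero
```

Overriding per resource avoids the caveat in (B): other, smaller mounts keep the default 300-second timeout.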
Re: [Veritas-ha] SFCFSRAC - node with the highest nodeid panics afternode with the lowest nodeid rejoins
What are your gabtab settings? You seem to have two independent cluster generations. You should have /sbin/gabconfig -c -n4 in gabtab.

-Original Message-
From: Imri Zvik [mailto:im...@inter.net.il]
Sent: Tuesday, March 17, 2009 10:16 AM
To: Jim Senicka
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] SFCFSRAC - node with the highest nodeid panics after node with the lowest nodeid rejoins

On Tuesday 17 March 2009 15:32:55 Jim Senicka wrote:
A few questions
1. Do you have a support case open?
Yes, for over two weeks.
2. Do you reconnect the FC before the node boots?
Yes, FC is reconnected immediately after the panic.
3. Is the network available during boot time?
Yes.

"GAB: port b is halting the system due to network failure" essentially means that VXFEN is connecting between two clusters with different generation numbers, which should only happen if the clusters booted independent of each other and were then joined at the network level.

This is weird. As you can see from the logs I've attached before, cluster nodes 1, 2 and 3 were members, and node 0 rejoined them.
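For context, the seed count Jim refers to is the -n argument in /etc/gabtab; with it set to the full node count, GAB refuses to seed until that many nodes are present, which prevents two halves of a cluster from forming independent generations. A sketch for the 4-node cluster in this thread:

```
# /etc/gabtab -- GAB waits for 4 nodes before seeding the cluster,
# so a partitioned subset cannot seed on its own and later collide
/sbin/gabconfig -c -n4
```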
Re: [Veritas-ha] Removing VCS group
No. It means you do not have to do that.

Sent from my Nokia E62 handheld by goodlink.

-Original Message-
From: i man [mailto:imanuk2...@googlemail.com]
Sent: Monday, February 02, 2009 08:07 AM US Mountain Standard Time
To: Jim Senicka
Cc: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Removing VCS group

Jim, does it mean that I would need to do the same activity of removing disk group components and putting them into the spare pool on both parts of the cluster?

Ciao.

On Mon, Feb 2, 2009 at 2:15 PM, Jim Senicka james_seni...@symantec.com wrote:
The diskgroup is destroyed. All info about a VxVM diskgroup is in the dg, so no need to do anything else (no info is on the host). In straight failover VxVM, the only tie point between VCS and VxVM is the VCS agent that imports and deports specified diskgroups. VxVM has no knowledge of VCS, and VCS really only knows the name of a DG it is supposed to manage.

Sent from my Nokia E62 handheld by goodlink.

-Original Message-
From: i man [mailto:imanuk2...@googlemail.com]
Sent: Monday, February 02, 2009 06:18 AM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Removing VCS group

Thank you to all for your help. Now I have some queries regarding the cluster. I have imported and destroyed the diskgroup on one system of the cluster.

1. Do I have to do it on both systems of the cluster?

We have multipathing enabled on the systems.

# vxdmpadm listctlr all
CTLR-NAME   ENCLR-TYPE   STATE     ENCLR-NAME
=============================================
c7          EMC          ENABLED   MC0
c6          EMC          ENABLED   MC0
c0          Disk         ENABLED   Disk
c7          EMC          ENABLED   MC1
c6          EMC          ENABLED   MC1

I am still a little confused as to the integration of VxVM and VCS. Can somebody send me a link which shows how they are constructed together, so that I have a better understanding?

Ciao.

On Fri, Jan 30, 2009 at 11:29 AM, Jim Senicka james_seni...@symantec.com wrote:
Removal of the service group has zero effect on the storage.
You need to use appropriate VxVM commands to manage the disk group. The vxprint command is VxVM and has nothing to do with VCS. Removing the service group was fine. Now you need to complete the VxVM work.

Sent from my Nokia E62 handheld by goodlink.

-Original Message-
From: i man [mailto:imanuk2...@googlemail.com]
Sent: Friday, January 30, 2009 04:26 AM US Mountain Standard Time
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Removing VCS group

All,

I think I'm in a bit of trouble. I'm trying to remove a cluster service group which has a Veritas disk group configured. My task is to free up the disks used by the removed SG and DG and move them to the free pool. From the cluster GUI I have removed the resources and the SG. Some questions regarding the same...

1. Did the Veritas disk group get deleted automatically when I removed the cluster components?
2. I cannot see any service group through the vxprint command now.
3. How can I now move the disks from the service group pool?

Ciao.
Re: [Veritas-ha] VCS Configuration 1
Comments below with JS.

From: veritas-ha-boun...@mailman.eng.auburn.edu [mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of i man
Sent: Wednesday, January 21, 2009 2:36 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] VCS Configuration 1

All,

I am trying to configure a VCS resource and have some confusion regarding the points below.

1) I have created the online, offline, monitor, and clean scripts. Can anybody explain to me how these scripts are called by VCS?

JS: They are called when a resource of that type is configured and needs to make specific state changes. Once you have your type definition, and an ABRAAgent in the ABRA directory, you then need to create a resource of that type in the main.cf.

I know I have defined the .cf file for the application, but it seems they are not called from the ArgList parameter. I cross-checked with other running applications as well, and it seems they are not called anywhere. Does ArgList call these scripts by default? I mean, I was expecting these scripts to be called from some configuration file.

JS: Huh? Sorry. The entry point scripts for a resource type are only called when a resource of that type needs to be controlled. So unless you have a resource of type Oracle defined in a service group, the Oracle entry points are not called. For that matter, the OracleAgent is not even started.

My .cf file looks like below:

# more ABRA.cf
type ABRA (
    static int RestartLimit = 2
    static str ArgList[] = { Vandalhome, stupiduser }
    str Vandalhome
    str stupiduser
)

2) One of the reasons for the above question is that I think my applications do not have a proper clean procedure, as the clean script does not make sense to me. I just wanted to check whether clean is implemented on the system or not.

JS: You would have had to have implemented the clean procedure.

3) What agent attribute would be best suited for the resource to wait for a specific interval of time before starting the procedure?
JS: The numerical value returned by online or offline sets the number of seconds before monitor is called.

I'm toying with the following attributes:

OnlineWaitLimit: But the SADG says "Number of monitor intervals to wait after completing the online procedure, and before the resource becomes online." My requirement is to wait before starting the online procedure.
OnlineTimeout: Not convinced by this one either.
ConfInterval: Had seen the implementation of this parameter in main.cmd, so not happy about it.

Ciao
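To illustrate the convention JS describes for point 3 - the numeric value returned by the online entry point is the number of seconds the agent framework waits before calling monitor - here is a minimal, hypothetical online script for the ABRA type. The function name, paths, and the 10-second value are illustrative, not taken from the poster's configuration:

```shell
#!/bin/sh
# Sketch of an online entry point for a hypothetical "ABRA" resource type,
# wrapped in a function so the return-value convention can be exercised.
# Arguments follow the ArgList order from the type definition above:
#   $1 = resource name, $2 = Vandalhome, $3 = stupiduser
abra_online() {
    # start the application here, e.g. (placeholder, commented out):
    # su - "$3" -c "$2/bin/start"

    # The return value is the number of seconds the agent waits before
    # the first monitor cycle -- useful when the app needs time to come up.
    return 10
}

abra_online my_abra_res /opt/vandal appuser
echo "wait=$?"
```

So an online script ending in `exit 10` tells the agent to hold off monitoring for 10 seconds, which is the closest VCS mechanism to "wait before declaring the resource online"; there is no attribute that delays the start of the online procedure itself.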
Re: [Veritas-ha] Metro/Global cluster solution options
Talk with your Symantec rep? The System Engineer can easily come in and discuss how VCS can manage your DR automation needs.

From: veritas-ha-boun...@mailman.eng.auburn.edu [mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of rajesh Kharya (rkharya)
Sent: Wednesday, December 24, 2008 5:15 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Metro/Global cluster solution options

Hi,

We are evaluating a possible clustering solution for a project where the entire application environment will be hosted in 2 data centers, some 50 miles apart. The application environment will be identical in both DCs and will be accessed via Global Site Selector/Application Control Engines at the network layer. At the very back end we have a requirement for a 2-node cluster in each data center, preferably on Linux. Within a DC, one node will be active while the other will be passive. Storage will be configured as mounted file systems.

We need to know in what way VCS can help with:

A) Data replication between the clusters in the 2 DCs, assuming the two clusters work independently.
B) Is there a possibility of having all nodes (1-4) be part of a single cluster, where they are separated by 50 miles and have common storage between them (possibly a CFS implementation)? Node1/3 remain active while Node2/4 are standbys.

Site A        Site B
------        ------
Node1         Node3
  |             |
Node2         Node4

Any pointers to references/documentation appreciated.

Thanks,
~ Rajesh.
Re: [Veritas-ha] SRDF agent for cascaded SRDF in global cluster
The SRDF replication control agent for VCS HA/DR does not currently support cascaded SRDF; it only supports STAR. We are looking at adding cascaded SRDF, but there is no official support at this time and no committed date for cascade support. Speak with your Symantec rep?

From: veritas-ha-boun...@mailman.eng.auburn.edu [mailto:veritas-ha-boun...@mailman.eng.auburn.edu] On Behalf Of Pavel A Tsvetkov
Sent: Tuesday, December 23, 2008 6:09 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] SRDF agent for cascaded SRDF in global cluster

Hello all!

Just a small question. The new SRDF agent 5.0.0.4 has support for SRDF STAR. This is a good thing. But what about cascaded SRDF for Symmetrix DMX version 5773 and above? If we use R1 and R12 on one site (Replicated Data Cluster, R1-R12) and R2 on another site in a global cluster (R12-R2), is it possible to use the SRDF VCS agent in that case? I don't see any reason why this agent cannot be used.

Kind regards,
Pavel Tsvetkov
Re: [Veritas-ha] Question about HA and disks
In the original message: "We had an issue where a serverA failed and serverB took over. However, serverB took over when serverA was still 'crashing' (it took a good 10-15 mins to crash)."

I can assume crash = panic, as "crashing" has to refer to dumping core to disk. If this is the case, there will be no logs on server A, as it is mid-panic. In this case (the node is in the middle of a crash dump), it will not be writing to data disks. Whatever was written happened before the kernel call to panic. Fencing will protect that data once the new node imports, but in the case described here, the corruption had to happen before the panic, so fencing would not have helped. Bottom line: the node ceased writing as soon as the non-maskable interrupt was called for panic (unless Linux somehow violates every Unix kernel rule, which I seriously doubt). When VCS took over the service group on Server B, Server A was down and could not have been writing.

-Original Message-
From: Jon E Price/SYS/NYTIMES [mailto:[EMAIL PROTECTED]]
Sent: Monday, October 27, 2008 8:14 PM
To: Jim Senicka; Andrey Dmitriev; Joshua Fielden; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] Question about HA and disks

Hi,

A few questions...

Andrey: Could you post the logs (or even portions of them) which show what ServerA was doing during the takeover?

Joshua: You're saying that I/O fencing can prevent split-brain situations in which one server is still writing to a filesystem while a second server has taken over that same service group and begun writing to the same fs, thus possibly causing corruption?
http://sfdoccentral.symantec.com/sf/5.0/linux/html/vcs_install/ch_vcs_install_iofence.html#190559

Jim: What's the evidence that the server panicked? And is 16 seconds the default for the heartbeat failure?
Jon

-Original Message-
From: Jim Senicka [EMAIL PROTECTED]
Sent by: veritas-ha-bounce[EMAIL PROTECTED]
To: Andrey Dmitriev [EMAIL PROTECTED], veritas-ha@mailman.eng.auburn.edu
Date: 10/27/2008 07:19 PM
Subject: Re: [Veritas-ha] Question about HA and disks

When a server panics, it stops writing to anything but the dump device. VCS did exactly as designed: 16 seconds after heartbeat failure it started the takeover. Whatever was damaged on your file system was already damaged at that point, regardless of how long it took to dump core to the dump device. I would look at the cause of the panic; it is likely it had something to do with whatever garbaged your FS.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Andrey Dmitriev
Sent: Monday, October 27, 2008 2:01 PM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] Question about HA and disks

We had an issue where a serverA failed and serverB took over. However, serverB took over when serverA was still 'crashing' (it took a good 10-15 mins to crash), and apparently still had a hold of the file systems (system logs confirm that takeover occurred while serverA was still 'puking'). The file systems on ServerB came up corrupt, and we lost some data because of that. HA is set up via heartbeats. The file system is VxFS, the OS is RedHat 4.0. Is there any way to avoid that?

Thanks,
Andrey
Re: [Veritas-ha] Question about HA and disks
While I think fencing is always the right choice, I still think this was a system issue. The system stopped heartbeating for 16 seconds, plus the 5-second GAB stable timeout. At that point, VCS failed over. Fencing would not have been in play until the import on the second node, so if the corruption happened during those 21 seconds, it would not have helped. If there is a case where the node is nearly dead for an extended period of time, not capable of kernel-level heartbeat from LLT, but still writing to disk, then by all means you need I/O fencing to protect you from the OS.

-Original Message-
From: Brad Boyer
Sent: Monday, October 27, 2008 8:57 PM
To: Jim Senicka; Jon E Price/SYS/NYTIMES; Andrey Dmitriev; Joshua Fielden; veritas-ha@mailman.eng.auburn.edu
Subject: RE: [Veritas-ha] Question about HA and disks

Based on the original description, I would presume that the system did not actually panic immediately. I've seen Linux systems oops without immediate panics many times. I would make no assumption about what the dying system was doing in this case without real evidence, especially not that it actually got as far as a panic. Linux is not UNIX (it's just unofficially POSIX compliant), and you shouldn't assume that Linux will act like UNIX (it definitely acts differently in quite a few ways). Seeing as this is RHEL4, this system probably isn't even capable of taking a crash dump, and thus would be unlikely to be spending time writing a crash dump as opposed to doing some damage to the data on disk. Even with the current Red Hat release (RHEL5), crash dumps aren't enabled by default. My suggestion is that using I/O fencing would be the right answer here.
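Since both replies in this thread land on I/O fencing as the protection mechanism, a sketch of what enabling SCSI-3 based fencing involves may be useful. The coordinator disk group name is an example; the file names and keys are the standard VCS fencing configuration:

```
# /etc/vxfendg -- names the coordinator disk group
# (example name; must be a dg of 3 SCSI-3 PR capable disks)
vxfencoorddg

# /etc/vxfenmode -- select SCSI-3 persistent-reservation fencing
vxfen_mode=scsi3
scsi3_disk_policy=dmp
```

With these in place and `UseFence = SCSI3` set in the cluster definition in main.cf, a node evicted from GAB membership loses its registrations on the data disks at import time, so a half-dead node physically cannot keep writing after takeover.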
Re: [Veritas-ha] VCS 5.0 MP1: issue probing disk-group !?
Can you cut and paste the main.cf for the service group in question?

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Pascal Grostabussiat
Sent: Tuesday, October 21, 2008 9:11 AM
Cc: Veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] VCS 5.0 MP1: issue probing disk-group !?

To Jim, Scott and Gene.

Jim Senicka wrote:
Is the disk group agent running on the systems?

Yes it is:
root 16295 1 0 16:16:01 ? 1:29 /opt/VRTSvcs/bin/DiskGroup/DiskGroupAgent -type DiskGroup

Has the cluster been started since you created the service group definition?

Yes. I restarted VCS hoping it might somehow change something, but no. I am thinking about rebooting one server.

Are all resources enabled in the service groups?

Yes. I tried disabling and re-enabling them, but I come back to the same situation.

Scott3, James wrote:
Have you made sure the volumes are ENABLED ACTIVE? Can you send a vxprint on the group? Is it a shared group or an active/passive group? Also send a vxdg list.

Enabled and active, yes. The disk group is active/passive (to be mounted on one host at a time).
bash-3.00# vxprint -l dba_DG
Disk group: dba_DG

Group:          dba_DG
info:           dgid=1224062934.119.hostname
version:        140
alignment:      8192 (bytes)
detach-policy:  global
dg-fail-policy: dgdisable
copies:         nconfig=default nlog=default
devices:        max=32767 cur=3
minors:         >= 62000
cds=on

bash-3.00# vxprint -g dba_DG
TY NAME            ASSOC                    KSTATE   LENGTH     PLOFFS  STATE   TUTIL0  PUTIL0
dg dba_DG          dba_DG                   -        -          -       -       -       -
dm dba_DG01        c0t216000C0FF87E774d10s2 -        525417536  -       -       -       -
v  dba_archive     fsgen                    ENABLED  20971520   -       ACTIVE  -       -
pl dba_archive-01  dba_archive              ENABLED  20971520   -       ACTIVE  -       -
sd dba_DG01-02     dba_archive-01           ENABLED  20971520   0       -       -       -
v  dba_data        fsgen                    ENABLED  104857600  -       ACTIVE  -       -
pl dba_data-01     dba_data                 ENABLED  104857600  -       ACTIVE  -       -
sd dba_DG01-03     dba_data-01              ENABLED  104857600  0       -       -       -
v  dba_redo        fsgen                    ENABLED  20971520   -       ACTIVE  -       -
pl dba_redo-01     dba_redo                 ENABLED  20971520   -       ACTIVE  -       -
sd dba_DG01-01     dba_redo-01              ENABLED  20971520   0       -       -       -

bash-3.00# vxdg list
NAME     STATE         ID
xxx_DG   enabled,cds   1224062531.89.hostname
xxx_DG   enabled,cds   1224062634.101.hostname
xxx_DG   enabled,cds   1224062699.109.hostname
dba_DG   enabled,cds   1224062934.119.hostname
xxx_DG   enabled,cds   1224062443.81.hostname
xxx_DG   enabled,cds   1224062569.93.hostname
xxx_DG   enabled,cds   1224062672.105.hostname
xxx_DG   enabled,cds   1224062491.85.hostname

Gene Henriksen wrote:
If you have a ? in the GUI, then it cannot probe the resource on one system or the other. It will not import on either until it is probed on both. This is to avoid a concurrency violation.

Yes, fully agree.

Hold the cursor on the resource and a pop-up box should show the status so you can see where it is not probed.

Status is unknown on both server A and B.

This could be due to one system having never seen the DG. Can you run vxdisk -o alldgs list and see the DG on both systems?
I can import/deport that disk group using vxdg without a problem.

bash-3.00# vxdisk -o alldgs list
DEVICE                    TYPE          DISK      GROUP     STATUS
c0t216000C0FF87E774d0s2   auto:none     -         -         online invalid
c0t216000C0FF87E774d1s2   auto:cdsdisk  xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d2s2   auto:cdsdisk  xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d3s2   auto:cdsdisk  xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d4s2   auto:cdsdisk  xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d5s2   auto:cdsdisk  -         (xxx_DG)  online
c0t216000C0FF87E774d6s2   auto:cdsdisk  xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d7s2   auto:cdsdisk  xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d8s2   auto:cdsdisk  xxx_DG01  xxx_DG    online
c0t216000C0FF87E774d9s2   auto:cdsdisk  -         (xxx_DG)  online
c0t216000C0FF87E774d10s2  auto:cdsdisk  dba_DG01  dba_DG    online
c2t0d0s2                  auto:none     -         -         online invalid
c2t2d0s2                  auto:none     -         -         online invalid
c2t3d0s2                  auto:none     -         -         online invalid

The other possibility is a typo in the DiskGroup resource attribute. Make sure it has no leading spaces and is the correct case (just as vxdisk list shows it). I thought about this and double-checked. Nothing. I recreated the resource and paid attention to that possibility, nothing. Regards, /Pascal ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
Re: [Veritas-ha] VCS 5.0 MP1: issue probing disk-group !?
Is the disk group agent running on the systems? Has the cluster been started since you created the service group definition? Are all resources enabled in the service groups? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Pascal Grostabussiat Sent: Tuesday, October 21, 2008 7:09 AM To: Veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] VCS 5.0 MP1: issue probing disk-group !? Hi, I have been experiencing a weird issue since yesterday and I cannot get it solved by surfing and checking around. So I hope to get a hint using the mailing list. Our sysadmin recently installed a system with two Sun SPARC servers for me, with VxVM, VxFS and VCS. In short I have VERITAS Foundation 5.0 with MP1. DESC: Veritas Cluster Server by Symantec PSTAMP: Veritas-5.0MP1-11/29/06-17:15:00 DESC: Virtual Disk Subsystem PSTAMP: Veritas-5.0-MP1.26:2007-02-28 DESC: Commercial File System PSTAMP: VERITAS-FS-5.0.1.0-2007-01-17-5.0MP1=123202-02 Now I have an issue with all disk groups, for example dba_DG. Using the command line or the VERITAS Enterprise Administrator I can import/deport the disk group, mount the corresponding volumes and create file systems on them. No issue there. Now I go to the VERITAS Cluster Administrator, where our sysadmin had already created resources for the disk groups. However, I cannot bring any of them online because the GUI keeps telling me that the resource has not been probed on the system (I have two systems; I tried to online on A and B, but same behavior). I deleted the resource, created a new one, same issue. I still have a ? mark on the resource. Issuing a probe does not solve anything. I checked the engine_A.log and can see that the probe was fired, but nothing more. I can run hares -probe dba_DG -sys A and I get the prompt back, nothing else appears !? I am puzzled ! Any idea ? Any known issue ? Many thanks in advance.
Regards, /Pascal ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
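As a sketch, the probe state in a case like this is usually chased down from the CLI roughly as follows. The resource and system names come from the thread; the commands are standard VCS CLI, but treat the exact sequence as illustrative rather than a support-blessed procedure.

```shell
# Where exactly is the resource not probed? (Probed is a per-system attribute)
hares -display dba_DG -attribute Probed

# Re-fire the probe on one node, then watch the engine log for the result
hares -probe dba_DG -sys A
tail -f /var/VRTSvcs/log/engine_A.log

# Is the DiskGroup agent itself healthy on each node?
haagent -display DiskGroup
```

If the agent shows as not running or faulted on one node, restarting it with haagent -stop/-start on that node is typically the next step.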
Re: [Veritas-ha] IPMultiNICB, mpathd and network outages
I would be more concerned about future failures being handled properly. If you were able to take out all networks from all nodes at same time, you have a SPOF. If this was a one time maintenance upgrade to your network gear and not a normal event, setting VCS to not respond to network events means that future cable or port issues will not be handled. If it is a common occurrence for all networks to be lost, perhaps you need to address the network issues :-) -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of DeMontier, Frank Sent: Monday, October 20, 2008 11:10 AM To: Paul Robertson; veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] IPMultiNICB, mpathd and network outages FaultPropagation=0 should do it. Buddy DeMontier State Street Global Advisors Infrastructure Technical Services Boston Ma 02111 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Robertson Sent: Monday, October 20, 2008 10:37 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] IPMultiNICB, mpathd and network outages We recently experienced a Cisco network issue which prevented all nodes in that subnet from accessing the default gateway for about a minute. The Solaris nodes which run probe-based IPMP reported that all interfaces had failed because they were unable to ping the default gateway; however, they came back within seconds once the network issue was resolved. Fine. Unfortunately, our VCS nodes initiated an offline of the service group after the IPMultiNICB resources detected the IPMP fault. Since the service group offline/online takes several minutes, the outage on these nodes was more painful. Furthermore, since the peer cluster nodes in the same subnet were also experiencing the same mpathd fault, there would have been little advantage to failing over the service group to another node. 
We would like to find a way to configure VCS so that the service group does not offline (and any dependent resources within the service group are not offlined) in the event of an mpathd (i.e. IPMultiNICB) failure. In looking through the documentation, it seems that the closest we can come is to increase the IPMultiNICB ToleranceLimit from 1 to a huge value: # hatype -modify IPMultiNICB ToleranceLimit This should achieve our desired goal, but I can't help thinking that it's an ugly hack, and that there must be a better way. Any suggestions are appreciated. Cheers, Paul

P.S. A snippet of the main.cf file is listed below:

group multinicbsg (
    SystemList = { app04 = 1, app05 = 2 }
    Parallel = 1
    )

    MultiNICB multinicb (
        UseMpathd = 1
        MpathdCommand = "/usr/lib/inet/in.mpathd -a"
        Device = { ce0 = 0, ce4 = 2 }
        DefaultRouter = "192.168.9.1"
        )

    Phantom phantomb (
        )

    phantomb requires multinicb

group app_grp (
    SystemList = { app04 = 0, app05 = 0 }
    )

    IPMultiNICB app_ip (
        BaseResName = multinicb
        Address = "192.168.9.34"
        NetMask = "255.255.255.0"
        )

    Proxy appmnic_proxy (
        TargetResName = multinicb
        )

    // various other resources, including some that depend on app_ip,
    // excluded for brevity

    app_ip requires appmnic_proxy

___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
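The two options discussed in this thread (raising ToleranceLimit, and Buddy's FaultPropagation=0 suggestion) would be applied roughly like this. This is a sketch only: the value 120 is an arbitrary "huge" illustration, not a recommendation from the thread.

```shell
# Run on one node with VCS up; changes are cluster-wide.
haconf -makerw                                   # open the config read-write

# Option 1: let the IPMultiNICB monitor fail many times before faulting
hatype -modify IPMultiNICB ToleranceLimit 120    # 120 is an arbitrary example

# Option 2 (Buddy's suggestion): stop VCS from propagating the fault
# up the dependency tree for this group (group-level attribute)
hagrp -modify app_grp FaultPropagation 0

haconf -dump -makero                             # save and close the config
```

As Jim notes in the reply, either setting trades off future fault handling: a real, lasting NIC failure will also be tolerated or ignored.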
Re: [Veritas-ha] Server crashes but VCS doesn't detect it
If power cycle fixed it, it was still heartbeating on LLT. Sent from my Nokia E62 handheld by goodlink. -Original Message- From: Andrey Dmitriev [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 17, 2008 03:33 PM US Mountain Standard Time To: veritas-ha@mailman.eng.auburn.edu Cc: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Server crashes but VCS doesn't detect it Had sort of a weird case today. We had a server failure: lost network, console was being filled with some sort of crash info. The cluster, however, showed everything online. We also had netdump configured (Linux), but that couldn't work b/c the network was down. The customer is unhappy that it didn't fail over. Can anyone think of a reason, or of how I can prevent something similar in the future? I sort of suspect LLT might still have been up somewhat. It wasn't until we power-cycled the box that the other nodes detected it was down. -andrey ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
Re: [Veritas-ha] vxfencing 2 nodes
For any cluster larger than 1 node, I/O fencing is highly recommended to protect data integrity in the event of a split brain. 2 nodes is not in any way more resistant to split brain than 3 nodes or more. VCS does not use any form of quorum-based membership (quorum has a number of its own ugly issues), so there is no difference in how our membership works when you have 2, 3, or 32 nodes. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Shashi Kanth Boddula Sent: Thursday, June 12, 2008 7:30 AM To: Mayank Vasa Cc: veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] vxfencing 2 nodes Ok, thanks for the clarification. I have seen documentation for many clustering products which says that fencing/quorum is optional/not required for clusters of more than 2 nodes, and which says that there is very little chance of a split-brain condition occurring in clusters of more than 2 nodes. -- Shashi Mayank Vasa wrote: Shashi: The number of nodes is not a decision-making factor for fencing. For a cluster of 2 nodes or more, fencing helps to protect your data in the case of a split-brain scenario. SFRAC requires fencing. It is not supported without it. Regards, + Mayank -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Shashi Kanth Boddula Sent: Wednesday, June 11, 2008 12:23 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] vxfencing 2 nodes Is vxfencing required if we go for a >=3 node cluster? Or is vxfencing optional for a >=3 node cluster? I am going for a 4-node VCS5 SFRAC; is vxfencing still required for me? Do all VCS5 SFRAC modules work properly without vxfencing?
___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
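For anyone verifying whether fencing is actually configured and running on an existing SFRAC cluster, a quick check looks roughly like this (standard fencing utilities; output details vary by version):

```shell
# Fencing mode (e.g. SCSI3) and current cluster membership as the
# fence driver sees it
vxfenadm -d

# Name of the coordinator disk group the fence driver was pointed at
cat /etc/vxfendg

# GAB port memberships: port a = GAB, port b = fence driver, port h = had.
# A missing port b membership means fencing is not up on that node.
gabconfig -a
```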
Re: [Veritas-ha] Veritas Cluster Server
Have you opened a support case? To the best of my knowledge, VCS 4.1 does not support RHEL 5. Support can confirm. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Goutham N Sent: Thursday, June 05, 2008 8:37 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Veritas Cluster Server Hi, I am installing Veritas Cluster Server 4.1 in a Red Hat Linux 5.x environment. I am getting the following error message. Can anyone help with a solution?

Cluster Server configured successfully.
Starting Cluster Server:
Starting LLT on usplselux141 ... /etc/init.d/llt start 2>&1 exit=256
Starting LLT:
LLT: loading module...
LLT: Error: cannot find compatible module binary
/sbin/lltconfig 2>&1 exit=256
LLT lltconfig ERROR V-14-2-15000 open /dev/llt failed: No such file or directory
Error
CPI ERROR V-9-120-1171 Could not start LLT on usplselux141: LLT lltconfig ERROR V-14-2-15000 open /dev/llt failed: No such file or directory

The installvcs log is saved at: /opt/VRTS/install/logs/installvcs605084118.log -- N. Gowthaman ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
Re: [Veritas-ha] VCS with replicated storage
You are attempting to build what is called a Replicated Data Cluster. This should be documented in the UG as I recall. You will use identical DG and volume resources, with the appropriate replication-management resource under the DG. To do this and comply with the EULA, you need the HA/DR Edition of VCS, which licenses you to use the replication agents. In an RDC, the replication agent manages read/write enabling and the direction of replication. When you fail over, the opposite node is write-enabled, then the normal DG and volume agents bring up the storage. Hugh Shannon here at Symantec is the Technical Product Manager responsible for these types of configs. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Esson, Paul Sent: Thursday, June 05, 2008 9:40 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] VCS with replicated storage Folks, My background with VCS is limited to local clusters with shared storage arrays and software mirroring of volumes using VxVM. I have been asked to implement a VCS 5.0 cluster on Solaris 10 using replicated block-level NetApp storage. This will be a stretched cluster with one node on each of two sites and heartbeat connections using VLANs. What I am struggling with at the moment is how to configure the storage resources within VCS. I am used to defining shared volume groups/volumes, but as I see it each node will effectively have a local LUN or LUNs, with blocks being replicated at the array level from the active to the inactive node. Do I create separate Volume Groups and Volumes on each node and set the associated attributes on a per-system basis such that failover starts the application up, mounting the file system on the replica volume of the alternative node?
Regards Paul Esson Redstor Limited Direct: +44 (0) 1224 595381 Mobile: +44 (0) 7766 906514 E-Mail: [EMAIL PROTECTED] Web:www.redstor.com REDSTOR LIMITED Torridon House 73-75 Regent Quay Aberdeen UK AB11 5AR Disclaimer: The information included in this e-mail is of a confidential nature and is intended only for the addressee. If you are not the intended addressee, any disclosure, copying or distribution by you is prohibited and may be unlawful. Disclosure to any party other than the addressee, whether inadvertent or otherwise is not intended to waive privilege or confidentiality. ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
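A minimal sketch of the storage stack Jim describes, in main.cf style. The resource type "NetAppReplication" below is a stand-in for whatever NetApp replication agent ships with the HA/DR edition; all names and attributes here are hypothetical illustrations of the dependency shape, not the real agent's interface.

```
DiskGroup app_dg (
    DiskGroup = appdg
    )

// Hypothetical replication-management resource: on failover it
// write-enables the local replica and reverses replication direction
// before the DG/volume/mount resources come up above it.
NetAppReplication app_repl (
    // attributes depend on the actual agent shipped with HA/DR
    )

Volume app_vol (
    Volume = appvol
    DiskGroup = appdg
    )

Mount app_mnt (
    MountPoint = "/app"
    BlockDevice = "/dev/vx/dsk/appdg/appvol"
    FSType = vxfs
    FsckOpt = "-y"
    )

app_dg requires app_repl
app_vol requires app_dg
app_mnt requires app_vol
```

The point of the shape: identical DG/volume/mount resources on both sides, with the replication resource at the bottom of the tree so it runs first on failover.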
Re: [Veritas-ha] .stale file
Right. But that can also be done via CLI or GUI with the cluster running. From: i man [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 03, 2008 9:48 AM To: Jim Senicka Cc: Gene Henriksen; veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] .stale file Jim, This is to update systems with some new service groups. This is not on a single system but rather a large number of systems (100+). Also, so many thanks to Gene and John for resolving my doubts. Ciao, On Tue, Jun 3, 2008 at 2:30 PM, Jim Senicka [EMAIL PROTECTED] wrote: Bigger question is what are you routinely using stop -force to accomplish? From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gene Henriksen Sent: Tuesday, June 03, 2008 8:17 AM To: i man; veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] .stale file It indicates you did not close and save the cluster configuration after making modifications. It is a warning. If you close and save the config, it goes away. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of i man Sent: Tuesday, June 03, 2008 7:28 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] .stale file All, Had some queries regarding the .stale file present in the /etc/VRTSvcs/conf/config directory. I know that if the ha agents are restarted with hastop -all -force and this file is present, the cluster members could be in a stale admin wait state. I have been deleting this file, then hastop -all -force and then hastart on the nodes. I do not want the service groups to go offline, that's why -force. My query is: what is the use of .stale? Would hastart -force help to get nodes back if this file is present? Is file deletion the only method to get the nodes back? I noticed recently that when getting the cluster back this way, my clusters lose the information about the admin password. I think I'm doing something wrong. Any help? Ciao. ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
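The .stale lifecycle Gene describes can be sketched as a command sequence. This is illustrative; <node> is a placeholder for an actual system name.

```shell
# .stale is created when the in-memory config is opened read-write:
haconf -makerw            # open config; VCS drops .stale as a marker

# ... make changes with hares / hagrp / hatype ...

# Saving and closing the config writes main.cf and removes .stale:
haconf -dump -makero

# If nodes come up in STALE_ADMIN_WAIT because .stale was left behind,
# force the cluster to build its config from one node's local main.cf
# instead of deleting the file by hand:
hasys -force <node>
```

In other words, .stale is a "config was modified but never saved" flag, and haconf -dump -makero is the supported way to clear it.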
Re: [Veritas-ha] .stale file
you only need one notifier, usually in the CSG. No need for a proxy anywhere else. From: i man [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 03, 2008 12:00 PM To: Gene Henriksen Cc: John Cronin; Jim Senicka; veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] .stale file Gene, John, Jim, That's excellent. So many thanks again for the new ideas. There is one last query regarding the whole activity, concerning the use of a Proxy for the notifier. Nobody has been able to tell me definitely whether this is required for the notifier. If I create my notifier in the ClusterService group or any other service group, does it require a proxy to send alerts? If so, and if I create the notifier in a separate service group, is it fine to create the proxy in the ClusterService group? Having gone through the BARG, there are sample examples which explain the notifier's dependency on a proxy, but even without the proxy things seem to be working fine for me on a test system. Also, when installing through the GUI it does ask about some NIC card information, a step which I always skipped; I don't know how relevant this is for the creation and working of the notifier. Ciao On Tue, Jun 3, 2008 at 4:22 PM, Gene Henriksen [EMAIL PROTECTED] wrote: Putting the Notifier in the cluster service group also has an advantage because the CSG is the first SG up and the hardest to kill; therefore, in times of lots of problems you will get notification, whereas if the service group you arbitrarily chose to use is faulted on all systems in the cluster, then notification is also down. You could create the CSG on one system, save the configuration, run hacf -cftocmd . in the /etc/VRTSvcs/conf/config directory, then edit the main.cmd (look toward the bottom) to find the commands to create the CSG and Notifier, make a script, and modify it to run on other clusters.
From: John Cronin [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 03, 2008 10:45 AM To: i man Cc: Jim Senicka; Gene Henriksen; veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] .stale file It would be no problem to create a Notifier resource in any arbitrary service group with the CLI. If I understand this correctly, what you are doing is shutting down VCS and then editing main.cf to change the config? If this were for one or two clusters, it might be an OK way to do it, but if this is for hundreds of systems, it would be better to learn how to use the CLI and then script the changes. Also, what is the problem with putting the notifier in the ClusterService group? I can't see how putting it in another service group would provide you any particular benefit - the Notifier is going to do the same things no matter which service group it is in. Since it is a cluster-wide service, it makes sense that it should be in the ClusterService group. As for using hastop -all -force, I tend to use it frequently on production systems when I am doing something that requires stopping the cluster but does not require stopping the systems or the services running on those systems (e.g. patching or upgrading VCS, or reconfiguring GAB or LLT). However, I would not do this to accomplish something that can be done with CLI commands. -- John Cronin On 6/3/08, i man [EMAIL PROTECTED] wrote: Correct Jim, If this had been a normal cluster service group I would have loved to do that. What I'm trying to obtain is the creation of an SNMP notifier in a separate service group. Through the GUI you cannot create it in your own service group; you can only create it as part of the ClusterService group. Not sure if this is achievable through the CLI. Any suggestions? On Tue, Jun 3, 2008 at 2:52 PM, Jim Senicka [EMAIL PROTECTED] wrote: Right. But that can also be done via CLI or GUI with the cluster running.
Re: [Veritas-ha] Importance of NIC Proxy in Clusterservice group
You should be monitoring the NIC in some service group on the box. A NIC Proxy is used to prevent duplicate monitoring by other service groups. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of i man Sent: Monday, June 02, 2008 12:13 PM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Importance of NIC Proxy in Clusterservice group All, Can anybody let me know why a NIC proxy is required in the ClusterService group? Also, is it necessary to create a NIC proxy in the ClusterService group for an SNMP notifier which is created in a separate service group? Ciao. ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
Re: [Veritas-ha] question about hastop
Hastop -force -all does not take down resources. But why not add the resources online? Hastop -force -all is really only used for heavy lifting, like upgrading VCS bits. You can add the resources on the fly using the CLI or GUI. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paveza, Gary Sent: Friday, May 23, 2008 10:13 AM To: 'Veritas HA' Subject: [Veritas-ha] question about hastop I currently have a Veritas Cluster for RAC which is really only responsible for mounting the filesystems for the cluster. The database start/stop and CSSD are handled via system startup scripts. I need to modify the main.cf file to add a resource for Networker. If I issue the hastop -all -force command (as outlined in the Networker manual), will this shut down the cluster and make the filesystems unmount? Or will everything remain up and running? - Gary Paveza, Jr. AIG - Personal Lines Division Technical Specialist - Architecture - HP CSE, SCSA (302) 252-4831 - phone ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
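Adding the resource on the fly, as Jim suggests, would look roughly like this. This is a sketch: "nw_server", the group name "rac_grp", and the Networker init-script paths are illustrative placeholders, and the bundled Application agent is used here only as a generic example type.

```shell
# Sketch: add a resource while the cluster stays up (no hastop needed).
haconf -makerw

# Add a generic Application resource to an existing group
# ("rac_grp" is an assumed group name -- substitute your own)
hares -add nw_server Application rac_grp
hares -modify nw_server StartProgram "/etc/init.d/networker start"
hares -modify nw_server StopProgram  "/etc/init.d/networker stop"
hares -modify nw_server Enabled 1

haconf -dump -makero      # save; no service group is taken offline
```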
Re: [Veritas-ha] Veritas Cluster Server 5.0 available for RHEL 5.x
5.0MP3 will add RHEL 5 support. Talk with your rep on release dates. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Stephens Sent: Thursday, May 22, 2008 11:48 AM To: Goutham N; veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] Veritas Cluster Server 5.0 available for RHEL 5.x Not according to the release notes for the product. These can be found at: ftp://exftpp.symantec.com/pub/support/products/ClusterServer_UNIX/283850.pdf (For Linux 5.0) ftp://exftpp.symantec.com/pub/support/products/ClusterServer_UNIX/287175.pdf (For Linux 5.0 MP1) ftp://exftpp.symantec.com/pub/support/products/ClusterServer_UNIX/289442.pdf (For Linux 5.0 MP2). Tom From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Goutham N Sent: Thursday, May 22, 2008 1:40 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Veritas Cluster Server 5.0 available for RHEL 5.x Is Veritas Cluster Server 5.0 available for RHEL Version 5 ? -- N. Gowthaman ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
Re: [Veritas-ha] bundled HP-UX vxfm/vxfs
With SFRAC already installed, shouldn't you have VCS already installed? Sent from my Nokia E62 handheld by goodlink. -Original Message- From: Shashi Kanth Boddula [mailto:[EMAIL PROTECTED] Sent: Friday, April 18, 2008 02:12 AM US Mountain Standard Time To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] bundled HP-UX vxfm/vxfs I used to get the below message whenever I install VCS: SFRAC version 4.1 includes VRTSvxvm version 4.1.010. A more recent version of VRTSvxvm, 4.1.011, is already installed. CPI WARNING V-9-10-1400 In this situation VRTSvxvm version 4.1.011 will not be installed or downgraded. SFRAC version 4.1 may not operate correctly with this more recent package. The VRTSvxvm package must be removed manually before version 4.1.010 can be installed. SFRAC version 4.1 includes VRTSvxfs version 4.1. A more recent version of VRTSvxfs, 4.1.001, is already installed. CPI WARNING V-9-10-1400 In this situation VRTSvxfs version 4.1.001 will not be installed or downgraded. SFRAC version 4.1 may not operate correctly with this more recent package. The VRTSvxfs package must be removed manually before version 4.1 can be installed. Are there any known issues/problems if we proceed to install VCS without removing the operating-system-bundled VxVM/VxFS, and continue to install VCS with the operating-system-bundled VxVM/VxFS (not the VCS-bundled VxVM/VxFS)? Or can we simply ignore this message, and the cluster will operate normally without any problems? ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
Re: [Veritas-ha] Coordinator disks
No. A. It must be an odd number (otherwise no majority is possible). B. You cannot add online. You will need to bounce the cluster (or at least the fence driver) to move to the new array. Sent from my Nokia E62 handheld by goodlink. -Original Message- From: Rongsheng Fang [mailto:[EMAIL PROTECTED] Sent: Friday, April 11, 2008 04:09 PM US Mountain Standard Time To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Coordinator disks Hi, Does anybody know how many coordinator disks the coordinator disk group can have? The VCS installation guide says 3, but doesn't say if more are supported. We currently have 3 coordinator disks configured for a VCS 5.0 MP1 cluster with IO fencing enabled. We will need to shut down the array where the coordinator disks reside temporarily (for a few days). So I am thinking I could add another three coordinator disks from another array to the coordinator disk group. This way the coordinator disk group would still have 3 available coordinator disks while the original three are down. Would this work? Or what's the best way to deal with this situation? Thanks, Rongsheng ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
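Why the odd-number rule matters comes down to simple arithmetic: in a split-brain race, a node must register its keys on a strict majority of the coordinator disks, and only an odd total guarantees exactly one winner.

```shell
# A racing subcluster survives only if it wins a strict majority of
# the n coordinator disks: floor(n/2) + 1.
majority() { echo $(( $1 / 2 + 1 )); }

majority 3   # prints 2: of 3 disks, some side always reaches 2
majority 4   # prints 3: of 4 disks, a 2-2 split leaves NO side with 3
```

With 6 disks (the setup proposed in the question), a 3-3 split is possible and neither side holds the required 4, which is why the fence driver refuses an even count.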
Re: [Veritas-ha] Coordinator disks
Fence won't start if even. Sent from my Nokia E62 handheld by goodlink. -Original Message- From: Joshua Fielden [mailto:[EMAIL PROTECTED] Sent: Friday, April 11, 2008 04:30 PM US Mountain Standard Time To: Rongsheng Fang; veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] Coordinator disks 3 *or more*, but they need to be an odd number, so minimize the amount of time they're even -- coordinator races are decided by holding a majority, so you have exposure while the total number of disks is even. Cheers, jf Sent by GoodLink (www.good.com) -Original Message- From: Rongsheng Fang [mailto:[EMAIL PROTECTED] Sent: Friday, April 11, 2008 04:09 PM US Mountain Standard Time To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Coordinator disks Hi, Does anybody know how many coordinator disks the coordinator disk group can have? The VCS installation guide says 3, but doesn't say if more are supported. We currently have 3 coordinator disks configured for a VCS 5.0 MP1 cluster with IO fencing enabled. We will need to shut down the array where the coordinator disks reside temporarily (for a few days). So I am thinking I could add another three coordinator disks from another array to the coordinator disk group. This way the coordinator disk group would still have 3 available coordinator disks while the original three are down. Would this work? Or what's the best way to deal with this situation? Thanks, Rongsheng ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
Re: [Veritas-ha] Question re SFRAC 5.0
Kelly, That is not normal. If the DB is the top of the tree and set to non-critical, it should not cause the group to offline. Even after we introduced FaultPropagation and ManageFaults, the core Critical/Non-Critical behavior should not have changed. Can you open a case on this? Jim -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED] Sent: Friday, February 29, 2008 11:57 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Question re SFRAC 5.0 We are testing a new SFRAC 5.0 cluster. One of the scenarios is a shutdown abort of one instance. When we did this, it took the whole group offline on that node even though the database is the top resource in the dependency tree. Is this normal behavior? I don't remember this ever happening before. I remember it only taking the database offline and leaving the mounts up. The database is a non-critical resource with nothing depending on it. Thanks in advance for your help! ** The information contained in this message, including attachments, may contain privileged or confidential information that is intended to be delivered only to the person identified above. If you are not the intended recipient, or the person responsible for delivering this message to the intended recipient, Alltel requests that you immediately notify the sender and asks that you do not read the message or its attachments, and that you delete them without copying or sending them to anyone else. ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
Re: [Veritas-ha] LLT crossed links
I disagree, as long as the SAP stuff is taken care of. 2 dedicated + 2 additional (even sharing a VLAN) is pretty good. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Joshua Fielden Sent: Tuesday, February 19, 2008 11:40 AM To: Ceri Davies Cc: veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] LLT crossed links One can't set up a successful cluster planning for the best case -- one has to plan for the worst case. 2, 4, or 40 links, the underlying discipline doesn't change. What happens, in the below scenario, when you lose both dedicated heartbeats? You're left with two links on the same VLAN, which is verboten. Cheers, jf Sent by GoodLink (www.good.com) -Original Message- From: Ceri Davies [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 19, 2008 09:35 AM US Mountain Standard Time To: Joshua Fielden Cc: veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] LLT crossed links Even if I have four links? The situation is that I have: e1000g0 - public interface, VLAN 2, say e1000g1 - heartbeat interface, VLAN 3 nxge0 - public interface, VLAN 2 nxge1 - heartbeat interface, VLAN 4 I don't see how having e1000g0 and nxge0 both on VLAN 2 can cause the problems you mention given the presence of the other high-priority links. Are you certain that's the case? Thanks, Ceri On Tue, Feb 19, 2008 at 09:28:55AM -0700, Joshua Fielden wrote: Having multiple LLT links on the same VLAN/network can cause a variety of problems such as split-brain scenarios, inability to rejoin the cluster, and cluster failures. The heartbeats really need to be isolated from each other. Cheers, jf Sent by GoodLink (www.good.com) -Original Message- From: Ceri Davies [mailto:[EMAIL PROTECTED] Sent: Tuesday, February 19, 2008 09:25 AM US Mountain Standard Time To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] LLT crossed links I have a couple of clusters running Solaris 10, VCS 5.
I'm running IPMP on my public links and I want to configure each public interface as a low-priority link. Since they're connected to the same VLAN, when I start LLT I get the following warning: llt: LLT WARNING V-14-1-10497 crossed links? link 0 and link 3 of node 1 on the same network I'm fully aware of what this means, but I'm not 100% sure if this is likely to cause me a problem or whether it's just a warning in case I thought I'd connected them to different VLANs. Is this likely to be OK? I have two other links per node which have a dedicated VLAN each. Ceri -- That must be wonderful! I don't understand it at all. -- Moliere
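For reference, the layout Ceri describes -- dedicated high-priority links plus public interfaces as low-priority links -- is expressed in /etc/llttab with link and link-lowpri directives. A minimal sketch using the interface names from this thread (the node name, cluster ID, and Solaris device paths are illustrative assumptions, not taken from the messages):

```
set-node node1
set-cluster 101
# dedicated high-priority heartbeat links, one per private VLAN
link e1000g1 /dev/e1000g:1 - ether - -
link nxge1 /dev/nxge:1 - ether - -
# public interfaces carried as low-priority links (both on the shared VLAN 2)
link-lowpri e1000g0 /dev/e1000g:0 - ether - -
link-lowpri nxge0 /dev/nxge:0 - ether - -
```

With this layout, the crossed-links warning simply flags that the two low-priority links share a network, which is the situation debated above.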
Re: [Veritas-ha] Is VxVM mirror supported in VCS GCO option?
We made a decision not to support VxVM mirroring in a GCO environment because it breaks our ability to use SCSI-3 based fencing. While you could make the mirror work, it is not a Symantec-supported configuration. For dual-cluster configs we would require some form of replication. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gene Henriksen Sent: Friday, December 28, 2007 5:49 AM To: Pavel A Tsvetkov; veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] Is VxVM mirror supported in VCS GCO option? To mirror volumes you must be dealing with a relatively small distance, such as less than 80 km. For these distances, why not use a single cluster, called a stretch or campus cluster? In SF 5.0 there is the concept of site awareness, so that VM is aware of the two sites; if a volume at the remote site becomes detached, then all volumes at the remote site are detached, thereby maintaining consistency of the site. I have not heard of the limitation you mention. I do know that in a Replicated Data Cluster (VVR within a cluster), synchronous replication is required because, unlike GCO, there is nothing to prevent failover, and we don't want the cluster to experience failovers and take over with old data automatically. With mirroring, that certainly would be possible. As in the case of replicating data, we do not recommend automatic failover. Automatic failover could result in split brain destroying the data if the link between the two clusters were interrupted, making it appear the primary cluster was down. A lot of configurations are possible, and a lot will work, but they may not be supported. I am not sure who told you this, but I would ask for an explanation. One possible problem could be the loss of the SAN between sites for hours, followed by a failover to the remote site with old data, with the VCS admin unaware of the storage problem. I think the primary concern is split brain. With replication, you are working with two distinct data sets. 
If both sides become active due to a loss of connectivity, the data is not being corrupted, the two sites are just growing further apart. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Pavel A Tsvetkov Sent: Friday, December 28, 2007 5:15 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Is VxVM mirror supported in VCS GCO option? Hello all! Just one interesting question about VCS GCO. I was told that VxVM mirror is not supported if using with Global Cluster Option. Only replicated volumes can be used ... Is it true? It seems strange to me... Why not? I think it is quite possible to failover mirrored VxVM volume between clusters... Or not??? Kind regards Pavel Tsvetkov
Re: [Veritas-ha] Inbound and outbound traffic
Not really a VCS issue. It really depends on the IP stack of the OS, or on modifying the application to bind to a specific IP. Usually the source IP of an outbound packet will be whatever the base address (first address configured) is on that interface. One possible solution is to set the base address to be on a different subnet; that way only your VIP is on the subnet in use, and it will be the first configured address. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Pablo Calvo Sent: Wednesday, December 12, 2007 10:37 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Inbound and outbound traffic How can I set inbound and outbound traffic to use the same interface (physical and virtual address)? Uniqs S.A. Sturiza 503 - Olivos Buenos Aires - Argentina TE: (5411) 4711-7755/4799-5516 Cel: (54911) 53747697
[Veritas-ha] New VCS course!
Howdy all. Education informed me that we have a new class online around multiple clusters. Our new course that includes GCO, Secure Clusters, CMC, Solaris Zones, the RemoteSG agent, and the campus cluster capability in VM that allows site tagging is now available. The schedule for it is as follows (all we need is students): Oak Brook, IL: Jan 30 thru Feb 1; Mountain View: Feb 4-7; Herndon, VA: Feb 20-22. Jim Senicka Senior Director, Technical Product Management Server and Storage Management Group Symantec Corporation www.symantec.com - Office: 757-766-0200 Mobile: 757-870-3484 Email: [EMAIL PROTECTED]
Re: [Veritas-ha] best way for patching of cluster servers
We will get that resolved (Eric and I). Jim Senicka Sent from my Nokia E62 handheld by goodlink. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, December 07, 2007 01:11 PM US Mountain Standard Time To: Eric Hennessey Cc: veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] best way for patching of cluster servers Hi Eric, That's funny, I've been told by Veritas Support that Veritas does not support nodes in the same cluster running at different Solaris patch levels, let alone different versions of Solaris. Jon -Original Message- From: Eric Hennessey Sent: 12/07/2007 06:12 AM To: veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] best way for patching of cluster servers Hi Upen, My guess is you spoke with Sun sales when you asked this question. Try rephrasing your question to your Sun contact. Ask him/her if they will support a collection of systems running Solaris 9 at different patch levels, without regard to them being clustered. That you're running VCS on these systems isn't Sun's support problem, it's ours, and we unequivocally support mixing not only different patch levels but different Solaris versions in the same cluster. We do this so you can leverage the cluster as an operational support tool to enable rolling upgrades of the OS with a minimum of application downtime. The response you got sounds like it came from someone interested in selling Sun Cluster. Just because THEY won't support different patch levels and Solaris versions in the same cluster doesn't mean WE won't. :-) Cheers! Eric From: upen [mailto:[EMAIL PROTECTED] Sent: Thursday, December 06, 2007 7:43 PM To: Eric Hennessey Cc: veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] best way for patching of cluster servers Thanks Eric. One question: does Veritas/Symantec provide support for patching Sun servers involved in a Veritas HA cluster? 
I contacted Sun for support (we have a valid Gold contract) but they still refused to support us because the machines are part of a Veritas cluster. I don't have the Veritas contract number, but I know our contract was renewed and is valid. Is there any way I can find out my contract number from Symantec/Veritas if I can give them the necessary information (machine serials and company info)? Whoever renewed the contract at my workplace doesn't seem to be much help with the contract number. I'm not able to see the site properly on my Linux machine, so maybe I'm not looking in the right place. If anyone can give me a Veritas/Symantec contact where I can find my contract details, and support for patching Sun after that.. Thanks On Dec 6, 2007 10:25 AM, Eric Hennessey [EMAIL PROTECTED] wrote: The typical approach to applying OS patches in a clustered environment is to patch an idle server, let it reboot and rejoin the cluster, and make sure it's running OK. If it is, use the cluster software to switch application(s) from an active server to the one you just patched, and if the app comes up successfully there, apply the patch to the server that's now idle. Keep doing this until all nodes in the cluster have been patched. Eric From: [EMAIL PROTECTED] [mailto: [EMAIL PROTECTED] On Behalf Of upen Sent: Thursday, December 06, 2007 4:03 PM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] best way for patching of cluster servers Hi, when it comes to patching standalone Sun servers, I can patch them and I know everything will be fine after a reboot. I'd like to know how to patch Veritas HA clustered SunOS 5.9 machines. Right now the cluster service and application services are running on server 2. I am not sure whether patching might mess up the cluster or the running applications. Please let me know best practices for applying patches on cluster servers so that machines will have
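Eric's rolling-patch loop maps onto the VCS CLI roughly as follows (an admin sketch rather than a runnable script; the group and node names are placeholders):

```shell
# On the node to be patched: move any groups it holds elsewhere and stop VCS locally
hastop -local -evacuate
# ...apply OS patches and reboot; VCS starts at boot and the node rejoins the cluster...

# Once the patched node looks healthy, switch the application onto it
hagrp -switch appgrp -to patched_node
hastatus -summary   # confirm the group is ONLINE on the new node

# Then repeat the patch cycle on the node that is now idle
```

The -evacuate option is what keeps downtime to a single controlled switchover per group rather than an uncontrolled failover.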
Re: [Veritas-ha] connectivity delays
What address do you telnet to? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tihomir Cavuzic Sent: Wednesday, November 28, 2007 4:43 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] connectivity delays Hello, Let me introduce my little connectivity question, maybe VCS-related: VCS 4.1, 2 Netras 440 with Solaris 10, 5 service groups, one of them is network. Config files attached. The problem is that often we experience connectivity delays, demonstrated for instance by telnet hold-ups, temporary outages of Diameter links and similar, all lasting a couple of seconds. As soon as it is over, everything goes back to normal: the telnet buffer is emptied, Diameter links come up again automatically, etc. Is there any chance this could have something to do with VCS, or should I be looking only at Solaris, switch (port) configuration, and the Ethernet interfaces on my Solaris boxes? I ask this since many boxes are connected to the same switch, switch ports are uniformly configured, and still only my machines have trouble with delays. The only difference is that only my machines have VCS and Solaris 10 -- all the others have Solaris 8/9 and no VCS. Sorry if it sounds trivial, I'm just not sure where to start looking... Thanks/Regards Tihomir
Re: [Veritas-ha] Interconnect hardware specifications
A switch? No. 2 switches? OK. We would be looking for 100BaseT or Gigabit, full duplex. Not so much from a bandwidth standpoint, just reliability. Full duplex removes collision issues. No problems with dedicated switches per interconnect network. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Stefhen Hovland Sent: Tuesday, November 27, 2007 3:02 PM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Interconnect hardware specifications Does anyone have any information as to a minimum hardware type to be used for VCS interconnects? We have some production boxes running with a Linksys switch in between the hosts and I would like to know for sure if this is a good idea or not. Thanks, Stefhen
Re: [Veritas-ha] SF/HA 5.0 on Solaris 9: HAD Self Check error
HAD is not talking to GAB. Excessive system utilization, or a blocked /var file system or some such issue. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marianne Van Den Berg Sent: Tuesday, November 13, 2007 1:17 PM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] SF/HA 5.0 on Solaris 9: HAD Self Check error Hi all Brand new installation - 2-node cluster, Solaris 9 with latest O/S patches, SF/HA 5.0 with MP1. IPMultiNICB config'ed as parallel sg (using mpathd) and ClusterService group. Getting these errors about 3 minutes after hastart. Any ideas?? /var/adm/messages: Nov 13 15:59:11 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 7 sec Nov 13 15:59:12 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 8 sec Nov 13 15:59:13 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 9 sec Nov 13 15:59:14 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 10 sec Nov 13 15:59:15 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS WARNING V-16-1-51047 HAD Self Check: Excessive delay in the HAD heartbeat to GAB (10 seconds) Nov 13 15:59:15 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 11 sec Nov 13 15:59:16 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 12 sec Nov 13 15:59:17 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 13 sec Nov 13 15:59:18 drp-db-1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 140 inactive 14 sec Nov 13 15:59:19 drp-db-1 gab: [ID 191522 kern.notice] GAB WARNING V-15-1-20058 Port h process 140: heartbeat failed, killing process Nov 13 15:59:19 drp-db-1 gab: [ID 975177 kern.notice] GAB INFO V-15-1-20059 Port h heartbeat interval 15000 msec. 
Statistics: Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 0 ~ 3000 msec: 3869 Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 3000 ~ 6000 msec: 0 Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 6000 ~ 9000 msec: 0 Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 9000 ~ 12000 msec: 0 Nov 13 15:59:19 drp-db-1 gab: [ID 217350 kern.notice] GAB INFO V-15-1-20129 Port h: heartbeats in 12000 ~ 15000 msec: 0 Nov 13 15:59:19 drp-db-1 gab: [ID 259915 kern.notice] GAB INFO V-15-1-20094 number of processes: 158 Nov 13 15:59:19 drp-db-1 gab: [ID 631272 kern.notice] GAB INFO V-15-1-20095 load average in 1 min: 0. 6 Nov 13 15:59:19 drp-db-1 gab: [ID 587815 kern.notice] GAB INFO V-15-1-20096 load average in 5 min: 0. 8 Nov 13 15:59:19 drp-db-1 gab: [ID 980060 kern.notice] GAB INFO V-15-1-20097 load average in 15 min:0.10 Nov 13 15:59:19 drp-db-1 gab: [ID 559196 kern.notice] GAB INFO V-15-1-20098 pagein rate: 0 Nov 13 15:59:19 drp-db-1 gab: [ID 582491 kern.notice] GAB INFO V-15-1-20099 pageout rate: 0 Nov 13 15:59:19 drp-db-1 gab: [ID 940236 kern.notice] GAB INFO V-15-1-20041 Port h: client process failure: killing process Nov 13 15:59:19 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS WARNING V-16-1-53034 HAD Signal SIGABRT received Nov 13 15:59:19 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS NOTICE V-16-1-53038 Beginning execution of the diagnostics script Nov 13 15:59:21 drp-db-1 Had[140]: [ID 702911 daemon.alert] VCS NOTICE V-16-1-53039 Completed execution of the diagnostics script Nov 13 15:59:22 drp-db-1 gab: [ID 397130 kern.notice] GAB INFO V-15-1-20032 Port h closed Nov 13 15:59:22 drp-db-1 syslog[29181]: [ID 702911 daemon.notice] VCS ERROR V-16-1-11103 VCS exited. It will restart had restarts, but the same thing happens again after a couple of minutes. 
Regards Marianne
Re: [Veritas-ha] SF/HA 5.0 on Solaris 9: HAD Self Check error
Sorry Randy, that was not a case of saying dunno. HAD not heartbeating GAB is usually indicative of a system load issue or something blocking HAD's ability to open necessary lock files. These are general statements, as this can happen in any environment and should be easy to track down. Specific questions, or more difficult issues, need to be opened as a support case. This is a general discussion forum, not a support avenue for VCS. Since the support guys have access to explorer output, core files, and far more day-to-day experience, they can answer far better. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Randy Slead Sent: Tuesday, November 13, 2007 2:43 PM To: veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] SF/HA 5.0 on Solaris 9: HAD Self Check error I have seen this on all versions of VCS (4/5), even at 10% system utilization. And Symantec going "I dunno" is not helpful. Jim Senicka [EMAIL PROTECTED] wrote: HAD is not talking to GAB. Excessive system utilization, or a blocked /var file system or some such issue. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marianne Van Den Berg Sent: Tuesday, November 13, 2007 1:17 PM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] SF/HA 5.0 on Solaris 9: HAD Self Check error Hi all Brand new installation - 2-node cluster, Solaris 9 with latest O/S patches, SF/HA 5.0 with MP1. IPMultiNICB config'ed as parallel sg (using mpathd) and ClusterService group. Getting these errors about 3 minutes after hastart. Any ideas?? 
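As a first pass on Jim's "/var or load" diagnosis above, two quick host-side checks are worth running on the affected node (a generic sketch; the commands are standard, but the interpretation thresholds are judgment calls, not VCS-documented limits):

```shell
# A full or unwritable /var can block HAD's lock-file activity
df -k /var

# Sustained high load can keep HAD from heartbeating GAB in time,
# even though HAD runs at elevated priority
uptime
```

If /var has free space and load averages are low, the next step is the support case Jim suggests, with explorer output attached.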
Re: [Veritas-ha] OnlineRetryLimit weird behaviour
OnlineRetryLimit sets how many times to attempt to online a resource when the initial attempt fails. This is not a service group setting. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Gurugunti, Mahesh Sent: Tuesday, October 09, 2007 11:54 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] OnlineRetryLimit weird behaviour I set OnlineRetryLimit = 1 for a service group, but the service group keeps restarting more than once in spite of this setting. Any ideas? Mahesh
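Per Jim's reply, the OnlineRetryLimit he describes is a resource-type attribute rather than a service group setting, so it is viewed and changed with hatype (a sketch; the Oracle type name is only an example, substitute the type of the resource that keeps retrying):

```shell
# Show the current value for a resource type
hatype -display Oracle -attribute OnlineRetryLimit

# Make the configuration writable, set one retry, and write it back out
haconf -makerw
hatype -modify Oracle OnlineRetryLimit 1
haconf -dump -makero
```
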
Re: [Veritas-ha] Adding a LUN in Veritas Cluster
Nothing. Unless you use volume resources in the dependency tree. Sent from my Nokia E62 handheld by goodlink. -Original Message- From: Artur Baruchi [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 09, 2007 05:46 PM US Mountain Standard Time To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Adding a LUN in Veritas Cluster Hi, after the server recognizes a LUN, what are the steps to add these LUNs in Veritas Cluster? I already have a VG that is shared. Thanks, Artur Baruchi
Re: [Veritas-ha] Conversion from Asymmetric to Symmetric VCS Cluster
Add a second service group and set its AutoStartList to have node B first. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Shivalingam Vanam Sent: Tuesday, September 11, 2007 9:38 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Conversion from Asymmetric to Symmetric VCS Cluster Hi, Can someone point me to the documentation on the subject matter? We would like to create a new SG on node B by doing so. Thanks VSL
Re: [Veritas-ha] change cluster node
The ClusterService group is a VCS thing. It will not affect your app at all, and does not need to be running for your application to run. It is there for the Web UI and to host the connector if GCO is configured. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of upen Sent: Tuesday, September 11, 2007 10:45 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] change cluster node Hi, following is the result of hastatus -summary. I want to get both groups onto one node, with the least downtime if any. bb is a service group while ClusterService is the cluster group. How do I change ClusterService on node2 to ONLINE and ClusterService on node1 to OFFLINE so that both service groups will be on a single node? Also, will this involve any application services downtime?

# hastatus -summary
-- SYSTEM STATE
-- System          State    Frozen
A  node1           RUNNING  0
A  node2           RUNNING  0

-- GROUP STATE
-- Group           System  Probed  AutoDisabled  State
B  ClusterService  node1   Y       N             ONLINE
B  ClusterService  node2   Y       N             OFFLINE
B  bb              node1   Y       N             OFFLINE
B  bb              node2   Y       N             ONLINE

I am new to VCS, so please help with complete commands. Thanks in advance. - upen, emerge -uD life (Upgrade Life with dependencies)
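The move upen asks about is a single switch of the ClusterService group, using the names from the hastatus output (run from either node; per Jim's reply above, this does not touch the bb application):

```shell
# Move ClusterService from node1 to node2, alongside bb
hagrp -switch ClusterService -to node2
hastatus -summary   # both groups should now show ONLINE on node2
```
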
Re: [Veritas-ha] regarding veritas ha and apache logs from blackboard
This is pretty much an apache issue, not VCS. If you need to bounce apache to make it happen, you would simply freeze the service group while doing so to keep VCS from reacting, or use VCS to stop/start apache. As for the command to clear the logs, I cannot help you there. Sent from my Nokia E62 handheld by goodlink. -Original Message- From: upen [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 05, 2007 12:21 PM US Mountain Standard Time To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] regarding veritas ha and apache logs from blackboard Hi, we are using the Blackboard application with apache 1.3.33 on Sun nodes in a Veritas HA cluster. I was told that if the apache logs grow beyond 2 GB, the Blackboard application misbehaves. How can I clear the logs, or copy them off and truncate them to 0 bytes? Please let me know the procedure that gives minimum downtime for application services. Thanks in advance, upendra -- upen, emerge -uD life (Upgrade Life with dependencies)
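One low-downtime pattern (a generic sketch with a stand-in path, not Blackboard-specific advice) is to copy the log aside and then truncate the live file in place, so apache keeps writing to the same open file descriptor. This behaves cleanly only if apache opened the log in append mode; otherwise a brief apache restart, with the service group frozen as described above, is the safer route:

```shell
# Stand-in for apache's access_log (the real path is site-specific)
LOG=/tmp/demo_access_log
printf 'GET /a\nGET /b\n' > "$LOG"

# Keep a copy for later analysis, then truncate the live file to 0 bytes
cp "$LOG" "$LOG.saved"
: > "$LOG"

wc -c < "$LOG"   # prints 0: the live log is empty, no restart needed
```
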
Re: [Veritas-ha] Fw: gab restarts had
A couple of points. - HAD getting recycled by GAB is due to HAD not heartbeating GAB on the local box. This has pretty much zero to do with the LLT heartbeat between boxes. - HAD not heartbeating GAB is indicative of HAD either being swapped out due to extremely high load (it runs as a real-time process on Solaris) or HAD blocking for some reason in an I/O call. This can really only happen if /var is full or write-protected, as there is some lock file activity there. - The only way HAD could possibly be affected by physical networks is if it were blocking on some piece of data that must be sent, but then you would also see lots of corresponding LLT alarms. So, based on what I see here, HAD is not running correctly due to either a problem with /var or a load issue. -Original Message- From: Peter DrakeUnderkoffler [mailto:[EMAIL PROTECTED] Sent: Thursday, August 09, 2007 9:26 AM To: Kiss László - Károly Cc: Jim Senicka; veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] Fw: gab restarts had But it shouldn't be halting the system now; gab will still kill had. Do you have any llt errors or, more importantly, any layer 2 errors on the heartbeat networks? How do you know it's not the load? What are you using to determine that? What do you see in /var/adm/messages or the output of dmesg a little after this happens? Those errors are a symptom of a system under too much load, but other things can cause that kind of symptom. You need to actually start digging into the O/S layer and figure out what the system is doing. The adjustment I mentioned to gabtab gives you that opportunity. The other solution is to open a support call with Symantec and let them figure out what is going on. Thanks Peter Peter DrakeUnderkoffler Xinupro, LLC 617-834-2352 Kiss László - Károly wrote: Hi, I followed your instructions and edited gabtab, which now looks like: /sbin/gabconfig -c -k -n2 but still HAD is restarted by gab. 
BR, Laszlo - Original Message From: Peter DrakeUnderkoffler [EMAIL PROTECTED] To: Jim Senicka [EMAIL PROTECTED] Cc: Kiss László - Károly [EMAIL PROTECTED]; veritas-ha@mailman.eng.auburn.edu Sent: Wednesday, 8 August, 2007 5:33:32 PM Subject: Re: [Veritas-ha] Fw: gab restarts had I agree with Jim, that is the failure scenario when the system is overloaded and gab isn't able to communicate for a period of time. As a temporary measure, you can add -k to gabtab and restart gab. This will have it not force the system to panic, giving you time to resolve the underlying issue. I wouldn't leave this in place, though. Thanks Peter Peter DrakeUnderkoffler Xinupro, LLC 617-834-2352 Jim Senicka wrote: is the system heavily loaded? GAB restarts HAD when HAD does not communicate with GAB for 16 seconds. This usually happens only in super-overload situations *From:* [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] *On Behalf Of *Kiss László - Károly *Sent:* Wednesday, August 08, 2007 11:04 AM *To:* veritas-ha@mailman.eng.auburn.edu *Subject:* [Veritas-ha] Fw: gab restarts had Sorry, I forgot the file :( Here it is - Forwarded Message From: Kiss László - Károly [EMAIL PROTECTED] To: veritas-ha@mailman.eng.auburn.edu Sent: Wednesday, 8 August, 2007 5:02:42 PM Subject: Re: [Veritas-ha] gab restarts had Hi, We have a two node cluster, VCS 4.1. When I try to bring a resource online/offline, or when I try to make a switchover, I get some very strange behaviour. The gab daemon restarts VCS and thus I can't do anything with it. I checked all the steps from the install guide's Verifying LLT, GAB, and Cluster Operation chapter and everything looks fine, but when I try to do something it just restarts. 
I attached the complete log of a restart; here is a snippet from it: Aug 8 22:29:23 NTMS1AN1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 5182 inactive 14 sec Aug 8 22:29:24 NTMS1AN1 Had[5182]: [ID 702911 daemon.alert] VCS WARNING V-16-1-53024 HAD Signal SIGABRT received Aug 8 22:29:24 NTMS1AN1 Had[5182]: [ID 702911 daemon.alert] VCS NOTICE V-16-1-53028 Beginning execution of the diagnostics script Aug 8 22:29:24 NTMS1AN1 gab: [ID 191522 kern.notice] GAB WARNING V-15-1-20058 Port h process 5182: heartbeat failed, killing process Thanks. BR, Laszlo
Re: [Veritas-ha] Fw: gab restarts had
What OS and version, and what version of VCS? Something is blocking HAD's ability to heartbeat GAB. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kiss László - Károly Sent: Wednesday, August 08, 2007 11:38 AM To: Peter DrakeUnderkoffler; Jim Senicka Cc: veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] Fw: gab restarts had Thanks to both of you! It looks like the system is not loaded; an Oracle and a Java app are running on it, but it is not loaded, and this error comes only when I try to do something with VCS. I definitely would not leave this in place; that's why I would like to get some info on what to do in a situation like this. Thanks. - Original Message From: Peter DrakeUnderkoffler [EMAIL PROTECTED] To: Jim Senicka [EMAIL PROTECTED] Cc: Kiss László - Károly [EMAIL PROTECTED]; veritas-ha@mailman.eng.auburn.edu Sent: Wednesday, 8 August, 2007 5:33:32 PM Subject: Re: [Veritas-ha] Fw: gab restarts had I agree with Jim, that is the failure scenario when the system is overloaded and gab isn't able to communicate for a period of time. As a temporary measure, you can add -k to gabtab and restart gab. This will have it not force the system to panic, giving you time to resolve the underlying issue. I wouldn't leave this in place, though. Thanks Peter Peter DrakeUnderkoffler Xinupro, LLC 617-834-2352 Jim Senicka wrote: is the system heavily loaded? GAB restarts HAD when HAD does not communicate with GAB for 16 seconds. 
This usually happens only in super-overload situations. *From:* [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] *On Behalf Of *Kiss László - Károly *Sent:* Wednesday, August 08, 2007 11:04 AM *To:* veritas-ha@mailman.eng.auburn.edu *Subject:* [Veritas-ha] Fw: gab restarts had Sorry, I forgot the file :( Here it is - Forwarded Message From: Kiss László - Károly [EMAIL PROTECTED] To: veritas-ha@mailman.eng.auburn.edu Sent: Wednesday, 8 August, 2007 5:02:42 PM Subject: Re: [Veritas-ha] gab restarts had Hi, We have a two-node cluster, VCS 4.1. When I try to bring a resource online/offline, or when I try to make a switchover, I get some very strange behaviour: the GAB daemon restarts the VCS engine, and thus I can't do anything with it. I checked all the steps from the install guide's "Verifying LLT, GAB, and Cluster Operation" chapter and everything looks fine, but when I try to do something it just restarts. I attached the complete log of a restart; here is a snippet from it:

Aug 8 22:29:23 NTMS1AN1 gab: [ID 272231 kern.notice] GAB WARNING V-15-1-20057 Port h process 5182 inactive 14 sec
Aug 8 22:29:24 NTMS1AN1 Had[5182]: [ID 702911 daemon.alert] VCS WARNING V-16-1-53024 HAD Signal SIGABRT received
Aug 8 22:29:24 NTMS1AN1 Had[5182]: [ID 702911 daemon.alert] VCS NOTICE V-16-1-53028 Beginning execution of the diagnostics script
Aug 8 22:29:24 NTMS1AN1 gab: [ID 191522 kern.notice] GAB WARNING V-15-1-20058 Port h process 5182: heartbeat failed, killing process

Thanks. BR, Laszlo
___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
Re: [Veritas-ha] LLT not configured error after reboot
GAB saying "LLT not configured" means LLT is not running. It is not saying LLT is configured incorrectly in llttab. Sent from my Nokia E62 handheld by goodlink. -Original Message- From: robertinoau [mailto:[EMAIL PROTECTED] Sent: Wednesday, June 20, 2007 11:40 PM Mountain Standard Time To: Damodharan K; veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] LLT not configured error after reboot Try this: /etc/rc2.d/S92gab start Then gabconfig -c -x --- Damodharan K [EMAIL PROTECTED] wrote: Hi all, After unloading and loading GAB, it gives the following error:

# gabconfig -c -x
GAB gabconfig ERROR V-15-2-25015 LLT not configured

But LLT is correctly configured:

test02-ap: more /etc/llttab
set-node test02-ap
set-cluster 70
link qfe2 /dev/qfe:2 - ether - -
link qfe7 /dev/qfe:7 - ether - -

test02-ap: more /etc/gabtab
/sbin/gabconfig -c -n2

test02-ap: gabconfig -a
GAB Port Memberships
===

Damodharan K Tata Consultancy Services Mailto: [EMAIL PROTECTED] Website: http://www.tcs.com robertinoau [EMAIL PROTECTED] 06/21/2007 05:08 AM To Damodharan K [EMAIL PROTECTED], veritas-ha@mailman.eng.auburn.edu cc Subject Re: [Veritas-ha] LLT and GAB problem after first rebooting when configured Try this: 1.) Unload GAB: # gabconfig -U 2.) Restart GAB: # gabconfig -c -x 3.) Finally restart HAD: # hastart --- Damodharan K [EMAIL PROTECTED] wrote: Dear all, I have two V480 servers with VCS 4.1 and VxVM 4.1, and I am newly building a two-node cluster. At installation and configuration time the cluster service worked fine, but after a reboot LLT and GAB are not running and I am not able to start the cluster service. Please help solve this issue. I am sending the configuration and the engine log.

Engine_A.log:
2007/04/18 14:13:44 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms
2007/04/18 14:13:44 VCS ERROR V-16-1-10116 GabHandle::open failed errno = 261
2007/04/18 14:13:44 VCS ERROR V-16-1-11033 GAB open failed. Exiting
2007/04/18 14:13:54 VCS NOTICE V-16-1-11022 VCS engine (had) started
2007/04/18 14:13:54 VCS NOTICE V-16-1-11027 VCS engine startup arguments=-restar

Configuration:

test02-ap: gabconfig -l
GAB Driver Configuration
Driver state         : Unconfigured
Partition arbitration: Disabled
Control port seed    : Enabled
Halt on process death: Disabled
Missed heartbeat halt: Disabled
Halt on rejoin       : Disabled
Keep on killing      : Disabled
Quorum flag          : Disabled
Restart              : Disabled
Node count           : 2
Disk HB interval (ms): 1000
Disk HB miss count   : 4
IOFENCE timeout (ms) : 15000
Stable timeout (ms)  : 5000

test02-ap: more /etc/llttab
set-node test02-ap
set-cluster 70
link qfe2 /dev/qfe:2 - ether - -
link qfe7 /dev/qfe:7 - ether - -

test02-ap: more /etc/gabtab
/sbin/gabconfig -c -n2

test02-ap: gabconfig -a
GAB Port Memberships
===

test02-ap: more main.cf
include types.cf
cluster vcsdev-ap (
    UserNames = { admin = bopHojOlpKppNxpJom }
    ClusterAddress = 172.25.7.98
    Administrators = { admin }
    CredRenewFrequency = 0
    UseFence = SCSI3
    CounterInterval = 5
)
system test01-ap (
    Limits = { Processors = 4 }
)
system test02-ap (
    Limits = { Processors = 4 }
)
group ClusterService (
    SystemList = { test01-ap = 0, test02-ap = 1 }
    AutoStartList = { test01-ap, test02-ap }
    FailOverPolicy = Load
    AutoStartPolicy = Load
    OnlineRetryLimit = 3
    OnlineRetryInterval = 120
    Load = 4
)
IP webip (
    Device = ce0
    Address = 172.25.7.98
    NetMask = 255.255.255.248
)
NIC csgnic (
    Device = ce0
)
VRTSWebApp VCSweb (
    Critical = 0
    AppName = vcs
    InstallDir = /opt/VRTSweb/VERITAS
    TimeForOnline = 5
    RestartLimit = 3
)
VCSweb requires webip
webip requires csgnic
// resource dependency tree
// group ClusterService
=== message truncated ===
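Since GAB's "LLT not configured" error means the LLT driver itself is not up, a rough verification sequence like the following can isolate the problem before touching GAB or HAD; a sketch assuming Solaris with VCS 4.1, where command paths and output wording may vary by release:

```shell
# Check whether LLT is running (lltconfig reports its running state)
lltconfig

# If it is not, configure LLT from /etc/llttab
lltconfig -c

# Verify node and link status on the private links
lltstat -n
lltstat -nvv

# Only once LLT is up, configure GAB and start HAD
/sbin/gabconfig -c -n2
hastart
```

The point of the ordering is that GAB is an LLT client and HAD is a GAB client, so each layer can only come up after the one beneath it.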
Re: [Veritas-ha] LLT and GAB problem after first rebooting when configured
LLT is not starting, right? All the other data is not relevant. Fix the LLT issue so GAB can start, so HAD can start. Sent from my Nokia E62 handheld by goodlink. -Original Message- From: Damodharan K [mailto:[EMAIL PROTECTED] Sent: Thursday, June 21, 2007 04:06 PM Mountain Standard Time To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] LLT and GAB problem after first rebooting when configured Dear all, I have two V480 servers with VCS 4.1 and VxVM 4.1, and I am newly building a two-node cluster. At installation and configuration time the cluster service worked fine, but after a reboot LLT and GAB are not running and I am not able to start the cluster service. Please help solve this issue. I am sending the configuration and the engine log.

Engine_A.log:
2007/04/18 14:13:44 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms
2007/04/18 14:13:44 VCS ERROR V-16-1-10116 GabHandle::open failed errno = 261
2007/04/18 14:13:44 VCS ERROR V-16-1-11033 GAB open failed. Exiting
2007/04/18 14:13:54 VCS NOTICE V-16-1-11022 VCS engine (had) started
2007/04/18 14:13:54 VCS NOTICE V-16-1-11027 VCS engine startup arguments=-restar

[... same gabconfig -l output, llttab, gabtab, and main.cf as quoted above ...]

// resource dependency tree
// group ClusterService
// {
//     VRTSWebApp VCSweb
//     {
//         IP webip
//         {
//             NIC csgnic
//         }
//     }
// }

Damodharan K Tata Consultancy Services Mailto: [EMAIL PROTECTED] Website: http://www.tcs.com
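On Solaris the boot scripts bring the stack up in exactly the order Jim describes, LLT before GAB before HAD; when the cluster does not come up after a reboot, running the scripts by hand in sequence can show which layer fails. A sketch only; the script names below are the usual defaults and may differ per release:

```shell
/etc/rc2.d/S70llt start    # loads and configures LLT from /etc/llttab
/etc/rc2.d/S92gab start    # runs /etc/gabtab (e.g. gabconfig -c -n2)
/etc/rc3.d/S99vcs start    # starts the HAD engine

# A healthy stack should then show port a (GAB) and port h (HAD) memberships:
gabconfig -a
```

If the first script fails, the problem is in LLT (llttab, device paths, or the driver packages), which matches the "GAB open failed" errors in the engine log above.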
Re: [Veritas-ha] sample for apache application
What OS? The Linux 5.0 bundled agents reference guide has the Apache agent documented, and I believe the guides for the other OSes do as well. _ From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of osk Sent: Monday, April 30, 2007 3:01 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] sample for apache application Hi, I am new to VCS; can you give me one example of configuring Apache as a resource? Recommendations are welcome. regards Karthikeyan.N -- winners don't do different things they do things differently
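As a starting point, a minimal VCS 5.0 Apache resource might look like the following main.cf fragment. This is a sketch with placeholder paths and host names; the exact attribute set should be checked against the bundled agents reference guide for your OS:

```
Apache httpd_res (
    httpdDir = "/usr/sbin"
    ConfigFile = "/etc/httpd/conf/httpd.conf"
    HostName = web-vip
    Port = 80
)
```

In practice the Apache resource would sit on top of IP, Mount, and NIC resources in the same service group, with `requires` links between them.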
Re: [Veritas-ha] Resource Group Dependencies
We are not planning to address that (multiple child groups) in VCS at this time. Please have your account team contact me inside Symantec. Also, what are you running for which a 40-second shutdown is too long? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Ceri Davies Sent: Tuesday, April 24, 2007 9:39 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Resource Group Dependencies I note that there is a restriction that a group may have only one child group; is there any future in which this might be relaxed? As a use case, this is why I want it: I have a multi-node cluster with multiple zones on each node, and I fail applications over between the zones. I don't wish to use the configuration quoted in the User's Guide for zones, as that configuration requires that, when a service group fails over, the zone be stopped on the failing node and then started on the node that the group is failing over to. This is bad because, in my testing, starting a zone is very quick, but waiting for one to shut down takes about 40 seconds. Therefore, I'm eschewing this: I have created a parallel resource group that starts a zone on each node, and the application resource groups are simply configured with a firm local dependency on the zone resource group. For example, with an identically configured zone vleappp on each node, I use:

group vleappp_zones (
    SystemList = { clna = 0, clnb = 0 }
    Parallel = 1
    AutoStartList = { clna, clnb }
)
Zone vleappp_zone (
    ZoneName = vleappp
)

group vle_app_prod (
    SystemList = { clna = 1, clnb = 0 }
)
Application vleappp_apache (
    StartProgram = /nondistinct/vle/application start
    StopProgram = /nondistinct/vle/application stop
    PidFiles = { /zones/local/roots/vleappp/root/nondistinct/vle/logs/httpd.pid }
    ContainerName = vleappp
)
Mount vleappp_mount ( )
Blah otherstuff ( )
...
requires group vleappp_zones online local firm

This works perfectly for me, except that now I want to add a global dependency on the vle_ora_prod group as well. Aargh. I simply can't wait for the zones to shut down, so is there some other option? Ceri -- That must be wonderful! I don't understand it at all. -- Moliere
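The firm local dependency in Ceri's configuration can also be created online with hagrp; the commented-out second link below is the additional dependency being asked for, which is where the single-child restriction bites. Group names are taken from the poster's config; treat this as a sketch:

```shell
haconf -makerw
hagrp -link vle_app_prod vleappp_zones online local firm
# The second child link the poster wants is what the restriction disallows:
# hagrp -link vle_app_prod vle_ora_prod online global firm
haconf -dump -makero
```

The `-link parent child category location type` form mirrors the `requires group` clause in main.cf.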
Re: [Veritas-ha] Naming conventions for VCS; VCS style guide?
comments below _ From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Colb, Andrew Sent: Tuesday, April 24, 2007 2:14 PM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Naming conventions for VCS; VCS style guide? All, We are about to initiate an upgrade of several VCS clusters. The plan for these upgrades will enable us to build and test in parallel with the existing production clusters, and to revisit our traditional cluster nomenclature and naming conventions. Our configuration has a five-node production Solaris Veritas cluster at our headquarters; we are building a four-node equivalent at our warm business-continuity site (active data replication). The two sites are connected by a point-to-point DS-3; firewall rules allow one site to see and interact with the other. Our current VCS nomenclature is pretty much ad hoc. The new VCS nomenclature would have the structure stem_object##, where stem is either a functional name (e.g., db) or a singular, universal name (e.g., prod), object is dg or sg, and ## is a zero-padded numeric for serialized differentiation. Question 1: Can we use identical names for VCS disk groups and service groups at the two sites (HQ and Continuity) simultaneously? Host names will, of course, be different, and the clusters will have different cluster IDs. If we do use identical names, will that create a problem if we move on to the Global Cluster Option and/or to VVR? For example, if we have a service group named db_sg01 in both our headquarters cluster and our business-continuity cluster, will VCS complain? [JS] Absolutely. No issues with identical names in separate clusters. Question 2: Is there an advantage in Veritas management/administration if all the stem names are the same? That is, if we replace existing stem names such as db, auth, appsrv, etc. with a single universal name such as prod, will we gain anything in exchange for giving up the functional association? [JS] The Cluster Management Console either has, or will be providing, a search function, so it all depends on how you want to search :-) Thanks in advance for any discussion, advice, ideas, guidance, and warnings, Andy Colb Investment Company Institute
Re: [Veritas-ha] Proxy resource in status unknown
The actual NIC resource is not enabled, so the Proxy cannot probe (at least that is my first thought here). -Original Message- From: Fred Grieco [mailto:[EMAIL PROTECTED] Sent: Monday, April 16, 2007 11:29 AM To: Jim Senicka; veritas-ha@mailman.eng.auburn.edu Subject: RE: [Veritas-ha] Proxy resource in status unknown Here are the snippets from the main.cf. There are three SGs: one with the actual NIC resource and two with proxies. Both proxies show the online, status unknown state.

group ClusterService (
    SystemList = { pa-ocsun-01 = 0, pa-ocsun-02 = 1 }
    AutoStartList = { pa-ocsun-01, pa-ocsun-02 }
    OnlineRetryLimit = 3
    OnlineRetryInterval = 120
)
IP webip (
    Device = ce0
    Address = 192.168.49.146
    NetMask = 255.255.255.0
)
...
Proxy NICProxycsg (
    Critical = 0
    TargetResName = nic1
)

group VVR-Remote (
    SystemList = { pa-ocsun-01 = 0, pa-ocsun-02 = 1 }
)
...
IP replip (
    Critical = 0
    Device = ce0
    Address = 192.168.49.68
    NetMask = 255.255.255.0
)
NIC nic1 (
    Enabled = 0
    Device = ce0
    NetworkType = ether
    NetworkHosts = { 192.168.49.1 }
)
...

group oc451 (
    SystemList = { pa-ocsun-01 = 0, pa-ocsun-02 = 1 }
    AutoStartList = { pa-ocsun-01, pa-ocsun-02 }
)
...
IP VIP (
    Critical = 0
    Device = ce0
    Address = 192.168.49.145
    NetMask = 255.255.255.0
)
...
Proxy NIC-Proxy (
    Critical = 0
    TargetResName = nic1
)
...

Fred --- Jim Senicka [EMAIL PROTECTED] wrote: Can you cut/paste main.cf sections? -Original Message- From: Fred Grieco [mailto:[EMAIL PROTECTED] Sent: Monday, April 16, 2007 9:30 AM To: Jim Senicka; veritas-ha@mailman.eng.auburn.edu Subject: RE: [Veritas-ha] Proxy resource in status unknown Yes, with the same priorities. --- Jim Senicka [EMAIL PROTECTED] wrote: Are the system lists for both service groups the same? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Fred Grieco Sent: Monday, April 16, 2007 9:08 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Proxy resource in status unknown I've set up a proxy resource that references a NIC resource in another service group. The NIC resource is online, but the proxy resource shows Online|status unknown. What does this mean for a Proxy resource? And is there any way to clear the unknown status? This is on a live Oracle cluster, so I don't have the opportunity to take everything down, etc. TIA, Fred
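Jim's diagnosis points at the `Enabled = 0` on the nic1 resource in the config above. A sketch of clearing it without downtime, using the resource and system names from the poster's main.cf:

```shell
haconf -makerw
hares -modify nic1 Enabled 1
haconf -dump -makero

# Re-probe the proxy resources so the unknown state clears
hares -probe NICProxycsg -sys pa-ocsun-01
hares -probe NICProxycsg -sys pa-ocsun-02
hares -probe NIC-Proxy -sys pa-ocsun-01
hares -probe NIC-Proxy -sys pa-ocsun-02
```

A Proxy resource only mirrors the status of its target, so it cannot report a sensible state while the target resource is disabled and never probed.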
Re: [Veritas-ha] Step-by-Step instructions for adding storage to cluster
To be honest, you need to get through VCS training. Without knowing every detail, I cannot give exact steps; with basic VCS training this would be trivial and you would be fully confident making the changes. [Sent from my Nokia E62 handheld via Goodlink] -Original Message- From: Lynette Oliver [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 10, 2007 08:04 PM Pacific Standard Time To: Jim Senicka; veritas-ha@mailman.eng.auburn.edu Subject: RE: [Veritas-ha] Step-by-Step instructions for adding storage to cluster Thank you for your response, Jim. Do you have the steps? I've inherited a VCS configuration but have never worked on it before. I'm afraid to make changes for fear of creating a situation that could cause a failover. _ From: Jim Senicka [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 10, 2007 7:33 PM To: Lynette Oliver; veritas-ha@mailman.eng.auburn.edu Subject: RE: [Veritas-ha] Step-by-Step instructions for adding storage to cluster If you add volumes, you will need to add corresponding Volume resources (if you use Volume resources) in the service group, plus whatever file systems you add as additional Mount resources. Growing an existing file system requires no changes in the cluster. _ From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Lynette Oliver Sent: Wednesday, April 11, 2007 12:49 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Step-by-Step instructions for adding storage to cluster Hello HA gurus, I'm looking for someone to provide me with step-by-step instructions for adding storage to a cluster. For example, I have an existing cluster that requires a new volume group to be added. I have documentation that indicates how to create volume groups and volumes using VxVM, but nothing that describes how to integrate this with an existing cluster. In addition, if I need to grow a file system for a given volume group managed by a cluster, how do I do so? Please help. This is VCS 4.1 on Solaris 2.9 running on Hitachi USP.
Thanks, loliver
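Jim's outline, translated into VCS commands after the disk group, volume, and file system have been created in VxVM, might look like the following. Resource names, the service group name (mysg), and device paths are placeholders for illustration only:

```shell
haconf -makerw

# New disk group resource
hares -add newdg_res DiskGroup mysg
hares -modify newdg_res DiskGroup newdg

# New volume resource on that disk group
hares -add newvol_res Volume mysg
hares -modify newvol_res DiskGroup newdg
hares -modify newvol_res Volume vol01

# New file system resource
hares -add newmnt_res Mount mysg
hares -modify newmnt_res MountPoint "/data01"
hares -modify newmnt_res BlockDevice "/dev/vx/dsk/newdg/vol01"
hares -modify newmnt_res FSType vxfs
hares -modify newmnt_res FsckOpt "%-y"

# Wire up dependencies and enable
hares -link newvol_res newdg_res
hares -link newmnt_res newvol_res
hares -modify newdg_res Enabled 1
hares -modify newvol_res Enabled 1
hares -modify newmnt_res Enabled 1

haconf -dump -makero
```

None of this causes a failover by itself; the new resources simply come under cluster control the next time the group is brought online (or when the resources are brought online individually).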
Re: [Veritas-ha] VCS 5.0 / Solaris 10 Resource Controls / Oracle Agent
Bryan, unfortunately at this time the VCS 5.x agents are pretty much not designed to work in an SRM environment. We are looking at what it will take to support this. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bryan Pepin Sent: Tuesday, April 10, 2007 4:34 PM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] VCS 5.0 / Solaris 10 Resource Controls / Oracle Agent Hello, In the process of deploying Oracle 10g on top of SFRAC 5.0 running Solaris 10, I've noticed the following issues around setting shared memory parameters for Oracle. The Oracle agent does not assume the project that I have assigned to the Oracle user; it is assuming the system project, and when I try to add the resource controls to that system project or the default project, that does not work either. Here are the details. Trying to use Solaris' new project methodology to establish the IPC tunables, here is what I did:

# projadd -c 'IPC Tunables' -U oracle -G dba -K 'project.max-shm-memory=(privileged,16gb,deny)' user.oracle

Now, as the Oracle user, this allows the DB to open without issue. However, when I configure the Oracle VCS agent to start the DB, it appears that the VCS processes are assuming the system project, and when they start the database processes, those processes assume the controls of that project rather than those of the oracle user project that I have defined. Here is the error in the messages file when the DB tries to open from the VCS agent:

[ID 883052 kern.notice] privileged rctl project.max-shm-memory (value 6291603456) exceeded by project 0

So I logically thought I could apply the same tunings to the system project, but that does not work either.
This is what my project file looks like:

system:0::::process.max-sem-nsems=(privileged,4096,deny);\
process.max-sem-ops=(privileged,4096,deny);project.max-sem-ids=(privileged,4096,deny);\
project.max-shm-ids=(privileged,512,deny);project.max-shm-memory=(privileged,17179869184,deny)
user.root:1
noproject:2
default:3
group.staff:10
user.oracle:100:IPC Tunables:oracle:dba:process.max-sem-nsems=(privileged,4096,deny);\
process.max-sem-ops=(privileged,4096,deny);project.max-sem-ids=(privileged,4096,deny);\
project.max-shm-ids=(privileged,512,deny);project.max-shm-memory=(privileged,17179869184,deny)

What I have been able to do is change the parameters on the fly with prctl:

# ps -ef -o pid,project,args | grep -i OracleAgent    -- to get the PID and project
# prctl -n project.max-shm-memory -i process PID    -- to display
# prctl -n project.max-shm-memory -r -v 16gb -i process PID    -- to set

Once I do that, it allows me to start the database via the Oracle agent. Has anyone run into this issue? This may be me not properly setting up the system project, but I figure someone must have run into this and could share how they resolved it. I'm hoping there is an easy solution out there, rather than always having to change the parameter on the running agent. Hope that all makes sense. Thanks. -Bryan PS: What I have realized is that if I put the shmmax parameters in /etc/system, that works, but I was hoping not to have to fall back into that routine. -- Bryan Pepin Unix Enterprise Systems EMC Corporation 4400 Computer Drive Westboro, MA 01580 508-898-4776 [EMAIL PROTECTED]
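Since the rctl message above blames project 0 (the system project), one way to attach the tunable there is projmod rather than editing /etc/project by hand. A sketch only: projmod changes apply to newly started processes, so the already-running agent would still need the prctl workaround described above:

```shell
# Add (or replace) the resource control on the existing "system" project
projmod -sK 'project.max-shm-memory=(privileged,16gb,deny)' system

# Confirm the attribute took effect
projects -l system
```

Whether putting Oracle's shared-memory limit on the system project is the right design is a separate question; it is the project the VCS-started processes actually run in, which is the crux of the poster's problem.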
Re: [Veritas-ha] load-balancing in VCS
No, you will need an IP per node and an off-the-shelf IP load balancer out front. This is a far more standard approach than pumping all traffic through one node and letting it forward to all the others in the cluster; a serious case of marketecture versus real feature on the Sun Cluster side. [Sent from my Nokia E62 handheld via goodlink] -Original Message- From: Rongsheng Fang [mailto:[EMAIL PROTECTED] Sent: Friday, April 06, 2007 10:29 AM Pacific Standard Time To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] load-balancing in VCS Hi, Does VCS have (or support) the equivalent functionality of the Scalable Data Service in Sun Cluster, which can balance load between cluster nodes? http://docs.sun.com/app/docs/doc/819-0579/6n30dc0nf?a=view I know that in VCS service instances can start/run on different cluster nodes in parallel mode, but can these service instances share the same virtual IP, which can only be up on one node? Thanks, Rongsheng
Re: [Veritas-ha] Custom Agent
If you already have start/stop/monitor scripts, take a look at the Application agent in the bundled agents reference guide (BARG). That should cover about 98% of apps. -Original Message- From: Fred Butler [mailto:[EMAIL PROTECTED] Sent: Thursday, April 05, 2007 11:47 AM To: 'Stanley, Jon'; veritas-ha@mailman.eng.auburn.edu; Jim Senicka Subject: RE: [Veritas-ha] Custom Agent Thanks Jon / Jim! I know you guys don't want to hear this, but I write these agents all the time for Sun Cluster, and this is my first request to do one for VCS. I already have the start/stop/monitor scripts created; I just needed the info to incorporate them into the VCS framework. I will have to write a clean script after I determine whether there are things like shared memory, semaphores, or lock files that need to be cleaned up. Jon - Agent Developer's Guide, huh :-)! Next time I will RTFM. I will also read the document Jim sent me. Thanks again! Regards, Fred Butler (484) 241-5912 (Cell #1) (484) 903-4742 (Cell #2) http://www.arch.com/ Pin#: 8778977117 -Original Message- From: Stanley, Jon [mailto:[EMAIL PROTECTED] Sent: Thursday, April 05, 2007 11:06 AM To: Fred Butler; veritas-ha@mailman.eng.auburn.edu Subject: RE: [Veritas-ha] Custom Agent Have you looked at the aptly named Agent Developer's Guide? :-) Or maybe the Application agent does what you need instead? If you can provide external scripts for the online, offline, monitor, and clean functions, then that's all you need... -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Fred Butler Sent: Thursday, April 05, 2007 14:45 To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Custom Agent Team - I need to write a custom agent in VCS and I need to know which manual has this information. Or, if someone has notes on this process they would like to share, I would be very appreciative.
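With start/stop/monitor scripts already written, the Application agent route mentioned above usually needs only a main.cf entry along these lines. The paths are placeholders and the attribute list should be checked against the bundled agents reference guide for your VCS release:

```
Application myapp_res (
    User = root
    StartProgram = "/opt/myapp/bin/start"
    StopProgram = "/opt/myapp/bin/stop"
    CleanProgram = "/opt/myapp/bin/clean"
    MonitorProgram = "/opt/myapp/bin/monitor"
    PidFiles = { "/var/run/myapp.pid" }
)
```

MonitorProgram, MonitorProcesses, and PidFiles are alternative monitoring mechanisms; typically only the one that fits the application is configured.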
Re: [Veritas-ha] LVMVG agent does work with VIO ! ! !
We have a number of issues with reservations, breaking reservations, and such. So as of now, if the HCL says not supported, it is not. Please work with your account team to find out what can be done (if anything) to get this added. _ From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Pavel A Tsvetkov Sent: Monday, April 02, 2007 9:43 AM To: Veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] LVMVG agent does work with VIO ! ! ! Hello all! My last post was a question about Symantec support of the LVMVG agent in a VIO configuration. It seems nobody could answer my question... so I decided to check it out myself. I installed VCS 5 MP1 for AIX on my 570 server with two LPARs and two VIOs; I used only one VIO in my configuration. One disk was shared by the VIO to the two LPARs. An LVM group was created and clustered. Everything was quite right! No problems with switching the LVM group over from one LPAR to another. So I'd very much like to get comments from the Symantec people! Regards, Pavel
Re: [Veritas-ha] Veritas Volume Replicator in Replicated Data Cluster question
RDC = automatic failover; GCO = operator-confirmed failover. So in GCO, an operator makes a choice to start up on old data or wait for the original primary. This is not possible inside a single cluster. And as for the bunker, that is a GCO config as well. [Sent from my Nokia E62 handheld via goodlink] -Original Message- From: Pavel A Tsvetkov [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 14, 2007 02:09 AM Pacific Standard Time To: Jim Senicka; Veritas-ha@mailman.eng.auburn.edu Subject: Re: Re: [Veritas-ha] Veritas Volume Replicator in Replicated Data Cluster question Hello Jim! Thank you for the answer. But if I have a choice of replication mode in a global cluster, it could be useful to have the same thing in RDC. The data on the Secondary may not be up to date, but it is still consistent. :) So the application can be started. And we should take the bunker into consideration! The bunker uses a synchronous connection with the Primary SRL, so an asynchronous Secondary site can get up-to-date data from the bunker. So if we use a bunker and the bunker agent, it is quite possible to run RDC in asynchronous mode. With best regards, Pavel [quoted] We do not support an automatic failover to an out-of-date secondary, so RDC is sync only. If you need async, you need to not treat the replication like a shared disk, and instead treat it like replication. Take a look at the Global Cluster Option, now part of the VCS HA/DR edition.
Re: [Veritas-ha] Veritas Volume Replicator in Replicated Data Cluster question
If you configure auto failover in GCO (not recommended), then you need to make sure you are using sync replication only. -Original Message- From: Cronin, John S [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 14, 2007 10:07 AM To: Jim Senicka; Pavel A Tsvetkov; Veritas-ha@mailman.eng.auburn.edu Subject: RE: [Veritas-ha] Veritas Volume Replicator in Replicated Data Cluster question I believe auto-failover is a configurable option with GCO, unless something has changed recently (I didn't go look at the docs). The default is operator confirmation before failover, but I have agreed to configure GCO for auto-failover before when a split brain did not present any significant risk to the customer (e.g., the customer said it was OK and their preference, and after inquiring into the facts of the situation, I agreed with their conclusions). -- John Cronin 678-480-6266 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jim Senicka Sent: Wednesday, March 14, 2007 5:14 AM To: Pavel A Tsvetkov; Veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] Veritas Volume Replicator in Replicated Data Cluster question RDC = automatic failover; GCO = operator-confirmed failover. So in GCO, an operator makes a choice to start up on old data or wait for the original primary. This is not possible inside a single cluster. And as for the bunker, that is a GCO config as well. [Sent from my Nokia E62 handheld via goodlink] -Original Message- From: Pavel A Tsvetkov [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 14, 2007 02:09 AM Pacific Standard Time To: Jim Senicka; Veritas-ha@mailman.eng.auburn.edu Subject: Re: Re: [Veritas-ha] Veritas Volume Replicator in Replicated Data Cluster question Hello Jim! Thank you for the answer. But if I have a choice of replication mode in a global cluster, it could be useful to have the same thing in RDC. The data on the Secondary may not be up to date, but it is still consistent. :) So the application can be started.
And we should take the bunker into consideration! The bunker uses a synchronous connection with the Primary SRL, so the asynchronous Secondary site can get up-to-date data from the bunker. So if we use a bunker and the bunker agent, it is quite possible to run RDC in asynchronous mode. With best regards, Pavel We do not support an automatic failover to an out-of-date secondary. So RDC is sync only. If you need async, you need to not treat the replication like a shared disk, and instead treat it like replication. Take a look at the Global Cluster Option, now part of the VCS HA/DR edition. ___ Veritas-ha maillist - Veritas-ha@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-ha
Re: [Veritas-ha] Veritas Volume Replicator in Replicated Data Cluster question
We do not support an automatic failover to an out-of-date secondary. So RDC is sync only. If you need async, you need to not treat the replication like a shared disk, and instead treat it like replication. Take a look at the Global Cluster Option, now part of the VCS HA/DR edition. [Sent from my Nokia E62 handheld via goodlink] -Original Message- From: Eric Hennessey [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 13, 2007 09:07 AM Pacific Standard Time To: Pavel A Tsvetkov; Veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] Veritas Volume Replicator in Replicated Data Cluster question RDCs are supported only with synchronous replication, regardless of the type of replication used. It doesn't matter if it's VVR or some form of array-based replication. Eric _ From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Pavel A Tsvetkov Sent: Tuesday, March 13, 2007 11:08 AM To: Veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] Veritas Volume Replicator in Replicated Data Cluster question Hello all! It is known that VVR 4.x can work in a Replicated Data Cluster only in synchronous mode. What about version 5? Is it possible to have an asynchronous RLINK between the Primary and Secondary sites? Thanks! Pavel
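As a quick sanity check when deciding between RDC and GCO, you can inspect an RLINK's current replication mode and force it to synchronous. A rough sketch, assuming a hypothetical disk group `oradg`, RVG `oradb_rvg`, and RLINK `rlk_siteb`; exact attribute names and options vary by VVR release, so verify against your version's VVR Administrator's Guide:

```
# Show the RLINK's current settings (look for the "synchronous" attribute)
vxprint -g oradg -l rlk_siteb

# Force synchronous replication, as an RDC requires
vradmin -g oradg set oradb_rvg synchronous=override
```

With `synchronous=override`, replication is synchronous while the RLINK is connected, which is the behavior an RDC depends on for the Secondary to be safe for automatic failover.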
Re: [Veritas-ha] IO Fencing
I/O fencing removes any chance of a split brain in corner cases where all interconnects are severed between sets of nodes and the nodes remain running. [Sent from my Nokia E62 handheld via goodlink] -Original Message- From: Tharindu Rukshan Bamunuarachchi [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 06, 2007 04:46 AM Pacific Standard Time To: Veritas-ha@mailman.eng.auburn.edu Cc: veritas-vx@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] IO Fencing It seems I cannot enable I/O fencing; my disk controller does not support SCSI3-PR. Can someone please tell me the effect on applications if I/O fencing is disabled. Thanks Tharindu On 3/6/07, Tharindu Rukshan Bamunuarachchi [EMAIL PROTECTED] wrote: Dear All, I have installed Veritas SFCFS on a Sun 3310 disk array, but I could not enable I/O fencing. Can someone please explain what I/O fencing is in Veritas SFCFS, how I can enable it, and what benefits I would get from it. Thanks Tharindu -- Tharindu Rukshan Bamunuarachchi all fabrications are subject to decay
Re: [Veritas-ha] VCS with Blades
VCS does not *require* private links. We recommend but do not require them. We do require 2 links. You will need to make one NIC high-pri and one low-pri. [Sent from my Nokia E62 handheld via goodlink] -Original Message- From: Kiss László - Károly [mailto:[EMAIL PROTECTED] Sent: Friday, March 02, 2007 07:38 AM Pacific Standard Time To: [EMAIL PROTECTED] Subject: [Veritas-ha] VCS with Blades Hi, Does anyone have experience using VCS with IBM Blades? Especially with the Blade LS21? We are just planning to use this hardware, and the first problem is the lack of a resource for the heartbeat link. The Blade has only 2 network ports and both are used for the public network, so no interface remains for the heartbeat private network. Is there any other choice for the heartbeat link than a private network? Thanks. Best Regards, Laszlo
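The one-high-pri, one-low-pri layout is expressed in /etc/llttab on each node. A minimal sketch with made-up values (node name `node1`, cluster number 7, Solaris `bge` interfaces); check device paths and directive syntax against the LLT documentation for your platform and release:

```
# /etc/llttab (per node; names and numbers here are examples)
set-node    node1
set-cluster 7                  # must be unique among clusters sharing the network
link        bge1 /dev/bge:1 - ether - -    # dedicated heartbeat link, high priority
link-lowpri bge0 /dev/bge:0 - ether - -    # public interface doubling as low-pri heartbeat
```

The `link-lowpri` directive tells LLT to use that interface only for heartbeats (not cluster traffic) unless the high-priority links fail, which keeps heartbeat load on the public network minimal.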
Re: [Veritas-ha] VCS with Blades
No need for node IDs to be different, but the cluster ID must be managed. Newer releases of VCS allow up to 64k cluster numbers, if I recall correctly. [Sent from my Nokia E62 handheld via goodlink] -Original Message- From: Andrey Dmitriev [mailto:[EMAIL PROTECTED] Sent: Friday, March 02, 2007 10:33 AM Pacific Standard Time To: [EMAIL PROTECTED] Subject: Re: [Veritas-ha] VCS with Blades If you do go over public, make sure your cluster ID is unique, and maybe node IDs too, across different clusters on the network/subnet. That bit us on an older version of VERITAS Cluster (1.3). -a _ From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Stanley, Jon Sent: Friday, March 02, 2007 10:51 AM To: Kiss László - Károly; [EMAIL PROTECTED] Subject: Re: [Veritas-ha] VCS with Blades I know that in HP blades you can put in mezzanine cards that give you additional ports beyond the on-board ones (they have two slots, so you could put in a dual-channel FC card, I think, and a quad-port Ethernet adapter). I think that you *can* use the public network for LLT; I'm not sure whether this is actually supported for anything other than a lowpri link. _ From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kiss László - Károly Sent: Friday, March 02, 2007 10:23 AM To: [EMAIL PROTECTED] Subject: [Veritas-ha] VCS with Blades Hi, Does anyone have experience using VCS with IBM Blades? Especially with the Blade LS21? We are just planning to use this hardware, and the first problem is the lack of a resource for the heartbeat link. The Blade has only 2 network ports and both are used for the public network, so no interface remains for the heartbeat private network. Is there any other choice for the heartbeat link than a private network? Thanks. Best Regards, Laszlo 
Re: [Veritas-ha] Creating a new cluster membership with existing ones
Is this a one-time thing, and will the cluster stay split? -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of R Sent: Tuesday, January 30, 2007 5:47 AM To: veritas-ha@mailman.eng.auburn.edu Subject: Re: [Veritas-ha] Creating a new cluster membership with existing ones One way of splitting your 6-node cluster into 2 x 3-node clusters could be as follows:
1. Switch all the existing service groups to the first 3 nodes: # hagrp -switch sg_name -to system_name
2. Delete the SystemList entries of the second 3 nodes from all the service groups: # hagrp -modify sg_name SystemList -delete system_name
3. Delete the second 3 nodes from the cluster: # hasys -delete system_name
4. Create a new 3-node cluster using the deleted nodes.
If the service groups on the 6-node cluster need to be split between the 2 x 3-node clusters, then you might have to take the service groups offline at least once. -R
Re: [Veritas-ha] SF4.1 VCS5.0
That will work. _ From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Pavel A Tsvetkov Sent: Monday, January 29, 2007 9:45 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] SF4.1 VCS5.0 Hello all! Is it possible to run VCS 5.0 with SF 4.1? Any problems? Thanks a lot! Pavel.
Re: [Veritas-ha] Need to move my site from one location to another (using VCS 4.1)
From the VCS side, you will need to update the host names in main.cf and llthosts. You will also need to update the virtual IP in main.cf for each service group. Oracle will likely need an update in listener.ora to reflect the new VIP for the listener. (Sent from my Blackberry wireless handheld) -Original Message- From: [EMAIL PROTECTED] [EMAIL PROTECTED] To: veritas-ha@mailman.eng.auburn.edu veritas-ha@mailman.eng.auburn.edu Sent: Mon Dec 11 07:11:52 2006 Subject: [Veritas-ha] Need to move my site from one location to another (using VCS 4.1) hi all I am a new member of this group and proud to join it. I have 4 nodes (Sun Solaris 10) in a cluster using Veritas Cluster 4.1, connected to SAN disks. These servers host 7 service groups, each providing a different Oracle application service. We are planning to move all servers, including the SAN, from our location A to another location B; only the IPs and hostnames need to be changed at location B. How can I go ahead with this site move (reconfiguring the cluster) step by step? Give your valuable inputs on how and what needs to be changed in the cluster and OS. Also please clarify whether any changes in the cluster config will cause any problem for the Oracle Database. Specification: OS: Sun Solaris 10 VCS: 4.1 SAN: Hitachi Oracle: 9i and 10g Cheers and Regards Damodharan K Tata Consultancy Services Mailto: [EMAIL PROTECTED] Website: http://www.tcs.com 
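The files mentioned above would change roughly as follows. A sketch with made-up hostnames and addresses; the real values come from location B's network plan, and the resource names here are hypothetical:

```
# /etc/llthosts on every node: node IDs keep their values, hostnames change
0 b-node1
1 b-node2
2 b-node3
3 b-node4

# main.cf: each service group's IP resource gets the new virtual address
IP ora1_ip (
    Device = bge0
    Address = "10.20.30.41"
    NetMask = "255.255.255.0"
    )
```

Edit these with the cluster stopped (or via `haconf -makerw` for main.cf attributes), and remember that listener.ora on each node must point at the same new VIP as the service group's IP resource.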
Thank you
Re: [Veritas-ha] HA nodes Patch levels
Within a single cluster, VCS supports any OS release and patch level that that version of VCS itself supports. So you do not need identical patch levels, or even the same OS release. Best practice would be to keep them the same, but we can easily support multiple versions during upgrades. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Evsyukov, Sergey Sent: Monday, November 13, 2006 10:18 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] HA nodes Patch levels Hello colleagues, We have two nodes for an HA cluster installation. They have identical OS versions (Solaris 5.9), but different kernel patch levels: 118558-30 vs. 118558-25. Is this an admissible cluster configuration, or must the patch levels be identical? Thanks, Sergey
Re: [Veritas-ha] low priority heartbeat vs I/O Fencing
EMC = 170-odd-gig disks. The coordinator disk group needs three 10 MB LUNs. I waste more space than that storing bad jokes from email. I/O fencing is the best possible config to prevent split brain. A low-priority heartbeat is a best practice, but it is still not bulletproof. -Original Message- From: Steven Sim [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 19, 2006 11:18 AM To: Jim Senicka; veritas-ha@mailman.eng.auburn.edu Subject: low priority heartbeat vs I/O Fencing Hello Gurus; Firstly, I wish to thank James Senicka of Symantec for his wonderfully fast and very technically accurate replies. I wish all other product vendors were so efficient. Support like this is one reason why I will continue to push VCS as a clustering solution. I am currently trying to convince a customer to implement I/O fencing with three SAN-based coordinator disks. I've told him 3 are required (minimum) and they cannot be used for data. At which point he threw a look at me and asked me whether I was aware of how much per byte his EMC was costing him. So some bright spark suggested a low-priority heartbeat, which I was going to implement anyway, with or without I/O fencing. My question is: is a low-priority heartbeat sufficient in place of I/O fencing? If so, why the strong recommendation for I/O fencing with three coordinator disks? I have been telling people that a low-priority heartbeat is not sufficient protection against split-brain scenarios. Could you guys comment? Warmest Regards Steven Sim Fujitsu Asia Pte. Ltd.
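For reference, once a coordinator disk group exists, disk-based fencing is wired up through two small files on each node. A sketch assuming a coordinator disk group named `vxfencoorddg`; exact keys vary between VCS releases, so check the I/O fencing chapter of your version's documentation:

```
# /etc/vxfendg — just the name of the coordinator disk group
vxfencoorddg

# /etc/vxfenmode — enable SCSI-3 disk-based fencing
vxfen_mode=scsi3
scsi3_disk_policy=dmp
```

Note that the three coordinator LUNs only arbitrate membership via SCSI-3 persistent reservations; they hold no data, which is why tiny LUNs are sufficient.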
Re: [Veritas-ha] LLT errors - delayed and lost hb ticks
You have two LLT streams sharing common infrastructure/switch/VLAN. Each LLT link must be completely independent, and neither stream should see packets from the other. From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kawaley Winston Sent: Friday, September 15, 2006 11:18 AM To: veritas-ha@mailman.eng.auburn.edu Subject: [Veritas-ha] LLT errors - delayed and lost hb ticks Hi all, We are running VCS 4.1 on two Solaris 9 systems and have configured a local cluster for our configuration management software, ClearCase. Recently I have been receiving a lot of the following LLT latency errors:
Sep 14 17:24:18 ncfbvcs01 llt: [ID 794702 kern.notice] LLT INFO V-14-1-10019 delayed hb 18561 ticks from 1 link 0 (bge1)
Sep 14 17:24:18 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 373 hb seq 30608288 from 1 link 0 (bge1)
Sep 14 17:24:18 ncfbvcs01 llt: [ID 794702 kern.notice] LLT INFO V-14-1-10019 delayed hb 18561 ticks from 1 link 1 (bge2)
Sep 14 17:24:18 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 373 hb seq 30608288 from 1 link 1 (bge2)
Sep 14 17:24:18 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost -4 hb seq 30608285 from 1 link 1 (bge2)
Sep 14 17:24:18 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost -4 hb seq 30608285 from 1 link 0 (bge1)
Sep 14 17:24:48 ncfbvcs01 llt: [ID 794702 kern.notice] LLT INFO V-14-1-10019 delayed hb 2955 ticks from 1 link 1 (bge2)
Sep 14 17:24:48 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 62 hb seq 30608348 from 1 link 1 (bge2)
Sep 14 17:24:48 ncfbvcs01 llt: [ID 794702 kern.notice] LLT INFO V-14-1-10019 delayed hb 2955 ticks from 1 link 0 (bge1)
Sep 14 17:24:48 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 62 hb seq 30608348 from 1 link 0 (bge1)
Sep 14 17:24:48 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost -4 hb seq 30608345 from 1 link 0 (bge1)
Sep 14 17:24:48 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost -4 hb seq 30608345 from 1 link 1 (bge2)
Does anyone know what exactly is causing these delayed and lost ticks and how they can be corrected? Thanks, Winston Kawaley
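Messages like these are easier to reason about in aggregate. A small sketch (not a Veritas tool, just ad-hoc log triage) that tallies delayed ticks and lost heartbeats per remote node, link, and NIC from syslog lines:

```python
import re

# Matches LLT heartbeat warnings such as:
#   "LLT INFO V-14-1-10019 delayed hb 18561 ticks from 1 link 0 (bge1)"
#   "LLT INFO V-14-1-10023 lost 373 hb seq 30608288 from 1 link 0 (bge1)"
DELAYED = re.compile(r"V-14-1-10019 delayed hb (\d+) ticks from (\d+) link (\d+) \((\w+)\)")
LOST = re.compile(r"V-14-1-10023 lost (-?\d+) hb seq \d+ from (\d+) link (\d+) \((\w+)\)")

def summarize(lines):
    """Tally delayed ticks and lost heartbeats per (node, link, nic)."""
    stats = {}
    for line in lines:
        m = DELAYED.search(line)
        if m:
            key = (int(m.group(2)), int(m.group(3)), m.group(4))
            s = stats.setdefault(key, {"delayed_ticks": 0, "lost": 0})
            s["delayed_ticks"] += int(m.group(1))
            continue
        m = LOST.search(line)
        if m:
            key = (int(m.group(2)), int(m.group(3)), m.group(4))
            s = stats.setdefault(key, {"delayed_ticks": 0, "lost": 0})
            # Negative "lost" counts are duplicate/reorder artifacts; ignore them.
            s["lost"] += max(0, int(m.group(1)))
    return stats
```

If both links show losses at the same timestamps, as in the log above, the cause is usually shared (CPU starvation on a node, or a common switch/VLAN in both paths) rather than a single NIC or cable.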