2007/10/15, Jim Dunham wrote:

> msl wrote:
>> I'm here to find out how we can fix the problem with the AVS
>> configuration being lost. I made some posts about that error here,
>> and I got some replies from Jim Dunham about "/var/adm/messages",
>> but I could not find anything relevant. So, I'm using AVS with
>> Solaris 10 u3, and I really want to know if I have to install an
>> OpenSolaris distro to help you guys with the solution. The AVS code
>> is the one from opensolaris.org, so I think other people using AVS
>> "should" be getting that error too (even when using it with
>> OpenSolaris).

> To be very clear, you are the "only" one that is losing their AVS
> configuration on S10u3. There are no customers that have purchased
> AVS 4.0 running it on Solaris 10 Update 3, or OpenSolaris consumers
> running Nevada, seeing this issue. Therefore it is highly likely that
> something in your environment is causing this problem.
>
> The AVS configuration is fairly simple. In a single-node Solaris
> environment, the file /etc/dscfg_local contains the AVS
> configuration. There are three ways to lose the AVS configuration
> under this scenario:
>
> 1). Deleting the file /etc/dscfg_local
> 2). Corrupting the contents of /etc/dscfg_local
> 3). Some software issuing "dscfg -i"
>
> Because AVS supports a two-phase commit protocol within itself, it
> can't corrupt its own dscfg database.

Ok, I'm replicating disks between two cluster nodes, but that (I
think) is irrelevant to AVS. But AVS is detecting that it is in a
cluster environment, and is requesting a cluster configuration. I
think I could just use AVS as two separate nodes. Sorry, but I don't
"see" where the cluster environment is relevant to AVS in my case.

Answering your questions:

1) The file /etc/dscfg_local is not deleted, and the command
"dscfg -l" works.
2) See above. I have saved the output of the command "dscfg -l" before
the reboot, and again after it, to make a diff... the outputs are the
same. Besides, I think the AVS software would know about corruption in
that file, right?
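For reference, this is roughly how I take the before/after snapshots (a minimal sh sketch; the snapshot directory and labels are my own choices, not anything AVS mandates — only "dscfg -l" and /etc/dscfg_local come from this thread):

```shell
#!/bin/sh
# Fingerprint a file (e.g. /etc/dscfg_local, or a saved "dscfg -l"
# dump) before a reboot, and verify it is byte-identical afterwards.

SNAPDIR=${SNAPDIR:-/var/tmp/avs-snap}   # hypothetical snapshot location

snapshot() {
    # $1 = label, $2 = file to fingerprint
    mkdir -p "$SNAPDIR"
    # Drop the filename field so only checksum and size are compared.
    cksum "$2" | awk '{print $1, $2}' > "$SNAPDIR/$1.cksum"
}

compare() {
    # $1 = label, $2 = file; exit status 0 iff unchanged since snapshot
    cksum "$2" | awk '{print $1, $2}' | cmp -s - "$SNAPDIR/$1.cksum"
}
```

Before the reboot: `dscfg -l > /var/tmp/dscfg.before; snapshot local /etc/dscfg_local`. After it: `compare local /etc/dscfg_local` tells you whether the database file itself changed, independently of what the AVS tools report.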
3) That I can't answer...

> In a multi-node Sun Cluster environment there are the same scenarios
> as above, plus the Sun Cluster part of the AVS configuration. There
> are seven additional ways to lose the AVS configuration under this
> scenario:
>
> 4). Deleting the file /etc/dscfg_cluster

That file is always there, always without the *new line* at the end :)

> 5). Changing the contents of /etc/dscfg_cluster

Same as above...

> 6). Corrupting the DID partition pointed to by the contents of
> /etc/dscfg_cluster

I was using a partition (s1) on a disk whose other slice (s0) was in
use by a ZFS pool. Now I'm using the same partition (s1), and the
other slice (s0) is not used for anything else. And the problem is
always there...

> 7). Some software issuing "dscfg -C -i"
>
> 8). The DID partition is /dev/did/dsk/d<N>s<N>, not
> /dev/did/rdsk/d<N>s<N>

The partition is "/dev/did/rdsk/d2s1". I did try to configure it the
other way to see if I could... but the answer is no! The AVS software
complains about "dsk"...

> 9). The DID partition is not the same DID device on all nodes in the
> Sun Cluster

# /usr/cluster/bin/scdidadm -L
2    node1:/dev/rdsk/c0t5006048449AF62A7d34    /dev/did/rdsk/d2
2    node2:/dev/rdsk/c0t5006048449AF62A7d34    /dev/did/rdsk/d2

> 10). Within all nodes of the Sun Cluster, there needs to be a process
> called "dscfglockd" running

# ps -eo args | grep dscfg
/usr/lib/dscfglockd -f /etc/dscfg_lockdb

On both nodes...

> It is my opinion that failure "6" is the situation you are seeing.
> It is likely caused by 8, 9 or 10.

If you say so, I believe. :) but we need a way to see it...
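The checks from items 4-10 could be scripted roughly like this (a sketch; the path validation is plain string matching, and the scdidadm/ps probes assume the Sun Cluster utilities named in this thread, so they only run when those tools are present):

```shell
#!/bin/sh
# Sanity checks for the cluster-side AVS configuration (items 4-10).

# Item 8: the dscfg device must be a raw DID slice (rdsk, not dsk).
valid_did_path() {
    case "$1" in
        /dev/did/rdsk/d[0-9]*s[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

check_cluster_cfg() {
    # Items 4/5/8: file present, and its contents name an rdsk DID slice.
    [ -f /etc/dscfg_cluster ] || { echo "missing /etc/dscfg_cluster"; return 1; }
    dev=$(head -1 /etc/dscfg_cluster)
    valid_did_path "$dev" || { echo "bad device path: $dev"; return 1; }
    echo "dscfg device: $dev"
}

# Item 9: run on each node and compare -- the DID device must match.
if [ -x /usr/cluster/bin/scdidadm ]; then
    /usr/cluster/bin/scdidadm -L
fi

# Item 10: the lock daemon must be running on every node.
ps -eo args | grep '[d]scfglockd' || echo "dscfglockd not running"
```

Running `check_cluster_cfg` on each node, plus comparing the scdidadm output across nodes, covers everything except item 6 (actual on-disk corruption of the DID partition), which is exactly the failure Jim suspects.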
After a reboot, this is the situation:

COMMAND: dscfg -l

# Consolidated Dataservice Configuration
# Do not edit out whitespace or dashes
# File created on: Tue Oct 16 16:02:52 2007
# Availability Suite - dscfg configuration database
#
# Storage Cache Manager - scmadm
# threads csiz wrtcache filpat reserved1 niobuf ntdaemon fwrthru nofwrthru
scm: 128 64 - - - - - - -
#
# Cache Hints - scmadm
# device wrthru nordcache
#
# Point-in-Time Copy - iiadm
# master shadow bitmap mode(D|I) [overflow] [device-group] [options] [group]
#
# Remote Mirror (internal) SetId
# setid [device-group]
setid: 4 setid-ctag
#
# Remote Mirror - sndradm
# p_host p_dev p_bmp s_host s_dev s_bmp protocol(ip/fcal_device) mode \
# [group] [device-group] [options] [diskq]
sndr: node1 /dev/rdsk/c2d0s0 /dev/rdsk/c2d0s1 node2 /dev/rdsk/c2d0s0 /dev/rdsk/c2d0s1 ip sync B2007 - setid=3; -
sndr: node1 /dev/rdsk/c3d0s0 /dev/rdsk/c3d0s1 node2 /dev/rdsk/c3d0s0 /dev/rdsk/c3d0s1 ip sync B2007 - setid=4; -
#
# Remote Mirror - Point-in-Time mapping
# SNDR-secondary II-shadow II-bitmap state [device-group]
#
# Bitmap filesystem to mount before other filesystems
# pathname_or_special_device [resource-group]
#
# Storage volumes - svadm
# pathname [mode] [device-group]
sv: /dev/rdsk/c2d0s0 - -
sv: /dev/rdsk/c2d0s1 - -
sv: /dev/rdsk/c3d0s0 - -
sv: /dev/rdsk/c3d0s1 - -
#
# Ncall Core
# nodeid [device-group]
#
# DsVol - volume usage
# volume [device-group] users
dsvol: /dev/rdsk/c2d0s0 - sndr
dsvol: /dev/rdsk/c2d0s1 - sndr
dsvol: /dev/rdsk/c3d0s0 - sndr
dsvol: /dev/rdsk/c3d0s1 - sndr

COMMAND: svadm
/dev/rdsk/c2d0s0
/dev/rdsk/c2d0s1
/dev/rdsk/c3d0s0
/dev/rdsk/c3d0s1

COMMAND: dsstat

COMMAND: sndradm -C local -P

> Because AVS supports a two-phase commit protocol, with the assistance
> of "dscfglockd", it can't corrupt its own dscfg database within
> itself.

I still believe in you. :)

> Jim
>
>> So, how can we find a fix to this? After a reboot, the sndr
>> information is lost, and I lost all the replication information...
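Given the dump above, this is roughly how I pull the Remote Mirror sets out of a saved "dscfg -l" snapshot, to compare against what "sndradm -P" reports after the reboot (plain awk over the record format shown above; the snapshot filename is just my choice):

```shell
#!/bin/sh
# Extract the sndr: records from a "dscfg -l" dump on stdin and print
# one "p_host p_dev -> s_host s_dev" line per configured set.
# Field layout (from the dump's header comment):
#   sndr: p_host p_dev p_bmp s_host s_dev s_bmp protocol mode ...

list_sndr_sets() {
    awk '$1 == "sndr:" { printf "%s %s -> %s %s\n", $2, $3, $5, $6 }'
}
```

Usage: `dscfg -l > /var/tmp/dscfg.before`, then after reboot `list_sndr_sets < /var/tmp/dscfg.before` shows which sets the database says should exist, even when "sndradm -C local -P" and "dsstat" come back empty, as they do above.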
>> That's a really "bad" behavior. I did some dtrace "scripts" to try
>> to find the error, but Mr. Dunham said that is "digging too deep".
>> Any ideas? Thanks for your time!
>>
>> P.S.: The dtrace post is here:
>> http://www.posix.brte.com.br/blog/?p=79

> Jim Dunham
> Solaris, Storage Software Group
> Sun Microsystems, Inc.
> 1617 Southwood Drive
> Nashua, NH 03063
> Email: James dot Dunham at Sun dot COM
> http://blogs.sun.com/avs

Thanks very much!!!

Leal
--
pOSix rules

_______________________________________________
storage-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/storage-discuss
