Hi Jim,

Thanks for your reply,

Jim Dunham wrote:
> Rick,
>
>> I followed Jim Dunham's AVS & ZFS seamless guide on OpenSolaris 2008.11,
>> and I'm running into a problem.  Actually, I ran into a few problems,
>> but this is where I'm really stuck :)
>>
>> Both nodes /var/adm/ds.log show the same errors for each disk:
>> Jan 19 15:37:08 librdc: SNDR: Could not open file
>> sysvoltwo:/dev/rdsk/c4d0s0 on remote node
>> Jan 19 15:37:09 sndr: SNDR: Could not open file
>> sysvoltwo:/dev/rdsk/c5d0s0 on remote node
>
> SNDR is a client / server replication model, and thus all of AVS must 
> be running on both nodes involved in replication. This can be verified 
> by running "dscfgadm -i", and assuring there are no errors. If there 
> are errors, "dscfgadm -d" (disable), followed by "dscfgadm -e" 
> (enable), should resolve all errors. Check "dscfgadm -i", one more time.
My dscfgadm -i appears to be good.  I didn't post both nodes' log 
outputs because I didn't want this to get too big, but here are both 
dscfgadm -i outputs.

sysvolone:~# dscfgadm -i
SERVICE         STATE           ENABLED
nws_scm         online          true
nws_sv          online          true
nws_ii          online          true
nws_rdc         online          true
nws_rdcsyncd    online          true

Availability Suite Configuration:
Local configuration database: valid

sysvoltwo:~# dscfgadm -i
SERVICE         STATE           ENABLED
nws_scm         online          true
nws_sv          online          true
nws_ii          online          true
nws_rdc         online          true
nws_rdcsyncd    online          true

Availability Suite Configuration:
Local configuration database: valid
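
In case it helps anyone comparing nodes, here is a rough sketch of how the listing above could be checked from a script instead of by eye. It only parses text shaped like the output shown here; the function name check_avs_services is something I made up, and it is not part of AVS.

```shell
# Sketch: read "dscfgadm -i" output on stdin and report any service
# row that is not both online and enabled.
# check_avs_services is a made-up helper name, not an AVS command.
check_avs_services() {
    awk '
        # Service rows in the listing start with "nws_".
        $1 ~ /^nws_/ {
            seen++
            if ($2 != "online" || $3 != "true") {
                printf "NOT OK: %s (state=%s, enabled=%s)\n", $1, $2, $3
                bad++
            }
        }
        END {
            if (seen == 0) { print "no nws_ services found"; exit 2 }
            if (bad > 0) exit 1
            print "all " seen " services online"
        }
    '
}

# Possible use on each node:
#   dscfgadm -i | check_avs_services
```

The exit status is 0 when every nws_ row is online and enabled, 1 when at least one is not, and 2 when no service rows were seen at all.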

>
>> I ran rpcinfo -p on each node and they're identical:
>
> From rpcinfo(1M), the following command syntax is covered in the AVS 
> troubleshooting guide (819-6151-10)
>
>     # rpcinfo -T tcp node1 100143
>
>     rpcinfo -T transport host prognum [versnum]
>
> SNDR's program number is 100143
Yes, I did do this, and version 4 'failed' on both nodes (I assume the 
documentation is out of date):
# rpcinfo -T tcp sysvolone 100143 4
rpcinfo: RPC: Program/version mismatch; low version = 5, high version = 7
program 100143 version 4 is not available

But, as the mismatch error suggests (it reports versions 5 through 7), version 7 does work on both nodes:
# rpcinfo -T tcp sysvolone 100143 7
program 100143 version 7 ready and waiting
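
Rather than guessing at version numbers, the supported range can be pulled straight out of the mismatch message. This is only a sketch that parses the error text shown above; rpc_version_range is a made-up helper name, and I am assuming the message format is the same as what my nodes printed.

```shell
# Sketch: extract the low/high version numbers from an rpcinfo
# "Program/version mismatch" line read on stdin.
# Prints "LOW HIGH", or nothing if no such line is present.
# rpc_version_range is a made-up helper name.
rpc_version_range() {
    sed -n 's/.*low version = \([0-9][0-9]*\), high version = \([0-9][0-9]*\).*/\1 \2/p'
}

# Possible use (host name taken from this thread):
#   rpcinfo -T tcp sysvolone 100143 4 2>&1 | rpc_version_range
```

The high value it prints is the version to pass as the last argument to rpcinfo -T.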

>
>> rpcinfo -p sysvoltwo
>>   program vers proto   port  service
>>    100000    4   tcp    111  rpcbind
>>    100000    3   tcp    111  rpcbind
>>    100000    2   tcp    111  rpcbind
>>    100000    4   udp    111  rpcbind
>>    100000    3   udp    111  rpcbind
>>    100000    2   udp    111  rpcbind
>>    100229    1   tcp  62457  metad
>>    100229    2   tcp  62457  metad
>>    100143    5   tcp    121
>>    100143    6   tcp    121
>>    100143    7   tcp    121
>>
>> Originally, I couldn't connect with rpcinfo at all and then I was
>> missing port 121 on one node - but I've fixed those services and I
>> turned off the 'local only' setting for the rpc/bind service.
>
> I am concerned about the above statement. There is never a need for a 
> system admin to use rpcinfo on behalf of AVS (SNDR). I am therefore 
> concerned that you have made incompatible changes.
There were two parts to this - and I probably did it backwards because 
my dscfginfo was buggy (had to fix line 1020), and I didn't fix it until 
afterwards.  First I tried to connect to port 121 via telnet on each 
node, but one node didn't respond.  While trying to figure out which 
service bound to which port, I noticed the nws_rdc and nws_rdcsyncd 
services weren't running, so I manually enabled them with 'svcadm 
enable'.  I was then able to connect to port 121, but it still wasn't 
working.  I came across a thread that mentioned using rpcinfo -p to check 
the services, but they wouldn't respond, which led me to the 'local' 
setting for the rpc/bind service.  That's all I changed there: local 
to public.
I would think that if I did anything wrong, it would be the manual 
service enable.

>> So this is where I'm stuck.  I'm a Solaris newbie, and I'm finding it a
>> little difficult because things like the AVS Troubleshooting guide just
>> give commands to run - but I don't know what output I'm looking for.
>
> The encapsulation of AVS startup and shutdown into 'dscfgadm', is an 
> improvement over prior versions. If 'dscfgadm -i' does not come back 
> without errors, one can run 'dscfgadm -i -x', to get a look inside the 
> script as to what operations are failing.
The script that comes with OpenSolaris is busted, and I didn't stumble 
across a fix until I already had everything else fixed. :(
But now it definitely looks clean - right?
>
>> The above output looks fine to me, but am I missing something else?
>
> There are two places, on either the SNDR primary or SNDR secondary 
> node where error messages are logged on behalf of AVS. They are 
> /var/adm/messages, and  /var/svc/log/*nws_*
Ahh, this is what I was looking for; I just didn't know where to look.  
I'm not sure if it helps though. The nws_scm service is the only one 
with anything odd:
[ Jan 19 15:27:54 Executing start method ("/lib/svc/method/svc-scm 
start"). ]
scmadm: cache enable failed
SDBC: Cache enable failed.

All the other services appear normal, and identical:
[ Jan 19 15:27:44 Enabled. ]
[ Jan 19 15:27:55 Executing start method ("/lib/svc/method/svc-sv start"). ]
[ Jan 19 15:27:56 Method "start" exited with status 0. ]

Thanks,

Rick



_______________________________________________
storage-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/storage-discuss
