Hi Winston,
You can fix your problem by using an SAP (Service Access Point) in the LLT
configuration file (/etc/llttab).
Below is an explanation of what an SAP is and when you need to configure one.
Regards,
Philippe
############################################################################################################
Details:
A service access point (SAP), also known as an "ethertype," is a layer 2 (data
link) method of differentiating between various communications protocols.
Every layer 2 packet carries an assigned SAP: there is one for NetBIOS, another
for SNA, another for NetWare, and so on. The default SAP for LLT packets is
0xCAFE. Under certain circumstances you may want to assign a unique SAP to each
LLT link to eliminate any confusion when the receiving node identifies packets.
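As a rough illustration of how a receiver tells protocols apart by this field, the Python sketch below (not VCS or LLT code) packs a minimal Ethernet II header and reads the 16-bit type field back out of it. The MAC addresses and the `HANDLERS` table are hypothetical; 0xCAFE is the LLT default mentioned above, 0xCAF0/0xCAF1 are the per-link values used in the sample llttab later in this note, and 0x0800 is the standard IPv4 ethertype.

```python
import struct

# Ethernet II header: 6-byte destination MAC, 6-byte source MAC, 2-byte type field.
ETH_HEADER = struct.Struct("!6s6sH")

# Hypothetical demultiplexing table for illustration only.
HANDLERS = {
    0x0800: "IPv4",
    0xCAFE: "LLT (default SAP)",
    0xCAF0: "LLT link mylink0",
    0xCAF1: "LLT link mylink1",
}


def build_frame(dst: bytes, src: bytes, ethertype: int, payload: bytes) -> bytes:
    """Pack a minimal Ethernet II frame (no FCS) around the payload."""
    return ETH_HEADER.pack(dst, src, ethertype) + payload


def classify(frame: bytes) -> str:
    """Return the protocol a receiver would hand this frame to, based on its type field."""
    _dst, _src, ethertype = ETH_HEADER.unpack_from(frame)
    return HANDLERS.get(ethertype, f"unknown 0x{ethertype:04X}")


if __name__ == "__main__":
    broadcast = b"\xff" * 6
    src = bytes.fromhex("080020aabbcc")  # hypothetical source MAC
    for sap in (0xCAFE, 0xCAF0, 0xCAF1, 0x1234):
        print(hex(sap), "->", classify(build_frame(broadcast, src, sap, b"hb")))
```

Two links that both use 0xCAFE on the same wire are indistinguishable at this level, which is why giving each lowpri link its own value helps.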
The LLT heartbeat communications are broadcast to all nodes, while other
messages, such as VERITAS Cluster Server (tm) state information, are relayed
point to point. Occasionally, if LLT links share a hub (or a VLAN), you may see
messages referring to lost heartbeats on the console or in the
/var/adm/messages file (see below). This is usually because both links can
"see" each other's traffic and, on Sun hardware, the EEPROM variable
local-mac-address? is set to "false". (By default, Sun uses a variation of the
host ID as the MAC address for *all network interfaces* in the box; you can
change this variable to "true" and reboot to obtain a unique MAC address for
each interface.)
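If you want to confirm whether several interfaces on a box share one MAC address, a small helper like the following can group them. It is only a sketch: it assumes you have already collected an interface-to-MAC mapping yourself (for example by reading `ifconfig -a` output), it normalises octets because some tools print them without leading zeros, and the interface names and addresses in the demo are hypothetical.

```python
from collections import defaultdict


def normalise_mac(mac: str) -> str:
    """Return a MAC as six zero-padded lowercase octets (8:0:20:a:b:c -> 08:00:20:0a:0b:0c)."""
    octets = [int(part, 16) for part in mac.replace("-", ":").split(":")]
    if len(octets) != 6 or any(o > 0xFF for o in octets):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(f"{o:02x}" for o in octets)


def shared_macs(interfaces: dict[str, str]) -> dict[str, list[str]]:
    """Map every MAC used by more than one interface to the sorted interface names."""
    by_mac: dict[str, list[str]] = defaultdict(list)
    for name, mac in interfaces.items():
        by_mac[normalise_mac(mac)].append(name)
    return {mac: sorted(names) for mac, names in by_mac.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical host where local-mac-address? is still "false".
    print(shared_macs({"bge0": "0:3:ba:12:34:56", "bge1": "0:3:ba:12:34:56", "bge2": "0:3:ba:12:34:56"}))
```

An empty result means every interface already has its own MAC address.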
Here is an example of the lost heartbeat messages found in the messages file:
Dec 11 09:39:35 testserver llt: LLT:10023: lost 6 hb seq 5879 from 0 link 0
Dec 11 09:39:35 testserver llt: LLT:10023: lost 6 hb seq 5879 from 0 link 1
Dec 11 09:39:40 testserver llt: LLT:10019: delayed hb 1150 ticks from 1 link 1
Dec 11 09:39:40 testserver llt: LLT:10023: lost 22 hb seq 7138 from 1 link 1
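To see at a glance which peer node and link are affected, you could tally these messages with a short script. The Python sketch below is an assumption-based example, not a Veritas tool: its regular expression is written only against the two message styles quoted in this thread (the `LLT:10023:` form above and the `V-14-1-10023` form in the original question), and it skips negative "lost" counts because this note does not explain them.

```python
import re
from collections import Counter

# Written against the two message styles quoted in this thread:
#   "LLT:10023: lost 6 hb seq 5879 from 0 link 0"
#   "LLT INFO V-14-1-10019 delayed hb 18561 ticks from 1 link 0 (bge1)"
LLT_MSG = re.compile(
    r"(?:LLT:|V-14-1-)(?:10019|10023):?\s+"
    r"(?:lost (?P<lost>-?\d+) hb seq \d+|delayed hb \d+ ticks)"
    r" from (?P<node>\d+) link (?P<link>\d+)"
)


def summarise(lines):
    """Return (lost, delayed) Counters keyed by (peer node ID, LLT link number)."""
    lost, delayed = Counter(), Counter()
    for line in lines:
        m = LLT_MSG.search(line)
        if not m:
            continue
        key = (int(m["node"]), int(m["link"]))
        if m["lost"] is None:
            delayed[key] += 1  # one "delayed hb" report
        elif int(m["lost"]) > 0:
            lost[key] += int(m["lost"])  # heartbeats reported lost
        # Negative "lost" counts are skipped: their meaning is not covered here.
    return lost, delayed


if __name__ == "__main__":
    sample = [
        "Dec 11 09:39:35 testserver llt: LLT:10023: lost 6 hb seq 5879 from 0 link 0",
        "Dec 11 09:39:35 testserver llt: LLT:10023: lost 6 hb seq 5879 from 0 link 1",
        "Dec 11 09:39:40 testserver llt: LLT:10019: delayed hb 1150 ticks from 1 link 1",
        "Dec 11 09:39:40 testserver llt: LLT:10023: lost 22 hb seq 7138 from 1 link 1",
    ]
    lost, delayed = summarise(sample)
    print("lost:", dict(lost))
    print("delayed:", dict(delayed))
```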
The manpage for llttab states it this way:
"Also, if there is more than one lowpri link configured on a system and
sharing the same network, then each must use a unique SAP or else
LLT will generate messages referring to lost heartbeats on the
console, since links are normally required to be completely
isolated."
Note that this statement applies only to configurations with more than one
lowpri link across a single public network, where separating the links is
impossible (it is the same public network). The issue is mentioned only in the
manpage and is not specifically addressed in other documentation, because
missed heartbeats are most often resolved by examining the physical cabling of
the private networks and correcting cable mismatches, for example an llttab
entry that names the wrong network port as a normal "high priority" (private)
link when that port is in fact connected to the public network rather than to
the properly cabled private network.
Multiple heartbeat links in a given network broadcast domain (the same hub, or
the same VLAN on a switch) are supported only when you also have at least one
additional private heartbeat link in a separate broadcast domain (hub or VLAN),
on separate electrical power, to eliminate the single point of failure. For
example, the public network uses VLAN1 and each private network uses its own
VLAN, with VLAN2 hosted on a physically separate switch whose power circuit is
separate from that of VLAN3. See the sample llttab below.
If you experience the "lost heartbeats" messages described above when
using multiple lowpri links, you may assign unique SAP tags to each link in
llttab:
set-node 0
# omitting the set-cluster directive causes LLT to assume a default ID of 0
set-cluster 0
link-lowpri mylink0 /dev/hme:0 - ether 0xcaf0 -
link-lowpri mylink1 /dev/hme:1 - ether 0xcaf1 -
link qfe:2 /dev/qfe:2 - ether - -
link mylink3 /dev/qfe:3 - ether - -
# exclude the following node IDs
exclude 2-31
start
# lltstat -l
LLT link information:
Link  Tag      State  Type   Pri     SAP     MTU   Addrlen  Xmit    Recv    Err  LateHB  Broadcast
0     mylink0  on     ether  lowpri  0xCAF0  1500  6        250705  241197  0    0       FF:FF:FF:FF:FF:FF   ## note "lowpri" in output
1     mylink1  on     ether  lowpri  0xCAF1  1500  6        77456   222957  0    0       FF:FF:FF:FF:FF:FF
2     qfe:2    on     ether  hipri   0xCAFE  1500  6        22846   241197  0    0       FF:FF:FF:FF:FF:FF   ## note "hipri" in output
3     mylink3  on     ether  hipri   0xCAFE  1500  6        34564   241197  0    0       FF:FF:FF:FF:FF:FF
#
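The manpage rule can be checked mechanically before restarting LLT. The sketch below is hypothetical (it is not a Veritas utility): it assumes the field order seen in the sample, `directive tag device node-range link-type sap mtu`, treats a "-" SAP as 0xCAFE because that is what `lltstat -l` reports for the hipri links above, and, since it cannot know which links share a network, conservatively requires every lowpri link to have a distinct SAP.

```python
DEFAULT_SAP = 0xCAFE  # what lltstat -l reports for links configured with "-" as the SAP

SAMPLE_LLTTAB = """\
set-node 0
set-cluster 0
link-lowpri mylink0 /dev/hme:0 - ether 0xcaf0 -
link-lowpri mylink1 /dev/hme:1 - ether 0xcaf1 -
link qfe:2 /dev/qfe:2 - ether - -
link mylink3 /dev/qfe:3 - ether - -
exclude 2-31
start
"""


def parse_links(text: str) -> list[tuple[str, str, int]]:
    """Return (directive, tag, sap) for each link / link-lowpri line of an llttab."""
    links = []
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop comments, then split on whitespace
        if not fields or fields[0] not in ("link", "link-lowpri"):
            continue
        # Assumed field order: directive tag device node-range link-type sap mtu
        if len(fields) < 6:
            raise ValueError(f"too few fields in: {raw!r}")
        sap = DEFAULT_SAP if fields[5] == "-" else int(fields[5], 16)
        links.append((fields[0], fields[1], sap))
    return links


def duplicate_lowpri_saps(text: str) -> dict[int, list[str]]:
    """Map each SAP used by two or more link-lowpri entries to the tags sharing it."""
    by_sap: dict[int, list[str]] = {}
    for directive, tag, sap in parse_links(text):
        if directive == "link-lowpri":
            by_sap.setdefault(sap, []).append(tag)
    return {sap: tags for sap, tags in by_sap.items() if len(tags) > 1}


if __name__ == "__main__":
    print(duplicate_lowpri_saps(SAMPLE_LLTTAB))  # {} because 0xCAF0 and 0xCAF1 differ
    clashing = SAMPLE_LLTTAB.replace("0xcaf1", "0xcaf0")
    print({hex(s): tags for s, tags in duplicate_lowpri_saps(clashing).items()})
```

Leaving both lowpri SAPs as "-" would make them clash on 0xCAFE, which is exactly the situation the manpage warns about.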
With this hypothetical configuration, cable the interfaces to your switches as
follows:
/dev/hme:0  SWITCH-A  VLAN1 (public)
/dev/hme:1  SWITCH-B  VLAN1 (public)
/dev/qfe:2  SWITCH-A  VLAN2   # the qfe:2 interface of every node uses SWITCH-A
/dev/qfe:3  SWITCH-B  VLAN3   # the qfe:3 interface of every node uses SWITCH-B
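The cabling rules in this note can also be expressed as a small check. In this hypothetical Python sketch each VLAN is treated as one broadcast domain; it warns when links share a domain but no private (hipri) link has a domain to itself, and when a private VLAN appears on more than one switch (that is, is trunked). The `Cable` record and the example values are illustrative, and the sketch cannot verify the separate-power requirement.

```python
from collections import defaultdict
from typing import NamedTuple


class Cable(NamedTuple):
    device: str
    priority: str  # "hipri" for private heartbeat links, "lowpri" for public ones
    switch: str
    vlan: str  # assumed to identify exactly one broadcast domain


def check_cabling(cables: list[Cable]) -> list[str]:
    """Return human-readable warnings; an empty list means no rule was broken."""
    members = defaultdict(list)  # VLAN -> devices cabled into it
    switches = defaultdict(set)  # VLAN -> switches carrying it
    for c in cables:
        members[c.vlan].append(c.device)
        switches[c.vlan].add(c.switch)

    warnings = []
    shared = sorted(v for v, devs in members.items() if len(devs) > 1)
    isolated_private = [c for c in cables if c.priority == "hipri" and len(members[c.vlan]) == 1]
    if shared and not isolated_private:
        warnings.append(f"links share {shared} but no private link has a broadcast domain of its own")
    for c in cables:
        if c.priority == "hipri" and len(switches[c.vlan]) > 1:
            warnings.append(f"private VLAN {c.vlan} ({c.device}) is trunked across {sorted(switches[c.vlan])}")
    return warnings


if __name__ == "__main__":
    # One node's cabling, as in the table above.
    node = [
        Cable("/dev/hme:0", "lowpri", "SWITCH-A", "VLAN1"),
        Cable("/dev/hme:1", "lowpri", "SWITCH-B", "VLAN1"),
        Cable("/dev/qfe:2", "hipri", "SWITCH-A", "VLAN2"),
        Cable("/dev/qfe:3", "hipri", "SWITCH-B", "VLAN3"),
    ]
    print(check_cabling(node))  # [] : VLAN2 and VLAN3 each hold a single private link
```

Two hipri links cabled into the same VLAN on one switch, as Jim describes below, would instead produce the "no private link" warning.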
It is recommended that the VLANs used for private communications not be trunked
across switches.
############################################################################################################
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jim Senicka
Sent: Friday, September 15, 2006 17:43
To: Kawaley Winston; veritas-ha@mailman.eng.auburn.edu
Subject: Re: [Veritas-ha] LLT errors - delayed and lost hb ticks
You have two LLT streams sharing common infrastructure (switch/VLAN).
Each LLT link must be completely independent, and neither stream should see
packets from the other.
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kawaley Winston
Sent: Friday, September 15, 2006 11:18 AM
To: veritas-ha@mailman.eng.auburn.edu
Subject: [Veritas-ha] LLT errors - delayed and lost hb ticks
Hi all,
We are running VCS 4.1 on two Solaris 9 systems and have configured a local
cluster for our configuration management software, ClearCase. Recently I have
been receiving a lot of the following LLT latency errors:
Sep 14 17:24:18 ncfbvcs01 llt: [ID 794702 kern.notice] LLT INFO V-14-1-10019 delayed hb 18561 ticks from 1 link 0 (bge1)
Sep 14 17:24:18 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 373 hb seq 30608288 from 1 link 0 (bge1)
Sep 14 17:24:18 ncfbvcs01 llt: [ID 794702 kern.notice] LLT INFO V-14-1-10019 delayed hb 18561 ticks from 1 link 1 (bge2)
Sep 14 17:24:18 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 373 hb seq 30608288 from 1 link 1 (bge2)
Sep 14 17:24:18 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost -4 hb seq 30608285 from 1 link 1 (bge2)
Sep 14 17:24:18 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost -4 hb seq 30608285 from 1 link 0 (bge1)
Sep 14 17:24:48 ncfbvcs01 llt: [ID 794702 kern.notice] LLT INFO V-14-1-10019 delayed hb 2955 ticks from 1 link 1 (bge2)
Sep 14 17:24:48 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 62 hb seq 30608348 from 1 link 1 (bge2)
Sep 14 17:24:48 ncfbvcs01 llt: [ID 794702 kern.notice] LLT INFO V-14-1-10019 delayed hb 2955 ticks from 1 link 0 (bge1)
Sep 14 17:24:48 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 62 hb seq 30608348 from 1 link 0 (bge1)
Sep 14 17:24:48 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost -4 hb seq 30608345 from 1 link 0 (bge1)
Sep 14 17:24:48 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost -4 hb seq 30608345 from 1 link 1 (bge2)
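One way to test Jim's diagnosis against a log like this is to look for the same heartbeat sequence number reported lost on both links in the same second. The Python sketch below does that for `V-14-1-10023` lines; the pattern is an assumption based only on these samples. Simultaneous losses on both links are consistent with a shared cause, but this is a heuristic, not proof.

```python
import re
from collections import defaultdict

# Assumed syslog layout, taken from the sample lines in this thread:
# "Sep 14 17:24:18 host llt: [...] LLT INFO V-14-1-10023 lost 373 hb seq 30608288 from 1 link 0 (bge1)"
LOST = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) .*?V-14-1-10023 lost (?P<n>-?\d+) hb seq (?P<seq>\d+)"
    r" from (?P<node>\d+) link (?P<link>\d+)"
)


def correlated_losses(lines):
    """Return {(timestamp, node, seq): [links]} for losses reported on two or more links at once."""
    seen = defaultdict(set)
    for line in lines:
        m = LOST.search(line)
        # Skip non-matching lines and the unexplained negative counts ("lost -4").
        if m and int(m["n"]) > 0:
            seen[(m["ts"], int(m["node"]), int(m["seq"]))].add(int(m["link"]))
    return {key: sorted(links) for key, links in seen.items() if len(links) > 1}


if __name__ == "__main__":
    sample = [
        "Sep 14 17:24:48 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 62 hb seq 30608348 from 1 link 1 (bge2)",
        "Sep 14 17:24:48 ncfbvcs01 llt: [ID 602713 kern.notice] LLT INFO V-14-1-10023 lost 62 hb seq 30608348 from 1 link 0 (bge1)",
    ]
    for (ts, node, seq), links in correlated_losses(sample).items():
        print(f"{ts}: seq {seq} from node {node} lost on links {links}")
```

Run against the twelve lines above, it reports sequence 30608288 (17:24:18) and sequence 30608348 (17:24:48) as lost on both link 0 and link 1.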
Does anyone know what exactly is causing these delayed and lost heartbeats,
and how they can be corrected?
Thanks,
Winston Kawaley