Hi,
I am having difficulty getting multicast communication to work in the port
of heartbeat (http://www.linux-ha.org). When I configure it for multicast
and start up the cluster node, I see the following in /var/log/messages:
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: glib: Unable to send mcast packet [-1]: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: write failure on mcast fxp0.: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: glib: Unable to send mcast packet [-1]: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: write failure on mcast fxp0.: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: glib: Unable to send mcast packet [-1]: Host is down
Oct 25 09:02:51 defiant heartbeat: [7910]: ERROR: write failure on mcast fxp0.: Host is down
Oct 25 09:02:53 defiant heartbeat: [7910]: ERROR: glib: Unable to send mcast packet [-1]: Host is down
Oct 25 09:02:53 defiant heartbeat: [7910]: ERROR: write failure on mcast fxp0.: Host is down
and tcpdump sees this:
Oct 25 09:05:08.038580 00:0e:7b:fc:c0:a0 01:00:5e:00:00:01 0800 42: 10.0.0.5 > 239.0.0.1: igmp nreport 239.0.0.1 [ttl 1]
Oct 25 09:05:12.063762 00:0e:7b:fc:c0:a0 01:00:5e:00:00:01 0800 42: 10.0.0.5 > 239.0.0.1: igmp nreport 239.0.0.1 [ttl 1]
That is all the multicast traffic I see. The above was on an i386 machine;
the same happens on another i386 machine. One has an rl0 card, the other an
fxp0 card.
Then I started a second node on a sparc64 machine, where tcpdump shows this:
# tcpdump -n -i hme0 multicast
tcpdump: listening on hme0, link-type EN10MB
10:40:09.218991 10.0.0.24 > 239.0.0.1: igmp nreport 239.0.0.1 [ttl 1]
Bus error
Despite these few outgoing multicast packets, the cluster nodes do not see
each other.
I found this part of the heartbeat code where the error message comes from:
mcast_write(struct hb_media* hbm, void *pkt, int len)
{
	struct mcast_private *	mcp;
	int			rc;

	MCASTASSERT(hbm);
	mcp = (struct mcast_private *) hbm->pd;

	if ((rc=sendto(mcp->wsocket, pkt, len, 0
	,	(struct sockaddr *)&mcp->addr
	,	sizeof(struct sockaddr))) != len) {
		PILCallLog(LOG, PIL_CRIT, "Unable to send mcast packet [%d]: %s"
		,	rc, strerror(errno));
		return(HA_FAIL);
	}
Does anybody have an idea what the problem could be? The same code compiled
on Linux works well. Has anyone else porting a multicast-based application
run into similar problems?
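One way to separate heartbeat from the network stack would be a standalone
sender along these lines. This is only a sketch, not heartbeat code: the
names make_mcast_addr and mcast_probe are made up for illustration, and
selecting the outgoing interface by its local address with IP_MULTICAST_IF
is an assumption about how the send path should be set up.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill *out with an IPv4 multicast destination.
 * Returns 0 on success, -1 if group is not a valid multicast address.
 */
int
make_mcast_addr(const char *group, unsigned short port,
    struct sockaddr_in *out)
{
	assert(out != NULL);
	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;
	out->sin_port = htons(port);
	if (inet_pton(AF_INET, group, &out->sin_addr) != 1)
		return -1;
	/* IN_MULTICAST() expects the address in host byte order. */
	if (!IN_MULTICAST(ntohl(out->sin_addr.s_addr)))
		return -1;
	return 0;
}

/*
 * Hypothetical helper: send one datagram to group:port through the
 * interface that owns local_ip.  Returns 0 on success, EINVAL for bad
 * arguments, otherwise the errno value of the call that failed.
 */
int
mcast_probe(const char *group, unsigned short port, const char *local_ip)
{
	struct sockaddr_in dst;
	struct in_addr ifaddr;
	unsigned char ttl = 1;		/* stay on the local segment */
	const char msg[] = "mcast probe";
	int s, err = 0;

	if (make_mcast_addr(group, port, &dst) != 0 ||
	    inet_pton(AF_INET, local_ip, &ifaddr) != 1)
		return EINVAL;

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
		return errno;

	/* Assumption: the interface is chosen by its local address. */
	if (setsockopt(s, IPPROTO_IP, IP_MULTICAST_IF,
	    &ifaddr, sizeof(ifaddr)) == -1 ||
	    setsockopt(s, IPPROTO_IP, IP_MULTICAST_TTL,
	    &ttl, sizeof(ttl)) == -1 ||
	    sendto(s, msg, sizeof(msg), 0,
	    (struct sockaddr *)&dst, sizeof(dst)) == -1)
		err = errno;	/* saved before close() can change it */

	close(s);
	return err;
}
```

Called from a small main() as mcast_probe("239.0.0.1", 694, "10.0.0.5"),
with 694 assumed as the heartbeat UDP port, it returns 0 if the datagram
was sent and the errno value otherwise. Passing that value to strerror()
would show whether "Host is down" also occurs outside heartbeat.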
Any ideas are greatly appreciated.
Sebastian