Another difference is that, during the network upgrade, we gave each
cluster node two network cards, one for 192.168.11.* and another for
10.0.2.*. We also found that ip_forward is turned off on some of the
machines (5 of 37).


I know almost nothing about networking, so this might be a stupid question.

If ip_forward is not turned on, will that also affect HBase
communication?
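
For reference, on Linux this setting is exposed as the file
/proc/sys/net/ipv4/ip_forward (1 = on, 0 = off). Below is a minimal sketch
for checking it on a node. The function names are only illustrative, and it
assumes a Linux host with the standard /proc layout.

```python
from pathlib import Path

# Standard Linux location of the IPv4 forwarding flag.
IP_FORWARD_PATH = "/proc/sys/net/ipv4/ip_forward"


def parse_ip_forward(text):
    """Interpret the file content: '1' means forwarding is on, '0' means off."""
    return text.strip() == "1"


def ip_forward_enabled(path=IP_FORWARD_PATH):
    """Read the flag file and return True if IPv4 forwarding is enabled."""
    return parse_ip_forward(Path(path).read_text())


if __name__ == "__main__":
    try:
        state = "on" if ip_forward_enabled() else "off"
        print(f"ip_forward is {state}")
    except FileNotFoundError:
        # Not a Linux host, or /proc is not mounted.
        print(f"{IP_FORWARD_PATH} not found")
```

Running this on each of the 37 nodes (for example over SSH) would list which
ones have the flag off.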


Thanks.


On Sat, May 14, 2011 at 7:39 PM, Stanley Xu <[email protected]> wrote:

> Dear all,
>
> We have run into a problem with HBase since a network update.
> Basically, 3-4 hours after cluster startup, some of the RegionServers
> try to read data from a deleted block.
>
> If we restart the cluster, the problem goes away, and no data is
> missing.
>
> A detailed description of the problem can be found at
>
> http://search-hadoop.com/m/ZpgJ623GoyU1/.META.+inconsistency&subj=The+META+data+inconsistency+issue
>
> I just found some questionable settings in the network configuration of
> our cluster. Some of the cluster nodes have a different broadcast address
> and netmask from the other nodes. For example, as shown below,
> hadoopsh11092 uses Bcast 10.255.255.255 and Mask 255.0.0.0, while
> hadoopsh11103 uses Bcast 10.0.2.255 and Mask 255.255.255.0.
>
> hadoopsh11092
> eth0      Link encap:Ethernet  HWaddr 00:A0:D1:EE:C1:7C
>           inet addr:10.0.2.19  Bcast:10.255.255.255  Mask:255.0.0.0
>           inet6 addr: fe80::2a0:d1ff:feee:c17c/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1864321949 errors:0 dropped:1465 overruns:0 frame:0
>           TX packets:1867202791 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:1811900116811 (1.6 TiB)  TX bytes:1879509303203 (1.7 TiB)
>           Memory:face0000-fad00000
>
>
> hadoopsh11103
> eth0      Link encap:Ethernet  HWaddr 00:A0:D1:EE:AE:C4
>           inet addr:10.0.2.30  Bcast:10.0.2.255  Mask:255.255.255.0
>           inet6 addr: fe80::2a0:d1ff:feee:aec4/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1726779928 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1716762766 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:1804202744690 (1.6 TiB)  TX bytes:1824085255121 (1.6 TiB)
>           Memory:face0000-fad00000
>
> But with these settings the cluster starts up successfully and works
> fine after startup; the problem only appears after 3-4 hours. I can also
> connect to the different machines over SSH by hostname without problems.
>
> I know that ZooKeeper does some kind of broadcast during communication. I
> am wondering whether our settings should work, or whether they could be
> the root cause of our problem.
>
> Thanks in advance.
>
> Best wishes,
> Stanley Xu
>
>
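
To make the difference between the two quoted eth0 configurations concrete,
here is a small sketch using Python's standard ipaddress module. It only
derives the network and broadcast address from the inet addr and Mask values
shown in the ifconfig output; the helper name describe() is made up for this
example, and it says nothing about whether the mismatch causes the HBase
problem.

```python
import ipaddress


def describe(addr, mask):
    """Return (network, broadcast address) for an address and its netmask."""
    iface = ipaddress.ip_interface(f"{addr}/{mask}")
    return iface.network, iface.network.broadcast_address


# Values copied from the ifconfig output of the two nodes.
net_92, bcast_92 = describe("10.0.2.19", "255.0.0.0")        # hadoopsh11092
net_103, bcast_103 = describe("10.0.2.30", "255.255.255.0")  # hadoopsh11103

print(net_92, bcast_92)    # 10.0.0.0/8 10.255.255.255
print(net_103, bcast_103)  # 10.0.2.0/24 10.0.2.255

# Each node's address falls inside the other's configured network, so each
# treats the other as directly reachable on the local segment.
print(ipaddress.ip_address("10.0.2.30") in net_92)   # True
print(ipaddress.ip_address("10.0.2.19") in net_103)  # True
```

The computed broadcast addresses match the Bcast values reported by ifconfig,
so each node is internally consistent. The difference is that hadoopsh11092
treats every 10.*.*.* address as local, while hadoopsh11103 only treats
10.0.2.* as local.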
