Hi,
I've installed HBase on the following configuration:
12 x (HBase REST server + HBase RegionServer + Hadoop DataNode)
2 x (ZooKeeper + HBase Master)
1 x (ZooKeeper + HBase Master + Hadoop NameNode)
The OS is Ubuntu Lucid (10.04).
The issue is that when I load data through the REST API, some hosts
become unreachable even though I can still ping them. I can no longer
connect to them, and even monitoring tools stop working for a while. For
example, I run sar on each host, and you can see that between 7:10 and
7:35 PM the host did not record anything:
06:45:01 PM  all   0.18   0.00   0.37   3.61   0.25  95.58
06:45:01 PM    0   0.24   0.00   0.54   6.62   0.35  92.25
06:45:01 PM    1   0.12   0.00   0.20   0.61   0.15  98.92
06:50:02 PM  all   5.69   0.00   1.79   4.23   1.94  86.36
06:50:02 PM    0   5.68   0.00   3.00   7.91   2.21  81.21
06:50:02 PM    1   5.70   0.00   0.59   0.55   1.66  91.51
06:55:01 PM  all   0.68   0.00   0.14   1.62   0.23  97.33
06:55:01 PM    0   0.87   0.00   0.20   3.19   0.31  95.44
06:55:01 PM    1   0.49   0.00   0.08   0.05   0.15  99.22
06:58:36 PM  all   0.03   0.00   0.02   0.45   0.07  99.43
06:58:36 PM    0   0.01   0.00   0.02   0.40   0.13  99.43
06:58:36 PM    1   0.04   0.00   0.01   0.51   0.00  99.43
07:05:01 PM  all   0.03   0.00   0.00   0.10   0.07  99.80
07:05:01 PM    0   0.02   0.00   0.00   0.10   0.10  99.78
07:05:01 PM    1   0.04   0.00   0.01   0.09   0.03  99.83  <--- last measure before host becomes unreachable
07:40:07 PM  all  14.72   0.00  17.93   0.02  13.31  54.02  <--- first measure after host becomes reachable again
07:40:07 PM    0  29.43   0.00  35.87   0.00  26.57   8.13
07:40:07 PM    1   0.00   0.00   0.00   0.04   0.04  99.91
07:45:01 PM  all   0.55   0.00   0.25   0.04   0.27  98.89
07:45:01 PM    0   0.54   0.00   0.14   0.05   0.21  99.07
07:45:01 PM    1   0.55   0.00   0.36   0.04   0.33  98.72
07:50:01 PM  all   0.11   0.00   0.05   0.18   0.06  99.60
07:50:01 PM    0   0.12   0.00   0.06   0.13   0.09  99.60
07:50:01 PM    1   0.11   0.00   0.04   0.23   0.04  99.59
07:55:01 PM  all   0.00   0.00   0.01   0.05   0.07  99.88
07:55:01 PM    0   0.00   0.00   0.01   0.01   0.13  99.84
07:55:01 PM    1   0.00   0.00   0.00   0.08   0.00  99.91
08:05:01 PM  all   0.01   0.00   0.00   0.00   0.05  99.94
08:05:01 PM    0   0.00   0.00   0.00   0.00   0.08  99.91
08:05:01 PM    1   0.03   0.00   0.00   0.00   0.01  99.96
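In case it helps with checking the other hosts, here is a minimal Python sketch of how such gaps could be spotted automatically in saved sar output. It assumes each sample starts with a 12-hour timestamp, AM/PM, and the CPU column as in the rows above, that all samples fall within a single day, and that an English locale is in use for AM/PM parsing. The function name `find_gaps` and the 10-minute threshold are my own choices for illustration, not part of sar.

```python
from datetime import datetime, timedelta


def find_gaps(lines, max_gap=timedelta(minutes=10)):
    """Return (previous, current) timestamp pairs more than max_gap apart.

    Only the aggregated "all" rows are considered, so each sample is
    counted once. Midnight rollover is not handled (single-day assumption).
    """
    stamps = []
    for line in lines:
        fields = line.split()
        # Expected start of a row: HH:MM:SS AM|PM all ...
        if len(fields) >= 3 and fields[2] == "all":
            stamps.append(
                datetime.strptime(f"{fields[0]} {fields[1]}", "%I:%M:%S %p")
            )

    gaps = []
    for prev, cur in zip(stamps, stamps[1:]):
        if cur - prev > max_gap:
            gaps.append(
                (prev.strftime("%I:%M:%S %p"), cur.strftime("%I:%M:%S %p"))
            )
    return gaps


# A few of the "all" rows from the output above.
sample = [
    "06:55:01 PM all 0.68 0.00 0.14 1.62 0.23 97.33",
    "06:58:36 PM all 0.03 0.00 0.02 0.45 0.07 99.43",
    "07:05:01 PM all 0.03 0.00 0.00 0.10 0.07 99.80",
    "07:40:07 PM all 14.72 0.00 17.93 0.02 13.31 54.02",
    "07:45:01 PM all 0.55 0.00 0.25 0.04 0.27 98.89",
]

for start, end in find_gaps(sample):
    print(f"no samples between {start} and {end}")
```

On these rows it should report the single gap between 07:05:01 PM and 07:40:07 PM.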
I suspect it's caused by high load, but I don't have any proof :( Is
there a known bug that would explain this? I had a similar issue with
Cassandra that forced me to upgrade to Linux kernel > 3.0.
Thanks.
--
Cyril SCETBON