I forgot to mention that we're using Amazon EC2 instances. Maybe this is
a known issue?
On 5/29/12 5:17 PM, Cyril Scetbon wrote:
Hi,
I've installed HBase in the following configuration:
12 x (HBase REST + HBase RegionServer + Hadoop DataNode)
2 x (ZooKeeper + HBase Master)
1 x (ZooKeeper + HBase Master + Hadoop NameNode)
The OS is Ubuntu Lucid (10.04).
The issue is that when I try to load data using the REST API, some hosts
become unreachable even though I can still ping them. I can no longer
connect to them, and even monitoring tools stop working for a period of
time. For example, I run SAR on each host, and you can see that between
7:10 and 7:35 PM the host does not record any data:
06:45:01 PM  all   0.18   0.00   0.37   3.61   0.25  95.58
06:45:01 PM    0   0.24   0.00   0.54   6.62   0.35  92.25
06:45:01 PM    1   0.12   0.00   0.20   0.61   0.15  98.92
06:50:02 PM  all   5.69   0.00   1.79   4.23   1.94  86.36
06:50:02 PM    0   5.68   0.00   3.00   7.91   2.21  81.21
06:50:02 PM    1   5.70   0.00   0.59   0.55   1.66  91.51
06:55:01 PM  all   0.68   0.00   0.14   1.62   0.23  97.33
06:55:01 PM    0   0.87   0.00   0.20   3.19   0.31  95.44
06:55:01 PM    1   0.49   0.00   0.08   0.05   0.15  99.22
06:58:36 PM  all   0.03   0.00   0.02   0.45   0.07  99.43
06:58:36 PM    0   0.01   0.00   0.02   0.40   0.13  99.43
06:58:36 PM    1   0.04   0.00   0.01   0.51   0.00  99.43
07:05:01 PM  all   0.03   0.00   0.00   0.10   0.07  99.80
07:05:01 PM    0   0.02   0.00   0.00   0.10   0.10  99.78
07:05:01 PM    1   0.04   0.00   0.01   0.09   0.03  99.83 <--- last measurement before host becomes unreachable
07:40:07 PM  all  14.72   0.00  17.93   0.02  13.31  54.02 <--- first measurement after host becomes reachable again
07:40:07 PM    0  29.43   0.00  35.87   0.00  26.57   8.13
07:40:07 PM    1   0.00   0.00   0.00   0.04   0.04  99.91
07:45:01 PM  all   0.55   0.00   0.25   0.04   0.27  98.89
07:45:01 PM    0   0.54   0.00   0.14   0.05   0.21  99.07
07:45:01 PM    1   0.55   0.00   0.36   0.04   0.33  98.72
07:50:01 PM  all   0.11   0.00   0.05   0.18   0.06  99.60
07:50:01 PM    0   0.12   0.00   0.06   0.13   0.09  99.60
07:50:01 PM    1   0.11   0.00   0.04   0.23   0.04  99.59
07:55:01 PM  all   0.00   0.00   0.01   0.05   0.07  99.88
07:55:01 PM    0   0.00   0.00   0.01   0.01   0.13  99.84
07:55:01 PM    1   0.00   0.00   0.00   0.08   0.00  99.91
08:05:01 PM  all   0.01   0.00   0.00   0.00   0.05  99.94
08:05:01 PM    0   0.00   0.00   0.00   0.00   0.08  99.91
08:05:01 PM    1   0.03   0.00   0.00   0.00   0.01  99.96
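A gap like this one can also be found mechanically rather than by eye,
which may help when checking all the hosts. Below is a minimal Python
sketch, assuming the SAR output keeps the layout shown above (a 12-hour
timestamp followed by the CPU label) and that samples are normally taken
every 5 to 10 minutes. The names find_gaps and max_gap are hypothetical,
and the 15-minute threshold is only an assumption to adjust. The sample
rows are the "all" lines from the output above.

```python
import re
from datetime import datetime, timedelta

# Matches the aggregate ("all") CPU row, e.g. "07:05:01 PM all 0.03 0.00 ..."
ROW = re.compile(r"^(\d{2}:\d{2}:\d{2} [AP]M)\s+all\b")


def find_gaps(lines, max_gap=timedelta(minutes=15)):
    """Return (previous, current) timestamp pairs more than max_gap apart."""
    gaps = []
    prev = None
    for line in lines:
        m = ROW.match(line.strip())
        if not m:
            continue  # per-CPU rows and any other text are ignored
        cur = datetime.strptime(m.group(1), "%I:%M:%S %p")
        # The rows carry no date, so assume that a time earlier than the
        # previous one means midnight was crossed.
        while prev is not None and cur < prev:
            cur += timedelta(days=1)
        if prev is not None and cur - prev > max_gap:
            gaps.append((prev, cur))
        prev = cur
    return gaps


sample = """\
06:45:01 PM all 0.18 0.00 0.37 3.61 0.25 95.58
06:50:02 PM all 5.69 0.00 1.79 4.23 1.94 86.36
06:55:01 PM all 0.68 0.00 0.14 1.62 0.23 97.33
06:58:36 PM all 0.03 0.00 0.02 0.45 0.07 99.43
07:05:01 PM all 0.03 0.00 0.00 0.10 0.07 99.80
07:40:07 PM all 14.72 0.00 17.93 0.02 13.31 54.02
07:45:01 PM all 0.55 0.00 0.25 0.04 0.27 98.89
07:50:01 PM all 0.11 0.00 0.05 0.18 0.06 99.60
07:55:01 PM all 0.00 0.00 0.01 0.05 0.07 99.88
08:05:01 PM all 0.01 0.00 0.00 0.00 0.05 99.94
"""

for start, end in find_gaps(sample.splitlines()):
    minutes = (end - start).total_seconds() / 60
    print(f"{start:%I:%M:%S %p} -> {end:%I:%M:%S %p} ({minutes:.1f} min)")
# prints: 07:05:01 PM -> 07:40:07 PM (35.1 min)
```

Only the "all" rows are inspected because every CPU row of one sample
shares the same timestamp. The 10-minute step between 07:55 and 08:05
stays under the assumed threshold, so it is not reported.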
I suppose it's caused by high load, but I don't have any proof :( Is
there a known bug related to this? I had a similar issue with Cassandra
that forced me to upgrade to a Linux kernel > 3.0.
Thanks.
--
Cyril SCETBON