Hi,

I've installed HBase with the following configuration:

12 x (HBase REST server + HBase RegionServer + Hadoop DataNode)
2 x (ZooKeeper + HBase Master)
1 x (ZooKeeper + HBase Master + Hadoop NameNode)

The OS is Ubuntu Lucid (10.04).

The issue is that when I load data through the REST API, some hosts become unreachable even though they still answer ping. I can no longer connect to them, and even monitoring tools stop working for a while. For example, I run sar on each host, and as you can see below, the host recorded nothing between 7:10 and 7:35 PM:

                  CPU     %user     %nice   %system   %iowait    %steal     %idle
06:45:01 PM       all      0.18      0.00      0.37      3.61      0.25     95.58
06:45:01 PM         0      0.24      0.00      0.54      6.62      0.35     92.25
06:45:01 PM         1      0.12      0.00      0.20      0.61      0.15     98.92
06:50:02 PM       all      5.69      0.00      1.79      4.23      1.94     86.36
06:50:02 PM         0      5.68      0.00      3.00      7.91      2.21     81.21
06:50:02 PM         1      5.70      0.00      0.59      0.55      1.66     91.51
06:55:01 PM       all      0.68      0.00      0.14      1.62      0.23     97.33
06:55:01 PM         0      0.87      0.00      0.20      3.19      0.31     95.44
06:55:01 PM         1      0.49      0.00      0.08      0.05      0.15     99.22
06:58:36 PM       all      0.03      0.00      0.02      0.45      0.07     99.43
06:58:36 PM         0      0.01      0.00      0.02      0.40      0.13     99.43
06:58:36 PM         1      0.04      0.00      0.01      0.51      0.00     99.43
07:05:01 PM       all      0.03      0.00      0.00      0.10      0.07     99.80
07:05:01 PM         0      0.02      0.00      0.00      0.10      0.10     99.78
07:05:01 PM         1      0.04      0.00      0.01      0.09      0.03     99.83  <--- last sample before the host became unreachable
07:40:07 PM       all     14.72      0.00     17.93      0.02     13.31     54.02  <--- first sample after the host became reachable again
07:40:07 PM         0     29.43      0.00     35.87      0.00     26.57      8.13
07:40:07 PM         1      0.00      0.00      0.00      0.04      0.04     99.91
07:45:01 PM       all      0.55      0.00      0.25      0.04      0.27     98.89
07:45:01 PM         0      0.54      0.00      0.14      0.05      0.21     99.07
07:45:01 PM         1      0.55      0.00      0.36      0.04      0.33     98.72
07:50:01 PM       all      0.11      0.00      0.05      0.18      0.06     99.60
07:50:01 PM         0      0.12      0.00      0.06      0.13      0.09     99.60
07:50:01 PM         1      0.11      0.00      0.04      0.23      0.04     99.59
07:55:01 PM       all      0.00      0.00      0.01      0.05      0.07     99.88
07:55:01 PM         0      0.00      0.00      0.01      0.01      0.13     99.84
07:55:01 PM         1      0.00      0.00      0.00      0.08      0.00     99.91
08:05:01 PM       all      0.01      0.00      0.00      0.00      0.05     99.94
08:05:01 PM         0      0.00      0.00      0.00      0.00      0.08     99.91
08:05:01 PM         1      0.03      0.00      0.00      0.00      0.01     99.96
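To check the other hosts for the same symptom without reading every sar report by eye, the gaps in the sampling can be found programmatically. Below is a minimal Python sketch, assuming the plain-text `sar -u` layout shown above (a 12-hour "HH:MM:SS AM/PM" timestamp at the start of each data row) and the roughly 5-minute interval visible in this output; `find_sar_gaps` and its `expected`/`slack` parameters are hypothetical names chosen for illustration, not part of sysstat.

```python
from datetime import datetime, timedelta


def find_sar_gaps(lines, expected=timedelta(minutes=5), slack=timedelta(minutes=2)):
    """Hypothetical helper: return (before, after) timestamp pairs from `sar -u`
    text whose spacing exceeds expected + slack, i.e. samples that were never written.

    Assumes 12-hour timestamps ("07:05:01 PM") as the first two fields of each
    data row. Does not handle reports that cross midnight.
    """
    stamps = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # blank or truncated line
        try:
            ts = datetime.strptime(parts[0] + " " + parts[1], "%I:%M:%S %p")
        except ValueError:
            continue  # banner, "Average:" row, or other non-data line
        # Each sample has one row for "all" plus one per CPU; keep each timestamp once.
        if not stamps or stamps[-1] != ts:
            stamps.append(ts)

    gaps = []
    for prev, cur in zip(stamps, stamps[1:]):
        if cur - prev > expected + slack:
            gaps.append((prev.strftime("%I:%M:%S %p"), cur.strftime("%I:%M:%S %p")))
    return gaps
```

Fed the report above, this flags the 07:05:01 PM to 07:40:07 PM hole, and also the 07:55:01 PM to 08:05:01 PM interval, where the 08:00 sample is missing as well. The `slack` margin is there so that irregular but present samples (such as the 06:58:36 PM row) are not reported.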

I suspect it's caused by high load, but I don't have any proof :( Is there a known bug about this? I had a similar issue with Cassandra that forced me to upgrade to a Linux kernel > 3.0.

Thanks.

--
Cyril SCETBON
