Hi everybody, 

We are running 6 data nodes (plus one master node, HBase 1.0.0-cdh5.6.0) in
both a production and a test environment. Each month we export the deltas of
the previous month from the production system (using
org.apache.hadoop.hbase.mapreduce.Export) and import them into the test
system. From time to time we use RowCounter and an analytics map-reduce job we
wrote ourselves to check whether the restore is correct.
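
For reference, the monthly export and import are invoked roughly like this
(table name, HDFS path and time range are placeholders, the exact arguments
differ per table):

  hbase org.apache.hadoop.hbase.mapreduce.Export \
      <tablename> hdfs:///backups/<tablename>/<month> \
      <versions> <starttime> <endtime>

  hbase org.apache.hadoop.hbase.mapreduce.Import \
      <tablename> hdfs:///backups/<tablename>/<month>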

Now we see that the Export/Import has been broken since April 2019. After a
lot of investigation and testing we found that the bug described in
https://github.com/hortonworks-spark/shc/issues/174 causes the problems.

After increasing the timeouts (client scanner timeout and RPC timeout) from 1
minute to 10 minutes, the row counts in the test system seem to be in good
shape again (we counted the rows for one month via RowCounter and via a scan
in the hbase shell).
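
The settings in question are hbase.client.scanner.timeout.period and
hbase.rpc.timeout (assuming I have the property names right); passed per
export job via -D this would look roughly like the following, values in
milliseconds and the table arguments again being placeholders:

  hbase org.apache.hadoop.hbase.mapreduce.Export \
      -D hbase.client.scanner.timeout.period=600000 \
      -D hbase.rpc.timeout=600000 \
      <tablename> <outputdir> <versions> <starttime> <endtime>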

Now we are about to apply the same changes to the production system.

But the question remains what causes the long timeouts. Some of the tests we
did revealed ScannerTimeoutExceptions after 60 seconds (the default setting).
But 60 seconds is nearly an eternity, so we assume that something is wrong.
How can we find out what it is?
The HBase locality factor is 1.0 or close to 1.0 for most of the regions.
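For completeness, the locality can also be read per region server from the
JMX endpoint (host and port below are placeholders for the respective region
server):

  curl -s http://<regionserver-host>:<rs-info-port>/jmx | grep -i percentFilesLocal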

My questions are:
Is it possible that "silent timeouts" cause incomplete exports?
Is it usual for scans to take longer than 1 minute, even though the exports
apparently were all fine up to April?
How can one identify regions that are in trouble?

Thank you and best regards
Udo
