Hello,

Every now and then we need to flatten our cluster and re-import all data from log files (because of changes in data format, etc.). Afterwards we notice a significant increase in scan performance. As data is added and shuffled around between region servers, performance degrades again over time (say, over a couple of weeks).

Are there any routine operations that one should run manually, or settings to activate in the HBase configuration, to keep the data well distributed? We use HBase 0.92 as part of a Cloudera CDH4 cluster.
Thank you, /David