Hi,

1. Not sure if you've seen the HBaseWD project (https://github.com/sematext/HBaseWD). It implements the "salt keys with a prefix" approach for writing monotonically increasing row keys / time-series data. Simplified, the idea is to add a random prefix to the row key so that writes end up on different region servers (avoiding a single-RS hotspot).
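To make the "random prefix" idea concrete, here is a minimal Java sketch of one-byte salting. It is a simplified illustration, not HBaseWD's actual implementation: the class name SaltedKeyExample, the bucket count BUCKETS = 16 and the helpers salt()/unsalt() are all hypothetical, and it only shows how keys are built, not how they are written to HBase.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

public class SaltedKeyExample {
    // Hypothetical number of salt buckets; roughly the number of regions/RSs
    // you want sequential writes spread across.
    public static final int BUCKETS = 16;

    // Prepends a random one-byte prefix in [0, BUCKETS) to the original row key,
    // so consecutive keys (e.g. timestamps) land in different key ranges.
    public static byte[] salt(byte[] originalKey) {
        byte prefix = (byte) ThreadLocalRandom.current().nextInt(BUCKETS);
        byte[] salted = new byte[originalKey.length + 1];
        salted[0] = prefix;
        System.arraycopy(originalKey, 0, salted, 1, originalKey.length);
        return salted;
    }

    // Strips the one-byte prefix to recover the original key (e.g. on read).
    public static byte[] unsalt(byte[] saltedKey) {
        return Arrays.copyOfRange(saltedKey, 1, saltedKey.length);
    }

    public static void main(String[] args) {
        byte[] key = "2011-04-20T12:00:00".getBytes(StandardCharsets.UTF_8);
        byte[] salted = salt(key);
        System.out.println("bucket=" + salted[0]
                + " key=" + new String(unsalt(salted), StandardCharsets.UTF_8));
    }
}
```

The trade-off sits on the read side: a range scan over the original keys has to be run once per bucket and the partial results merged. Also, because the prefix is random, rewriting the same original key may put it under a different prefix, so this form suits append-only data such as time series.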
2. When writing data to HBase with salted or random keys (so that the load is well distributed over the cluster), the write speed per RS is effectively limited by the slowest RS in the cluster (since each Region is served by a single RS).

Given 1 & 2, I got this crazy idea to:
* write in multiple threads
* assign each prefix (or interval of keys, in the case of completely random keys) to a particular thread, so that records with this prefix are always written by that thread
* measure how well each thread performs (e.g. write speed)
* based on each thread's performance, salt (or randomize) keys in a biased way, so that threads which perform better get more records to write

This way we would put less load on the "slower" RSs, the overall load would be more or less balanced, and that would give maximum write performance for the cluster. I think this might only work if each thread writes into a relatively small subset of all RSs, though. Otherwise the threads will all perform more or less the same.

Am I completely crazy to think about this? Does it make sense to you at all?

Alex Baranau
------
Sematext :: http://blog.sematext.com/
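The biased salting step of the idea could look roughly like the Java sketch below: pick prefix i with probability proportional to the measured write rate of the thread that owns it (plain weighted random selection over a cumulative-sum table). It assumes each writer thread owns exactly one prefix (thread i writes prefix i) and that per-thread rates have already been measured; BiasedSaltChooser, its constructor and nextPrefix() are hypothetical names, not part of HBaseWD or the HBase client API.

```java
import java.util.Random;

public class BiasedSaltChooser {
    // cumulative[i] = sum of measured rates for prefixes 0..i
    private final double[] cumulative;
    private final Random random;

    // recordsPerSecond[i] is the (hypothetical) measured write rate of the
    // thread that owns prefix i.
    public BiasedSaltChooser(double[] recordsPerSecond, Random random) {
        this.random = random;
        this.cumulative = new double[recordsPerSecond.length];
        double sum = 0;
        for (int i = 0; i < recordsPerSecond.length; i++) {
            if (recordsPerSecond[i] < 0) {
                throw new IllegalArgumentException("negative rate at " + i);
            }
            sum += recordsPerSecond[i];
            cumulative[i] = sum;
        }
        if (sum <= 0) {
            throw new IllegalArgumentException("all rates are zero");
        }
    }

    // Returns prefix i with probability recordsPerSecond[i] / total, so faster
    // threads (and the RSs behind them) receive proportionally more records.
    public int nextPrefix() {
        double r = random.nextDouble() * cumulative[cumulative.length - 1];
        for (int i = 0; i < cumulative.length; i++) {
            if (r < cumulative[i]) {
                return i;
            }
        }
        return cumulative.length - 1; // only reached through floating-point rounding
    }

    public static void main(String[] args) {
        // Example rates: thread 2 is "slow", thread 3 is "fast".
        double[] rates = {100, 100, 50, 250};
        BiasedSaltChooser chooser = new BiasedSaltChooser(rates, new Random(7));
        int[] counts = new int[rates.length];
        for (int n = 0; n < 100_000; n++) {
            counts[chooser.nextPrefix()]++;
        }
        for (int i = 0; i < counts.length; i++) {
            System.out.printf("prefix %d -> %.3f of records%n", i, counts[i] / 100_000.0);
        }
    }
}
```

In a real writer the rates would drift, so one option is to rebuild the chooser from fresh measurements every interval (say, every few seconds) and hand each generated record to the thread that owns its prefix.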
