Hi group, I'm using HBase to store a large amount of time series data. The use case is heavier on writes than reads. My application tops out at 600K write requests per second and I can't tune it for a higher TPS.
Hardware:
I have 6 Region Servers, each with 128G memory, 12 HDDs, and 2 cores with 24 threads.

Schema:
The schema for the time series data is similar to OpenTSDB's: the data points of the same metric within an hour are stored in one row, so there can be a maximum of 3600 columns per row. Each cell is about 70 bytes in size, including the rowkey, column qualifier, column family and value.

HBase config:
CDH 5.6, HBase 1.0.0
100G memory for each RegionServer
hbase.hstore.compactionThreshold = 50
hbase.hstore.blockingStoreFiles = 100
hbase.hregion.majorcompaction: disabled
hbase.client.write.buffer = 20MB
hbase.regionserver.handler.count = 100
hbase.hregion.memstore.flush.size = 128MB

HBase Client:
Writes go through BufferedMutator with 100000 per batch.

Input Volume:
The input data throughput from Kafka is more than 2 million/sec.

My writer applications are distributed, but however far I scale them up, the total write throughput won't go above 600K/sec. The servers show 20% CPU usage and 5.6 wa. GC doesn't look good, though: it shows a lot of pauses of 10s+.

By my estimate, 1M/s of input data, at about 70 bytes per cell, results in only 70 MByte/s of write throughput to the cluster, which is quite a small amount for 6 region servers. The performance should not be this bad.

Does anybody have an idea why the performance stops at 600K/s? Is there anything I have to tune to increase the HBase write throughput?
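
For reference, here is a rough sketch of the arithmetic behind that estimate. It uses only the numbers from this post (about 70 bytes per cell, 6 region servers). It assumes writes are spread evenly across the region servers, which may not hold if some regions are hot, and it ignores WAL and HDFS replication overhead. The function name is made up for illustration.

```python
# Back-of-the-envelope write volume; these are estimates, not measurements.
CELL_BYTES = 70        # approximate cell size quoted in the post
REGION_SERVERS = 6     # cluster size quoted in the post


def write_volume_mb_per_sec(cells_per_sec, cell_bytes=CELL_BYTES):
    """Raw payload volume in MB/s (1 MB = 10**6 bytes) for a given cell rate."""
    return cells_per_sec * cell_bytes / 1_000_000


if __name__ == "__main__":
    # Current ceiling, the 1M/s estimate, and the full Kafka input rate.
    for rate in (600_000, 1_000_000, 2_000_000):
        total = write_volume_mb_per_sec(rate)
        # Assumption: load is evenly balanced across all region servers.
        per_server = total / REGION_SERVERS
        print(f"{rate:>9,} cells/s: {total:.1f} MB/s total, "
              f"{per_server:.1f} MB/s per region server")
```

At the current 600K/s ceiling this works out to about 42 MB/s in total, or roughly 7 MB/s per region server, so raw payload volume alone does not seem to explain the limit.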
