Hello, I'm working on a POC to use HBase + Phoenix as a DB layer for a system that consumes tens of thousands of messages per second (10,000 to 40,000).
Our cluster is fairly small: 4 region servers supporting about a dozen tables. We are currently experimenting with salting; our first pass was 4 regions. The ultimate data size is also pretty small. The data compacts very nicely, and after aggregation and de-duplication it is only on the order of tens of GB.

Querying these tables is reasonably performant right now, but upserting the data is not, and I'm looking for some performance tips. The incoming data is streamed via Storm at the rate mentioned above. After some basic benchmarking, it appears that Storm can consume the data much more quickly than it can upsert it to Phoenix.

I understand that Phoenix is fundamentally designed for fast querying, and not necessarily fast writing. But can anyone suggest some Phoenix and/or HBase parameters we should consider tuning to improve write performance? Any tips on designing something like this?

Also, we have 3 additional indexes besides the primary key. I'm guessing that this creates a significant amount of overhead when writing data, but the indexes are necessary for query performance. Is it possible to force the index maintenance to behave in more of a batch pattern? Maybe only update the index tables every X minutes, or even twice a day?

Thanks in advance for any tips!