Hi All,

We are seeing a couple of issues on some of our Phoenix tables: the table size keeps growing 2-3x after around 2-3 days of ingestion, and read performance takes a big hit after that. If we then insert-overwrite the data into a new copy table, the size comes back to normal and queries against the copy run fast.
Initial table size after the 1st day: ~5 G
After 2 days of ingestion: ~15 G
After rewriting into a copy table: ~5-6 G

Query performance is proportional to the size of the table: a query that took 40 secs on the original table after the first day takes around 130-160 secs after 2 days of ingestion. The same query run against the copy table finishes in ~40 secs.

Most of the data ingested after the first day consists of updates to existing rows, so we thought major compaction would solve the size issue, but it does not shrink the table every time (load runs in parallel while the compaction happens). Write performance is always good, and we have used salt buckets to even out the writes. The primary key is a 12-bit string formed by concatenating an account id and an auto-generated transaction number.

One query whose performance suffers as described above is:

  select (list of 50-70 columns) from original_table where account_id IN (list of 100k account ids)

[account_id in this query is the leading portion of the primary key on that table]

We are currently increasing the heap space on these region servers to provide more memstore, which could reduce the number of flushes for the upserted data.

Could there be any reason for the growth in table size other than the updated rows? And how could we improve the performance of those read queries?

Thanks,
Abhishek
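P.S. For concreteness, here is a rough client-side sketch of the query shape above, batched into smaller IN lists (one idea we may try to keep each statement manageable). The table and column names follow the mail; the batching helper, batch size, and placeholder style are purely illustrative, not our actual code.

```python
def batched_in_queries(account_ids, batch_size=1000):
    """Yield (sql, params) pairs, one per batch of account ids.

    Splits a very large IN (...) list into fixed-size chunks so each
    statement carries at most `batch_size` bind parameters.
    """
    for start in range(0, len(account_ids), batch_size):
        batch = account_ids[start:start + batch_size]
        # One '?' placeholder per id in this batch.
        placeholders = ", ".join("?" * len(batch))
        sql = ("SELECT /* list of 50-70 columns */ * FROM original_table "
               "WHERE account_id IN (%s)" % placeholders)
        yield sql, batch

# Example: 100k ids become 100 statements of 1000 ids each.
ids = ["acct%06d" % i for i in range(100_000)]
queries = list(batched_in_queries(ids))
```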