Is your max_versions set to 1? What about keep_deleted_cells?
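You can check the current column family settings with describe '<table name>' in the hbase shell. If they need changing, something along these lines should work from sqlline; the table name is just a placeholder and the exact property syntax may vary by Phoenix version, so treat this as a sketch:

    -- Keep only one version per cell and do not retain deleted cells,
    -- so that overwritten rows can be cleaned up at the next major compaction.
    ALTER TABLE ORIGINAL_TABLE SET VERSIONS = 1, KEEP_DELETED_CELLS = false;

With more than one version retained (or deleted cells kept), every update leaves the older cell on disk until a major compaction is allowed to drop it, which would be consistent with the growth you describe.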
On Tue, Jan 29, 2019 at 10:41 AM talluri abhishek <abhishektall...@gmail.com> wrote:
> Hi All,
>
> We are seeing a couple of issues on some of our Phoenix tables where the
> size of the tables keeps growing 2-3x after around 2-3 days of ingestion,
> and read performance takes a big hit after that. If we then insert
> overwrite the data from that table into a new copy table, the data size
> comes back to normal and queries run fast on the copy table.
>
> Initial table size after 1st day ~ 5G
> After 2 days of ingestion ~ 15G
> Re-write into a copy table ~ 5-6G
>
> Query performance becomes proportional to the size of the table: say a
> query took 40 secs to run on the original table after the first day, it
> takes around 130-160 secs after 2 days of ingestion. The same query, when
> run on the copy table, finishes in around ~40 secs.
>
> Most of the data ingested after the first day consists of updates to
> existing rows, so we thought major compaction would solve the size issue,
> but it does not shrink the size every time (load happens in parallel
> while the compaction runs).
> Write performance is always good, and we have used salt buckets to even
> out the writes. The primary key is a 12-bit string made by concatenating
> an account id and an auto-generated transaction number.
>
> One query whose performance suffers as mentioned above is:
> *select (list of 50-70 columns) from original_table where account_id IN
> (list of 100k account ids)* [account_id in this query is the primary key
> on that table]
>
> We are currently increasing the heap space on these region servers to
> provide more memstore, which could reduce the number of flushes for the
> upserted data.
>
> Could there be any other reason for the increase in the size of the table
> apart from the updated rows? How could we improve the performance of
> those read queries?
>
> Thanks,
> Abhishek
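For the IN query above, it may also be worth checking the plan to confirm that Phoenix turns the list of account ids into a skip scan / point lookups on the salted primary key rather than a full scan. A minimal sketch, with placeholder table, column, and id values standing in for the real ones:

    -- The real query selects 50-70 columns and passes ~100k account ids.
    EXPLAIN
    SELECT col1, col2, col3
    FROM original_table
    WHERE account_id IN ('ACCT0001', 'ACCT0002');

If the plan shows a full scan instead, that would go a long way toward explaining why the query time tracks the on-disk size of the table.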