Hi All,

We are seeing a couple of issues on some of our Phoenix tables where the
size of the tables keep growing 2-3 times after around 2-3 days of
ingestion and the read performance takes a big hit after that. Now, if we
insert overwrite the data in that table to a new copy table, the data size
comes back to normal size and the queries perform fast on that copy table.

Initial table size after 1st day ~ 5G
After 2 days of ingestion ~ 15G
Re-write into a copy table ~ 5-6 G

Query performance becomes proportional to the size of the table, lets say
the query took 40 secs to run on the original table after first day, it
takes around 130-160 secs after 2 days of ingestion. The same query when
run on the copy table finishes in around ~40secs.

Most of the ingested data after the first day are mostly updates happening
on the existing rows, so we thought major compaction should solve the size
issue but it does not shrink the size every time (load happens in
parallel when the compaction is run).
Write performance is always good and we have used salt buckets to even out
the writes. The primary key is a 12-bit string which is made by the
concatenation of some account id and an auto-generated transaction number.

One query that has a toll on its performance as mentioned above is:
*select (list of 50-70 columns) from original_table where account_id IN
(list of 100k account ids) *[account_id in this query is the primary key on
that table]

We are currently increasing the heap space on these region servers to
provide more memstore size, which could reduce the number of flushes for
the upserted data.

Could there be any other reason for the increase in the size of the table
apart from the updated rows? How could we better the performance of those
read queries?

Thanks,
Abhishek

Reply via email to