Also, check what's the value of hbase.master.hfilecleaner.ttl?
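For reference, one quick way to check that value is sketched below; the hbase-site.xml path is an assumption (adjust for your distribution), and HBaseConfTool is the helper class the HBase scripts use to print a resolved config value, if it is available in your version:

  # Grep the explicitly configured value, if any (path is an assumption)
  grep -A1 'hbase.master.hfilecleaner.ttl' /etc/hbase/conf/hbase-site.xml

  # Or print the value HBase actually resolves, including the built-in
  # default (300000 ms / 5 minutes in recent versions, if I recall right)
  hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.master.hfilecleaner.ttl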
On Wed, 30 Jan 2019 at 7:31 AM, sudhir patil <spatil.sud...@gmail.com> wrote:

> Is replication enabled and set up properly?
> Are you creating snapshots, or is backup enabled (hbase.backup.enable)?
>
> Check under the ../hbase folder what's actually taking up the space.
>
> On Wed, 30 Jan 2019 at 6:24 AM, talluri abhishek <abhishektall...@gmail.com> wrote:
>
>> Hi Vincent,
>>
>> VERSIONS is set to 1 and KEEP_DELETED_CELLS is false. It's basically the
>> default settings and nothing has been changed.
>>
>> describe on the HBase table gives the below:
>>
>> VERSIONS => '1', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE'
>>
>> Thanks,
>> Abhishek
>>
>> On Tue, Jan 29, 2019 at 3:20 PM Vincent Poon <vincentp...@apache.org> wrote:
>>
>>> Is your max_versions set to 1? keep_deleted_cells?
>>>
>>> On Tue, Jan 29, 2019 at 10:41 AM talluri abhishek <abhishektall...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We are seeing a couple of issues on some of our Phoenix tables where the
>>>> size of the tables keeps growing 2-3x after around 2-3 days of ingestion,
>>>> and the read performance takes a big hit after that. Now, if we
>>>> insert-overwrite the data from that table into a new copy table, the data
>>>> size comes back to normal and queries perform fast on that copy table.
>>>>
>>>> Initial table size after 1st day ~ 5 G
>>>> After 2 days of ingestion ~ 15 G
>>>> Re-write into a copy table ~ 5-6 G
>>>>
>>>> Query performance is proportional to the size of the table: let's say a
>>>> query took 40 secs to run on the original table after the first day, it
>>>> takes around 130-160 secs after 2 days of ingestion. The same query, when
>>>> run on the copy table, finishes in around 40 secs.
>>>>
>>>> The data ingested after the first day is mostly updates to existing rows,
>>>> so we thought major compaction should solve the size issue, but it does
>>>> not shrink the size every time (load happens in parallel while the
>>>> compaction runs).
>>>> Write performance is always good, and we have used salt buckets to even
>>>> out the writes. The primary key is a 12-bit string made by concatenating
>>>> an account id and an auto-generated transaction number.
>>>>
>>>> One query whose performance suffers, as mentioned above, is:
>>>> select (list of 50-70 columns) from original_table where account_id IN
>>>> (list of 100k account ids)
>>>> [account_id in this query is the primary key on that table]
>>>>
>>>> We are currently increasing the heap space on these region servers to
>>>> provide more memstore size, which could reduce the number of flushes for
>>>> the upserted data.
>>>>
>>>> Could there be any other reason for the increase in the size of the table
>>>> apart from the updated rows? How could we improve the performance of
>>>> those read queries?
>>>>
>>>> Thanks,
>>>> Abhishek
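To follow up on the suggestion above about checking what is actually taking the space under the HBase root directory, a rough sketch; the /hbase root path is the usual default and ORIGINAL_TABLE is just a placeholder for the Phoenix table name in the default namespace:

  # Break usage down by top-level HBase directories (data, archive, oldWALs, ...)
  hdfs dfs -du -h /hbase

  # Compare the live table data against archived HFiles kept for snapshots/TTL
  hdfs dfs -du -h /hbase/data/default/ORIGINAL_TABLE
  hdfs dfs -du -h /hbase/archive/data/default/ORIGINAL_TABLE

  # List any snapshots that might be pinning old HFiles in the archive
  echo "list_snapshots" | hbase shell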
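On the point about major compaction not always shrinking the table: it may be worth flushing and then explicitly major-compacting during a quiet window, so the compaction is not racing the parallel load. A minimal sketch, again with ORIGINAL_TABLE as a placeholder:

  # Flush memstores so all recent updates are in HFiles, then rewrite them,
  # dropping superseded cell versions
  echo "flush 'ORIGINAL_TABLE'" | hbase shell
  echo "major_compact 'ORIGINAL_TABLE'" | hbase shell

  # Watch storefile counts per region while the compaction runs
  echo "status 'detailed'" | hbase shell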
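And on the memstore/heap point: besides raising the region server heap, flush behavior is governed by a couple of properties worth confirming. The values noted in the comments are the stock defaults, not recommendations, and the HBaseConfTool invocation is the same assumption as above:

  # Per-region flush threshold (default 134217728 bytes, i.e. 128 MB)
  hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.hregion.memstore.flush.size

  # Fraction of the region server heap usable by all memstores (default 0.4)
  hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.regionserver.global.memstore.size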