Also, check what's the value of hbase.master.hfilecleaner.ttl?
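For reference, one quick way to check that value is sketched below; the hbase-site.xml path is an assumption (adjust for your distribution), and HBaseConfTool is the helper class the HBase scripts use to print a resolved config value, if it is available in your version:

  # Grep the explicitly configured value, if any (path is an assumption)
  grep -A1 'hbase.master.hfilecleaner.ttl' /etc/hbase/conf/hbase-site.xml

  # Or print the value HBase actually resolves, including the built-in
  # default (300000 ms / 5 minutes in recent versions, if I recall right)
  hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.master.hfilecleaner.ttl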
On Wed, 30 Jan 2019 at 7:31 AM, sudhir patil <spatil.sud...@gmail.com> wrote:

> Is replication enabled and set up properly?
> Are you creating snapshots, or is backup enabled (hbase.backup.enable)?
>
> Check under the ../hbase folder what's actually taking up the space.
>
> On Wed, 30 Jan 2019 at 6:24 AM, talluri abhishek <abhishektall...@gmail.com> wrote:
>
>> Hi Vincent,
>>
>> VERSIONS is set to 1 and KEEP_DELETED_CELLS is false. It's basically the
>> default settings and nothing has been changed.
>>
>> describe on the HBase table gives the below:
>>
>> VERSIONS => '1', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE'
>>
>> Thanks,
>> Abhishek
>>
>> On Tue, Jan 29, 2019 at 3:20 PM Vincent Poon <vincentp...@apache.org> wrote:
>>
>>> Is your max_versions set to 1? keep_deleted_cells?
>>>
>>> On Tue, Jan 29, 2019 at 10:41 AM talluri abhishek <abhishektall...@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> We are seeing a couple of issues on some of our Phoenix tables where the
>>>> size of the tables keeps growing 2-3x after around 2-3 days of ingestion,
>>>> and the read performance takes a big hit after that. Now, if we
>>>> insert-overwrite the data from that table into a new copy table, the data
>>>> size comes back to normal and queries perform fast on that copy table.
>>>>
>>>> Initial table size after 1st day ~ 5 G
>>>> After 2 days of ingestion ~ 15 G
>>>> Re-write into a copy table ~ 5-6 G
>>>>
>>>> Query performance is proportional to the size of the table: let's say a
>>>> query took 40 secs to run on the original table after the first day, it
>>>> takes around 130-160 secs after 2 days of ingestion. The same query, when
>>>> run on the copy table, finishes in around 40 secs.
>>>>
>>>> The data ingested after the first day is mostly updates to existing rows,
>>>> so we thought major compaction should solve the size issue, but it does
>>>> not shrink the size every time (load happens in parallel while the
>>>> compaction runs).
>>>> Write performance is always good, and we have used salt buckets to even
>>>> out the writes. The primary key is a 12-bit string made by concatenating
>>>> an account id and an auto-generated transaction number.
>>>>
>>>> One query whose performance suffers, as mentioned above, is:
>>>> select (list of 50-70 columns) from original_table where account_id IN
>>>> (list of 100k account ids)
>>>> [account_id in this query is the primary key on that table]
>>>>
>>>> We are currently increasing the heap space on these region servers to
>>>> provide more memstore size, which could reduce the number of flushes for
>>>> the upserted data.
>>>>
>>>> Could there be any other reason for the increase in the size of the table
>>>> apart from the updated rows? How could we improve the performance of
>>>> those read queries?
>>>>
>>>> Thanks,
>>>> Abhishek
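To follow up on the suggestion above about checking what is actually taking the space under the HBase root directory, a rough sketch; the /hbase root path is the usual default and ORIGINAL_TABLE is just a placeholder for the Phoenix table name in the default namespace:

  # Break usage down by top-level HBase directories (data, archive, oldWALs, ...)
  hdfs dfs -du -h /hbase

  # Compare the live table data against archived HFiles kept for snapshots/TTL
  hdfs dfs -du -h /hbase/data/default/ORIGINAL_TABLE
  hdfs dfs -du -h /hbase/archive/data/default/ORIGINAL_TABLE

  # List any snapshots that might be pinning old HFiles in the archive
  echo "list_snapshots" | hbase shell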
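On the point about major compaction not always shrinking the table: it may be worth flushing and then explicitly major-compacting during a quiet window, so the compaction is not racing the parallel load. A minimal sketch, again with ORIGINAL_TABLE as a placeholder:

  # Flush memstores so all recent updates are in HFiles, then rewrite them,
  # dropping superseded cell versions
  echo "flush 'ORIGINAL_TABLE'" | hbase shell
  echo "major_compact 'ORIGINAL_TABLE'" | hbase shell

  # Watch storefile counts per region while the compaction runs
  echo "status 'detailed'" | hbase shell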
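And on the memstore/heap point: besides raising the region server heap, flush behavior is governed by a couple of properties worth confirming. The values noted in the comments are the stock defaults, not recommendations, and the HBaseConfTool invocation is the same assumption as above:

  # Per-region flush threshold (default 134217728 bytes, i.e. 128 MB)
  hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.hregion.memstore.flush.size

  # Fraction of the region server heap usable by all memstores (default 0.4)
  hbase org.apache.hadoop.hbase.util.HBaseConfTool hbase.regionserver.global.memstore.size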