Re: Growth in table size and performance degradation on read-queries

2019-01-29 Thread sudhir patil
Also, what's the value of hbase.master.hfilecleaner.ttl?
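
For reference, the value can be read straight from the site config. This is a minimal sketch that assumes a standard Hadoop-style `hbase-site.xml`. The helper name `hfilecleaner_ttl_ms` and the sample value are made up for illustration. The 5-minute fallback is what I understand `TimeToLiveHFileCleaner` uses when the property is unset, so confirm it for your HBase version.

```python
import xml.etree.ElementTree as ET

# Assumed fallback: 5 minutes, the TTL I believe TimeToLiveHFileCleaner
# applies when the property is not set (verify for your HBase version).
DEFAULT_HFILECLEANER_TTL_MS = 5 * 60 * 1000


def hfilecleaner_ttl_ms(site_xml: str) -> int:
    """Return hbase.master.hfilecleaner.ttl (ms) from hbase-site.xml content."""
    root = ET.fromstring(site_xml)
    # Hadoop-style config: <configuration><property><name/><value/></property>
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "hbase.master.hfilecleaner.ttl":
            return int(prop.findtext("value", "").strip())
    return DEFAULT_HFILECLEANER_TTL_MS


# Hypothetical config snippet with a 1-hour TTL.
sample = """<configuration>
  <property>
    <name>hbase.master.hfilecleaner.ttl</name>
    <value>3600000</value>
  </property>
</configuration>"""
print(hfilecleaner_ttl_ms(sample))  # 3600000
```

A long TTL means HFiles replaced by compaction stay in the archive directory that much longer before the cleaner deletes them.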

On Wed, 30 Jan 2019 at 7:31 AM, sudhir patil wrote:


Re: Growth in table size and performance degradation on read-queries

2019-01-29 Thread sudhir patil
Is replication enabled and set up properly?
Are you creating snapshots, or is backup enabled (hbase.backup.enable)?
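
Snapshots matter because archived files that a snapshot still references are kept. Here is a rough model, simplified and not HBase's actual cleaner code. Compaction moves replaced HFiles into the archive directory. The master's cleaner chore deletes an archived file only when every configured cleaner agrees, which this model reduces to a TTL check and a snapshot-reference check. `ArchivedFile`, `deletable` and `reclaimable` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedFile:
    name: str
    age_ms: int  # time since the file was moved into the archive


def deletable(f: ArchivedFile, ttl_ms: int, snapshot_refs: set[str]) -> bool:
    """Simplified cleaner chain: every check must agree before deletion."""
    old_enough = f.age_ms > ttl_ms               # TTL-style check
    unreferenced = f.name not in snapshot_refs   # snapshot-reference check
    return old_enough and unreferenced


def reclaimable(files: list[ArchivedFile], ttl_ms: int,
                snapshot_refs: set[str]) -> list[str]:
    """Names of archived files the simplified chain would delete."""
    return [f.name for f in files if deletable(f, ttl_ms, snapshot_refs)]
```

In this model a file pinned by a snapshot is never reclaimed, however old it is.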


Check what is actually taking up the space under the ../hbase folder.
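
One way to do that check is to run `hdfs dfs -du` (without `-h`) on the HBase root directory and sort the output. This sketch only parses that output. It assumes each line is `size [disk_space_consumed] path`, since newer Hadoop releases print two numbers and older ones print one. `largest_dirs` is a hypothetical helper.

```python
def largest_dirs(du_output: str, top: int = 5) -> list[tuple[int, str]]:
    """Parse `hdfs dfs -du <dir>` output; return (bytes, path), largest first."""
    rows = []
    for line in du_output.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank or non-data lines
        # First column is the logical size; last column is the path
        # (assumes paths contain no spaces, which holds for HBase dirs).
        rows.append((int(parts[0]), parts[-1]))
    rows.sort(reverse=True)
    return rows[:top]
```

Comparing the archive, data and WAL directories this way shows whether the growth is in the table itself or in files kept around for snapshots, replication or the cleaner TTL.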

On Wed, 30 Jan 2019 at 6:24 AM, talluri abhishek wrote:


Re: Growth in table size and performance degradation on read-queries

2019-01-29 Thread talluri abhishek
Hi Vincent,

VERSIONS is set to 1 and KEEP_DELETED_CELLS is false. These are basically the
default settings; nothing has been changed.

Running describe on the HBase table gives the following:

VERSIONS => '1', MIN_VERSIONS => '0', TTL => 'FOREVER',
KEEP_DELETED_CELLS => 'FALSE'
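
If this check has to be repeated across many tables, the attribute string printed by `describe` can be parsed mechanically. This is a minimal sketch that assumes the `KEY => 'VALUE'` format shown above. `parse_describe` and `versions_are_default` are hypothetical helpers.

```python
from __future__ import annotations

import re

# Matches KEY => 'VALUE' pairs as printed by the HBase shell's describe.
_ATTR = re.compile(r"(\w+)\s*=>\s*'([^']*)'")


def parse_describe(text: str) -> dict[str, str]:
    """Turn a describe attribute string into a {KEY: VALUE} dict."""
    return dict(_ATTR.findall(text))


def versions_are_default(attrs: dict[str, str]) -> bool:
    """True if one version is kept and deleted cells are not retained."""
    return attrs.get("VERSIONS") == "1" and attrs.get("KEEP_DELETED_CELLS") == "FALSE"


desc = """VERSIONS => '1', MIN_VERSIONS => '0', TTL => 'FOREVER',
KEEP_DELETED_CELLS => 'FALSE'"""
print(versions_are_default(parse_describe(desc)))  # True
```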


Thanks,
Abhishek

On Tue, Jan 29, 2019 at 3:20 PM Vincent Poon wrote:


Re: Growth in table size and performance degradation on read-queries

2019-01-29 Thread Vincent Poon
Is your max_versions set to 1? What about keep_deleted_cells?
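
These two settings matter because they decide what a major compaction is allowed to throw away. This is a toy model, not HBase's compaction code. For a single cell, it keeps at most `max_versions` puts newer than the latest delete marker, and it treats every marker as masking all older versions (like a whole-column delete). With `keep_deleted_cells` on, markers and masked puts are retained. Real HBase also considers TTL, MIN_VERSIONS and the different delete types, which this ignores. `Entry` and `major_compact_cell` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    ts: int
    is_delete: bool = False  # True for a delete marker


def major_compact_cell(history: list[Entry], max_versions: int,
                       keep_deleted_cells: bool) -> list[Entry]:
    """Toy model of what survives a major compaction for one cell."""
    newest_first = sorted(history, key=lambda e: e.ts, reverse=True)
    if keep_deleted_cells:
        # Markers and masked puts stay; only the version cap applies to puts.
        puts = [e for e in newest_first if not e.is_delete][:max_versions]
        markers = [e for e in newest_first if e.is_delete]
        return sorted(puts + markers, key=lambda e: e.ts, reverse=True)
    kept = []
    for e in newest_first:
        if e.is_delete:
            break  # the marker is dropped and masks every older put
        kept.append(e)
        if len(kept) == max_versions:
            break
    return kept


history = [Entry(1), Entry(2), Entry(3)]  # three upserts of the same cell
print(major_compact_cell(history, max_versions=1, keep_deleted_cells=False))
# [Entry(ts=3, is_delete=False)]
```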

On Tue, Jan 29, 2019 at 10:41 AM talluri abhishek wrote:


Growth in table size and performance degradation on read-queries

2019-01-29 Thread talluri abhishek
Hi All,

We are seeing a couple of issues with some of our Phoenix tables: the tables
keep growing to 2-3 times their size after around 2-3 days of ingestion, and
read performance takes a big hit after that. If we insert-overwrite the data
from that table into a new copy table, the data size goes back to normal and
queries run fast on the copy table.

Initial table size after the 1st day: ~5 GB
After 2 days of ingestion: ~15 GB
After rewriting into a copy table: ~5-6 GB

Query time grows roughly in proportion to the table size. For example, a
query that took 40 secs on the original table after the first day takes
around 130-160 secs after 2 days of ingestion. The same query finishes in
~40 secs on the copy table.
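
A quick check on the figures above shows the proportionality claim holds roughly: size grew about 3x while query time grew about 3.25-4x. The numbers come from the post. The variable names are only for illustration.

```python
# Figures from the post: ~5 GB -> ~15 GB, and 40 s -> 130-160 s.
size_initial_gb, size_after_2_days_gb = 5, 15
query_initial_s = 40
query_after_2_days_s = (130, 160)

size_ratio = size_after_2_days_gb / size_initial_gb
time_ratios = [t / query_initial_s for t in query_after_2_days_s]

print(f"size grew {size_ratio:.2f}x")  # size grew 3.00x
print("query time grew " + "-".join(f"{r:.2f}x" for r in time_ratios))
# query time grew 3.25x-4.00x
```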

Most of the data ingested after the first day consists of updates to
existing rows, so we thought a major compaction would solve the size issue,
but it does not shrink the table every time (the load keeps running in
parallel while the compaction runs).
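
A toy LSM model shows why a major compaction that runs alongside the load may not bring the size all the way down. It illustrates the idea only and is not HBase internals. Each flush writes the updated rows into a new store file, so an updated row exists once per file until a major compaction merges the files and keeps only the newest version (VERSIONS => 1). Files flushed after the compaction has selected its inputs are not part of it, so they keep adding duplicates. All names are hypothetical.

```python
from typing import Dict, List

# A store file is modelled as {row_key: timestamp}; one dict per flushed file.
StoreFile = Dict[str, int]


def flush(updates: Dict[str, int]) -> StoreFile:
    """Each memstore flush writes the buffered updates into a new file."""
    return dict(updates)


def major_compact(files: List[StoreFile]) -> StoreFile:
    """Merge the selected files, keeping only the newest version of each row."""
    merged: StoreFile = {}
    for f in files:
        for row, ts in f.items():
            if ts > merged.get(row, -1):
                merged[row] = ts
    return merged


def cells_on_disk(files: List[StoreFile]) -> int:
    return sum(len(f) for f in files)


# Day-1 load of 4 rows, then two rounds of updates to some of them.
files = [flush({"r1": 1, "r2": 1, "r3": 1, "r4": 1})]
files.append(flush({"r1": 2, "r2": 2}))
files.append(flush({"r1": 3, "r3": 3}))
print(cells_on_disk(files))  # 8 cells for 4 live rows

# The compaction selects the current files; a flush during it is left out.
selected = list(files)
late = flush({"r2": 4})
files = [major_compact(selected), late]
print(cells_on_disk(files))  # 5: r2 keeps an old copy until the next compaction
```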
Write performance is always good, and we have used salt buckets to even out
the writes. The primary key is a 12-bit string made by concatenating an
account id and an auto-generated transaction number.
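
Roughly, salting works like this: Phoenix prefixes each row key with one byte computed from a hash of the key modulo SALT_BUCKETS, so consecutive keys spread across regions. This sketch shows the idea only. `zlib.crc32` stands in for Phoenix's own hash function, and the bucket count and sample keys are hypothetical.

```python
import zlib

SALT_BUCKETS = 8  # hypothetical; use the table's actual SALT_BUCKETS value


def salted_key(row_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prefix the key with one salt byte derived from a hash of the whole key."""
    salt = zlib.crc32(row_key) % buckets  # CRC32 stands in for Phoenix's hash
    return bytes([salt]) + row_key


# Hypothetical key: account id "ACC001" + transaction number "000042".
print(salted_key(b"ACC001000042")[0])  # the bucket this key lands in
```

Because the bucket comes from the whole key, keys that share an account-id prefix generally land in different buckets.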

One query whose performance suffers as described above is:
*select (list of 50-70 columns) from original_table where account_id IN
(list of 100k account ids)* [account_id in this query is the primary key on
that table]
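
For reference, this is the shape of that query with `?` bind parameters (as used with JDBC), built on the client side. It is a sketch only. `original_table` and `account_id` come from the post. The column names and the `build_query` helper are hypothetical.

```python
def build_query(columns: list, n_ids: int) -> str:
    """Build SELECT ... WHERE account_id IN (?, ?, ...) with one placeholder per id."""
    if not columns or n_ids < 1:
        raise ValueError("need at least one column and one id")
    placeholders = ", ".join("?" for _ in range(n_ids))
    return (f"SELECT {', '.join(columns)} FROM original_table "
            f"WHERE account_id IN ({placeholders})")


print(build_query(["col_a", "col_b"], 3))
# SELECT col_a, col_b FROM original_table WHERE account_id IN (?, ?, ?)
```

With 100k account ids, the statement carries 100k placeholders, one per key.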

We are currently increasing the heap space on these region servers to
provide a larger memstore, which could reduce the number of flushes for the
upserted data.
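
Some back-of-the-envelope arithmetic on the heap change. This sketch assumes the defaults as I understand them: `hbase.regionserver.global.memstore.size` = 0.4 of heap and `hbase.hregion.memstore.flush.size` = 128 MB. Check hbase-site.xml for overrides. The heap sizes are hypothetical.

```python
MB = 1024 * 1024
GB = 1024 * MB

# Assumed HBase defaults; check hbase-site.xml for overrides.
GLOBAL_MEMSTORE_FRACTION = 0.4  # hbase.regionserver.global.memstore.size
REGION_FLUSH_SIZE = 128 * MB    # hbase.hregion.memstore.flush.size


def global_memstore_limit(heap_bytes: int,
                          fraction: float = GLOBAL_MEMSTORE_FRACTION) -> int:
    """Upper bound on memstore usage across all regions of one region server."""
    return int(heap_bytes * fraction)


def regions_at_flush_size(heap_bytes: int) -> int:
    """Regions that can each hold a full memstore before the global limit."""
    return global_memstore_limit(heap_bytes) // REGION_FLUSH_SIZE


for heap_gb in (16, 32):  # hypothetical heap sizes
    heap = heap_gb * GB
    print(heap_gb, "GB heap ->", global_memstore_limit(heap) // MB,
          "MB global memstore,", regions_at_flush_size(heap), "regions at flush size")
```

A larger heap raises the global ceiling. As far as I know, though, each region still flushes once its own memstore reaches the per-region flush size. So flush frequency may only drop if the servers were previously hitting the global limit.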

Could there be any other reason for the growth in table size apart from the
updated rows? How could we improve the performance of those read queries?

Thanks,
Abhishek