Re: Growth in table size and performance degradation on read-queries
Also check: what's the value of hbase.master.hfilecleaner.ttl?
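For reference, the cleaner TTL asked about above lives in hbase-site.xml; a minimal sketch (the 300000 ms shown is the stock HBase default, not a value taken from this thread):

```xml
<!-- hbase-site.xml: how long the master's HFileCleaner keeps archived
     HFiles around before deleting them. Default is 300000 ms (5 min);
     a much larger value can leave old HFiles accumulating on disk. -->
<property>
  <name>hbase.master.hfilecleaner.ttl</name>
  <value>300000</value>
</property>
```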
Re: Growth in table size and performance degradation on read-queries
Is replication enabled and set up properly? Are you creating snapshots, or is backup enabled (hbase.backup.enable)?

Check under the ../hbase folder what's actually taking up the space.
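A rough way to do that check, assuming the default hbase.rootdir layout (/hbase) on HDFS; the exact paths may differ on your cluster:

```shell
# Which subtrees of the HBase root dir hold the space?
hdfs dfs -du -h /hbase
# HFiles retained for snapshots / the HFile cleaner TTL
hdfs dfs -du -h /hbase/archive
# Per-table sizes in the default namespace
hdfs dfs -du -h /hbase/data/default
```

If /hbase/archive dominates, the growth is retained old HFiles (snapshots, backup, or cleaner TTL) rather than live table data.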
Re: Growth in table size and performance degradation on read-queries
Hi Vincent,

Versions is set to 1 and keep_deleted_cells is false. It's basically the default settings and nothing has been changed.

describe on the hbase table gives the following:

VERSIONS => '1', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE'

Thanks,
Abhishek
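One way to confirm what is actually sitting on disk, beyond the table descriptor: a raw scan in the hbase shell returns delete markers and superseded cell versions that a normal scan hides, which is where "invisible" table growth usually lives between compactions. A sketch, using a placeholder table name:

```shell
# Run inside `hbase shell`; 'ORIGINAL_TABLE' is a hypothetical name.
describe 'ORIGINAL_TABLE'
# RAW => true also returns delete markers and not-yet-compacted
# superseded versions of updated rows.
scan 'ORIGINAL_TABLE', {RAW => true, VERSIONS => 10, LIMIT => 5}
```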
Re: Growth in table size and performance degradation on read-queries
Is your max_versions set to 1? keep_deleted_cells?
Growth in table size and performance degradation on read-queries
Hi All,

We are seeing a couple of issues on some of our Phoenix tables: the size of the tables keeps growing 2-3x after around 2-3 days of ingestion, and read performance takes a big hit after that. If we insert-overwrite the data from that table into a new copy table, the data size comes back to normal and queries run fast on the copy table.

Initial table size after 1st day: ~5 G
After 2 days of ingestion: ~15 G
After re-write into a copy table: ~5-6 G

Query performance is proportional to the size of the table: a query that took 40 secs on the original table after the first day takes around 130-160 secs after 2 days of ingestion. The same query run on the copy table finishes in ~40 secs.

Most of the data ingested after the first day is updates to existing rows, so we thought major compaction would solve the size issue, but it does not shrink the size every time (load runs in parallel while the compaction happens).

Write performance is always good, and we have used salt buckets to even out the writes. The primary key is a 12-bit string formed by concatenating an account id and an auto-generated transaction number.

One query whose performance suffers as described above is:

select (list of 50-70 columns) from original_table where account_id IN (list of 100k account ids)

[account_id in this query is the primary key on that table]

We are currently increasing the heap space on these region servers to provide more memstore, which could reduce the number of flushes for the upserted data.

Could there be any other reason for the increase in the size of the table apart from the updated rows? How could we improve the performance of those read queries?

Thanks,
Abhishek
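The copy-table rewrite described above could look roughly like this in Phoenix SQL (table names are placeholders, and copy_table is assumed to have been created with the same DDL, salt buckets included, as original_table):

```sql
-- Rewrite all currently-visible rows into a fresh table. Only the
-- latest version of each row is read and written, so accumulated
-- superseded cells and delete markers are left behind in the original.
UPSERT INTO copy_table SELECT * FROM original_table;
```

If this reliably restores both size and query latency, it points at leftover old cell versions in the original table's HFiles rather than at the live data itself.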