Upgrading HBase in existing Phoenix cluster

2019-04-17 Thread Hao Qin Tan
Hi,

I have searched the internet and was unable to find any instructions for
doing an HBase upgrade on a Phoenix-enabled cluster. Upgrading HBase looks
like it will require a Phoenix side-grade (switching to the Phoenix build
for the new HBase version), but there are no instructions for that on the
Phoenix site.

Has anyone tried such an upgrade? Were there any issues reading existing
Phoenix data from the cluster post-upgrade?

We are currently thinking of upgrading from HBase 1.2.6 to 1.4 because
there are various features in HBase 1.4 that we would like to make use of.

Regards,
Hao Qin


Re: Date for next release ?

2019-04-17 Thread Thomas D'Silva
Josh had started a discussion thread on the dev list about having a 5.0.1
release.
https://lists.apache.org/thread.html/99fcc737d7a8f82ddffb1b34a64f7099f7909900b8bea36dd6afca16@%3Cdev.phoenix.apache.org%3E

We would appreciate any help in making this release happen.

On Mon, Apr 15, 2019 at 4:11 AM Jean-Charles Jabouille <
jean-charles.jaboui...@kelkoogroup.com> wrote:

> Hi,
>
> first: thanks for this awesome project. We are trying to use Phoenix (5.0.0)
> for our big project. We are facing some issues with the HBase version. We
> would like to upgrade it, but Phoenix 5.0.0 is only compatible with HBase
> 2.0.0.
>
> Do you have an idea when the next Phoenix release supporting HBase >
> 2.0.0 will be available?
>
> Thanks again for your work.
>
> Regards,
>
> jean charles
>


Re: Large differences in query execution time for similar queries

2019-04-17 Thread James Taylor
Hi Hieu,
You could try adding the /*+ SERIAL */ hint to see if that has any impact.
Also, have you tried not salting the table? The SALT_BUCKETS value of 128
is pretty high.
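
For example (untested sketches; the view and column names come from your
schema below, and the unsalted table name is made up), the hint goes in a
comment right after SELECT, and an unsalted copy of the table would simply
drop the SALT_BUCKETS option:

SELECT /*+ SERIAL */
"pk","doubleCol","intCol","intCol2","stringCol","stringCol2","intCol3"
FROM "extended"
WHERE "doubleCol" > 100
LIMIT 10000;

CREATE TABLE "metadata_nosalt" (
  "pk" VARCHAR PRIMARY KEY
)
COLUMN_ENCODED_BYTES = 0,
BLOOMFILTER = 'ROWCOL',
COMPRESSION = 'GZ';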

For the other issue, do you have a lot of deleted cells? You might try
running a major compaction. You could also add a secondary index on
"doubleCol" if that's a common query.

Thanks,
James

On Thu, Apr 11, 2019 at 5:44 PM Hieu Nguyen  wrote:

> Hi,
>
> I am using Phoenix 4.14-cdh5.11, with sqlline-thin as the client.  I am
> seeing strange patterns around SELECT query execution time:
> 1. Increasing the LIMIT past a certain "threshold" results in
> significantly slower execution time.
> 2. Adding just one column (BIGINT) to the SELECT results in significantly
> slower execution time.
>
> This is our schema (names are changed for readability):
> CREATE TABLE "metadata" (
>   "pk"   VARCHAR PRIMARY KEY
> )
> SALT_BUCKETS = 128,
> COLUMN_ENCODED_BYTES = 0,
> BLOOMFILTER = 'ROWCOL',
> COMPRESSION = 'GZ';
>
> CREATE VIEW "extended" (
> "doubleCol" DOUBLE,
> "intCol" BIGINT,
> "intCol2" BIGINT,
> "intCol3" BIGINT,
> "stringCol" VARCHAR,
> "stringCol2" VARCHAR,
> "stringCol3" VARCHAR,
> "stringCol4" VARCHAR,
> "stringCol5" VARCHAR,
> "stringCol6" VARCHAR,
> "stringCol7" VARCHAR,
> ) AS SELECT * FROM "metadata"
>
> We have other views that also select from "metadata" and define their own
> columns.  Overall, there are 1 million rows in this table, and 20k rows
> match the condition "doubleCol" > 100.
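>
> (For reference, the 20k figure can be double-checked with a simple
> aggregate along these lines:
>
> SELECT COUNT(*) FROM "extended" WHERE "doubleCol" > 100;
> )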
>
> Base query:
> SELECT
> "pk","doubleCol","intCol","intCol2","stringCol","stringCol2","intCol3"
> FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
> WHERE "doubleCol" > 100
> LIMIT 1
> -> 1.976 seconds
>
> Decreasing LIMIT to 9500 (only 5% decrease in number of rows):
> SELECT
> "pk","doubleCol","intCol","intCol2","stringCol","stringCol2","intCol3"
> FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
> WHERE "doubleCol" > 100
> LIMIT 9500
> -> 0.409 seconds
>
> Removing "intCol3" from SELECT, keeping LIMIT at 1:
> SELECT "pk","doubleCol","intCol","intCol2","stringCol","stringCol2"
> FROM "templatealldatattype-7d55c5a6-efe3-419d-9bce-9fea7c14f8bc"
> WHERE "doubleCol" > 100
> LIMIT 1
> -> 0.339 seconds
>
> I ran each of these queries a few times in a row.  There was some small
> variation in execution time, but the 2nd and 3rd queries were never slower
> than the 1st query.
>
> The EXPLAIN plan did not change, except for the ROW LIMIT value when
> explaining the 2nd query (9500 instead of 10000).
>
> +---------------------------------------------------------------------------------------------------+----------------+---------------+----------------+
> | PLAN                                                                                              | EST_BYTES_READ | EST_ROWS_READ | EST_INFO_TS    |
> +---------------------------------------------------------------------------------------------------+----------------+---------------+----------------+
> | CLIENT 128-CHUNK 382226 ROWS 314572800 BYTES PARALLEL 128-WAY ROUND ROBIN FULL SCAN OVER metadata | 314572800      | 382226        | 1554973434637  |
> | SERVER FILTER BY "doubleCol" > 100.0                                                              | 314572800      | 382226        | 1554973434637  |
> | SERVER 10000 ROW LIMIT                                                                            | 314572800      | 382226        | 1554973434637  |
> | CLIENT 10000 ROW LIMIT                                                                            | 314572800      | 382226        | 1554973434637  |
> +---------------------------------------------------------------------------------------------------+----------------+---------------+----------------+
>
> I tried adding the SEEK_TO_COLUMN and NO_SEEK_TO_COLUMN hints as suggested
> in a similar thread (
> https://lists.apache.org/thread.html/4ef8384ecd31f30fdaf5837e3abc613142426d899e916c7aae4a46d4@%3Cuser.phoenix.apache.org%3E),
> but they had no effect.
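>
> (The hints were applied in the usual comment form, e.g. on the base query
> above:
>
> SELECT /*+ NO_SEEK_TO_COLUMN */
> "pk","doubleCol","intCol","intCol2","stringCol","stringCol2","intCol3"
> FROM "extended"
> WHERE "doubleCol" > 100
> LIMIT 10000;
> )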
>
> Any pointers on how we can investigate the 4-5x slowdown when increasing
> LIMIT by only ~5%, or when selecting just one more BIGINT column?  Could we
> have exceeded some threshold in the result size that causes the query to
> perform much more slowly for seemingly small changes to the query?
>
> Thanks,
> -Hieu
>