[jira] [Updated] (PHOENIX-5353) Incorrect results when JOINs project array elements
[ https://issues.apache.org/jira/browse/PHOENIX-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-5353:

Description:

There are two errors observed below, one with HASH JOIN and the other with SORT MERGE JOIN. Here are the steps to reproduce the bug in array element projection when using sort merge join. There is also an issue with hash joins (see the last example).

* Create tables and upsert test data.

0: jdbc:phoenix:localhost:2181:/hbase> create table d (c1 integer primary key, c2 integer, c3 integer[]);
0: jdbc:phoenix:localhost:2181:/hbase> upsert into d values (1, 1, array[1,2,3]);
0: jdbc:phoenix:localhost:2181:/hbase> upsert into d values (2, 10, array[2,3,4]);
0: jdbc:phoenix:localhost:2181:/hbase> create table t2 (c1 integer primary key, c2 integer);
0: jdbc:phoenix:localhost:2181:/hbase> upsert into t2 values (1, 10);
0: jdbc:phoenix:localhost:2181:/hbase> upsert into t2 values (2, 20);

* Verify the data in the tables.

0: jdbc:phoenix:localhost:2181:/hbase> select * from d;
+-----+-----+------------+
| C1  | C2  | C3         |
+-----+-----+------------+
| 1   | 1   | [1, 2, 3]  |
| 2   | 10  | [2, 3, 4]  |
+-----+-----+------------+
2 rows selected (0.027 seconds)
0: jdbc:phoenix:localhost:2181:/hbase> select * from t2;
+-----+-----+
| C1  | C2  |
+-----+-----+
| 1   | 10  |
| 2   | 20  |
+-----+-----+
2 rows selected (0.021 seconds)

* Perform a sort merge join without projecting array elements. This works fine.

0: jdbc:phoenix:localhost:2181:/hbase> select /*+ use_sort_merge_join */ d.c1, d.c2, d.c3, t2.c1, t2.c2 from d join t2 on d.c1 = t2.c1;
+-------+-------+------------+--------+--------+
| D.C1  | D.C2  | D.C3       | T2.C1  | T2.C2  |
+-------+-------+------------+--------+--------+
| 1     | 1     | [1, 2, 3]  | 1      | 10     |
| 2     | 10    | [2, 3, 4]  | 2      | 20     |
+-------+-------+------------+--------+--------+
2 rows selected (0.054 seconds)

* Perform a sort merge join while projecting array elements. This returns junk data for the array elements.

0: jdbc:phoenix:localhost:2181:/hbase> select /*+ use_sort_merge_join */ d.c1, d.c2, d.c3[1], d.c3[2], d.c3[3], t2.c1, t2.c2 from d join t2 on d.c1 = t2.c1;
+-------+-------+----------------------+----------------------+----------------------+--------+--------+
| D.C1  | D.C2  | ARRAY_ELEM(D.C3, 1)  | ARRAY_ELEM(D.C3, 2)  | ARRAY_ELEM(D.C3, 3)  | T2.C1  | T2.C2  |
+-------+-------+----------------------+----------------------+----------------------+--------+--------+
| 1     | 1     | -1937768448          | -2122317824          | -2105540608          | 1      | 10     |
| 2     | 10    | -1937768448          | -2105540608          | -2088763392          | 2      | 20     |
+-------+-------+----------------------+----------------------+----------------------+--------+--------+
2 rows selected (0.043 seconds)

* Array element projection works fine with a hash join, but the columns from the non-array table are corrupted.

0: jdbc:phoenix:localhost:2181:/hbase> select d.c1, d.c2, d.c3[1], d.c3[2], d.c3[3], t2.c1, t2.c2 from d join t2 on d.c1 = t2.c1;
+-------+-------+----------------------+----------------------+----------------------+--------------+--------------+
| D.C1  | D.C2  | ARRAY_ELEM(D.C3, 1)  | ARRAY_ELEM(D.C3, 2)  | ARRAY_ELEM(D.C3, 3)  | T2.C1        | T2.C2        |
+-------+-------+----------------------+----------------------+----------------------+--------------+--------------+
| 1     | 1     | 1                    | 2                    | 3                    | -2146795520  | -2147319808  |
| 2     | 10    | 2                    | 3                    | 4                    | -2146140160  | -2147319808  |
+-------+-------+----------------------+----------------------+----------------------+--------------+--------------+
2 rows selected (0.067 seconds)
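The junk values (e.g. -1937768448) look like real bytes decoded from the wrong position in the joined row. As an illustration only (the encoding below is a toy big-endian layout, not Phoenix's actual array or KeyValue serialization), reading a serialized int array at a shifted offset produces exactly this kind of plausible-looking but wrong integer:

```python
import struct

def decode_ints(buf, offset, count):
    """Decode `count` big-endian 32-bit ints starting at `offset`."""
    return list(struct.unpack_from('>%di' % count, buf, offset))

# Toy serialization of the array [1, 2, 3] -- NOT Phoenix's wire format.
arr = struct.pack('>3i', 1, 2, 3)
assert decode_ints(arr, 0, 3) == [1, 2, 3]   # aligned read is correct

# Prepend an unrelated cell, then decode with a stale (wrong) offset,
# simulating a projection that reads from the wrong position in a
# merged row: the result is garbage, not [1, 2, 3].
row = struct.pack('>i', 7) + arr
misread = decode_ints(row, 2, 3)
assert misread != [1, 2, 3]
```

The same mechanism would explain why either side of the join can come out corrupted depending on which plan reorders the projected cells.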
[jira] [Created] (PHOENIX-5353) Incorrect results when JOINs project array elements
Gerald Sangudi created PHOENIX-5353:
---
Summary: Incorrect results when JOINs project array elements
Key: PHOENIX-5353
URL: https://issues.apache.org/jira/browse/PHOENIX-5353
Project: Phoenix
Issue Type: Bug
Reporter: Gerald Sangudi
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
Build failure
Hi folks,

The build is failing for me on the latest master; I also tried a fresh install. The latest commit that fails is https://github.com/apache/phoenix/commit/1455763b43cc71bff9e69daf928be59251e97235. The first error is a dependency on an HBase class that is not included in Phoenix. Previous commits work fine for me.

Thanks,
Gerald
[jira] [Updated] (PHOENIX-4757) composite key salt_buckets
[ https://issues.apache.org/jira/browse/PHOENIX-4757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4757:

Comment: was deleted (was: [~tdsilva] - id_1 and id_2 are not constant. They are meaningful in the application domain, and they vary by record. There would be no write hotspotting if new writes arrive in random order wrt id_2.)

> composite key salt_buckets
> --
>
> Key: PHOENIX-4757
> URL: https://issues.apache.org/jira/browse/PHOENIX-4757
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.11.0
> Reporter: cmd
> Priority: Major
> Fix For: 4.11.0
>
> CREATE TABLE IF NOT EXISTS user_events (
>   user_id VARCHAR NOT NULL,
>   event_type VARCHAR NOT NULL,
>   event_time VARCHAR NOT NULL,
>   event_msg VARCHAR NOT NULL,
>   event_status VARCHAR NOT NULL,
>   event_opt VARCHAR NOT NULL,
>   CONSTRAINT my_pk PRIMARY KEY (user_id, event_type, event_time))
> SALT_BUCKETS=128;
> and my queries are:
> 1. select event_type, count(0) from user_events where user_id='' group by event_type
> 2. select count(0) from user_events where user_id='' and event_type='0101'
> 3. select * from user_events where user_id='' and event_type='0101' and event_time>'20180101' and event_time<'20180201' order by event_time limit 50,100
> Concurrency query ratio:
> 1: 80%
> 2: 10%
> 3: 10%
> user_events data: 50 billion
> It could allow a field/some fields of the primary key to be salted by hash, with grammar like "SALT_BUCKETS(user_id)=4" or "SALT_BUCKETS(user_id,event_type)=4"
> ref:
> https://www.safaribooksonline.com/library/view/greenplum-architecture/9781940540337/xhtml/chapter03.xhtml
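A minimal Python sketch of what the reporter is asking for, assuming a hypothetical SALT_BUCKETS(user_id)-style grammar and an illustrative hash (Phoenix computes its real salt byte differently): salting on a prefix of the composite key sends every row for a given user_id to one bucket, so queries filtered on user_id would each touch a single bucket rather than all 128:

```python
import hashlib

def salt_bucket(key_parts, num_buckets, prefix_len=None):
    # prefix_len is the proposed (hypothetical) extension: hash only the
    # first prefix_len rowkey columns. prefix_len=None mimics today's
    # behavior of hashing the full rowkey. The hash itself is illustrative.
    parts = key_parts if prefix_len is None else key_parts[:prefix_len]
    digest = hashlib.md5('|'.join(map(str, parts)).encode('utf-8')).digest()
    return digest[0] % num_buckets

# 1000 distinct events for one user.
events = [('user42', 'click', t) for t in range(1000)]

# Full-rowkey salting scatters the user's rows across many buckets...
full_buckets = {salt_bucket(e, 128) for e in events}
# ...while prefix salting on (user_id,) keeps them all in one bucket.
prefix_buckets = {salt_bucket(e, 128, prefix_len=1) for e in events}
assert len(prefix_buckets) == 1
```

Under full-rowkey salting, the three queries above must fan out to all 128 buckets; under prefix salting they can be served from one.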
Re: Salting based on partial rowkeys
Hi folks,

I would like to continue the discussion below from a few weeks ago, and to address the feedback from Thomas, Jaanai, and Sergey.

In thinking about this some more, it amounts to introducing a form of hash partitioning in Phoenix + HBase. For this to work in the general case, region splits and merges would need to be disabled and the regions pre-defined. This is already supported by DisableRegionSplitPolicy. I understand that HBase is range-partitioned. What we are proposing is to allow specialized use cases where users manage their own partitioning. All other users would be unaffected.

Jaanai also mentioned that salting is meant to address write hotspots. That is true. We are proposing an additional use of salting (or, if you prefer, another feature altogether) for specialized use cases. Again, all other users would be unaffected. We have a lot of data in HBase, and some of our queries would benefit from some form of hash partitioning. That is the crux of our proposal.

Some specific responses:

@Josh -- exactly.

@Thomas -- if we can partition the data exactly how we want, we can make sure that certain queries do not go across regions. We can scan only the matching regions, and we can perform the full aggregation within each matching region, without having to do a merge.

@Jaanai, @Sergey -- I hope I addressed hotspotting above. We would also be fine calling this something other than salting; maybe that is better, to avoid confusion.

@Lars -- you posted questions in PHOENIX-4757. I will respond there.

Thanks everyone for all the feedback on this. Our goal is to discuss all the concerns, and then finally get a yay or nay consensus from the committers.

Gerald

On Sun, Sep 16, 2018 at 9:52 PM la...@apache.org wrote:
> I added some comments on the PHOENIX-4757
>
> On Thursday, September 13, 2018, 6:42:12 PM PDT, Josh Elser <els...@apache.org> wrote:
>
> Ahh, I get you now.
> For a composite primary key made up of columns 1 through N, you want similar controls to compute the value of the salt based on a sequence of the columns 1 through M where M <= N (instead of always on all columns).
>
> For large numbers of salt buckets and a scan over a facet, you prune your search space considerably. Makes sense to me!
>
> On 9/13/18 6:37 PM, Gerald Sangudi wrote:
> > In case the text formatting is lost below, I also added it as a comment in the JIRA ticket:
> >
> > https://issues.apache.org/jira/browse/PHOENIX-4757
> >
> > On Thu, Sep 13, 2018 at 3:24 PM, Gerald Sangudi wrote:
> >> Sorry I missed Josh's reply; I've subscribed to the dev list now.
> >>
> >> Below is a copy-and-paste from our internal document. Thanks in advance for your review and additional feedback on this.
> >>
> >> Gerald
> >>
> >> Background
> >> We make extensive use of multi-column rowkeys and salting <https://phoenix.apache.org/salted.html> in our different Apache Phoenix deployments. We frequently perform group-by aggregations on these data along a specific dimension that would benefit from predictably partitioning the data along that dimension.
> >>
> >> Proposal
> >> We propose to add table metadata to allow schema designers to constrain salting to a subset of the rowkey, rather than the full rowkey as it is today. This will introduce a mechanism to partition data on a per-table basis along a single dimension, without application changes or much change to the Phoenix runtime logic. We expect this will result in substantially faster group-bys along the salted dimension and negligible penalties elsewhere. This feature has also been proposed in PHOENIX-4757 <https://issues.apache.org/jira/browse/PHOENIX-4757>, where it was pointed out that partitioning and sorting data along different dimensions is a common pattern in other datastores as well. Theoretically, it could cause hotspotting when querying along the salted dimension without the leading rowkey - that would be an anti-pattern.
> >>
> >> Usage Example
> >> Current:
> >> Schema:
> >> CREATE TABLE relationship (
> >>   id_1 BIGINT NOT NULL,
> >>   id_2 BIGINT NOT NULL,
> >>   other_key BIGINT NOT NULL,
> >>   val SMALLINT,
> >>   CONSTRAINT pk PRIMARY KEY (id_1, id_2, other_key))
> >> SALT_BUCKETS=60;
> >> Query:
> >> Select id_2, sum(val)
> >> From relationship
> >> Where id_1 in (2,3
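The pruning benefit claimed in the proposal above can be sketched with a toy in-memory store standing in for regions and an illustrative hash (not Phoenix's real salt computation): when the salt is computed from id_1 alone, a query filtered on id_1 reads exactly one bucket and still returns the same rows as a full scan:

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 60

def salt(id_1, num_buckets=NUM_BUCKETS):
    # Hypothetical partial-rowkey salt: derived from id_1 only,
    # per the proposal. The hash function is illustrative.
    return hashlib.sha256(str(id_1).encode('utf-8')).digest()[0] % num_buckets

# Rows shaped like the relationship table: (id_1, id_2, other_key, val).
rows = [(i % 5, i, i * 10, 1) for i in range(100)]

# "Write path": each row lands in the bucket chosen by its id_1 salt.
store = defaultdict(list)
for r in rows:
    store[salt(r[0])].append(r)

# "Query path" for WHERE id_1 = 2: consult only the one matching bucket.
target = [r for r in store[salt(2)] if r[0] == 2]
# A full scan with the same filter returns the identical result set.
full = [r for r in rows if r[0] == 2]
assert sorted(target) == sorted(full)
```

Because every row with the same id_1 shares one salt byte, the group-by along id_1 can also be completed inside that bucket with no cross-region merge, which is the crux of the proposal.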
[jira] [Created] (PHOENIX-4948) Add test case for merging nullified array elements
Gerald Sangudi created PHOENIX-4948:
---
Summary: Add test case for merging nullified array elements
Key: PHOENIX-4948
URL: https://issues.apache.org/jira/browse/PHOENIX-4948
Project: Phoenix
Issue Type: Test
Reporter: Gerald Sangudi
Assignee: Gerald Sangudi

See PHOENIX-4917. That ticket fixes a ClassCastException when projecting array elements in hash join. This ticket is to add a test case that exercises the merge path in that code, specifically HashJoinRegionScanner.mergeProjectedValue().
[jira] [Updated] (PHOENIX-4948) Add test case for merging nullified array elements
[ https://issues.apache.org/jira/browse/PHOENIX-4948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4948:

Description:

See PHOENIX-4917. That ticket fixes a ClassCastException when projecting array elements in hash join. This ticket is to add a test case that exercises the merge path in that code, specifically HashJoinRegionScanner.mergeProjectedValue().

Cc [~twdsi...@gmail.com]
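A test of the requested shape could look as follows. This is a hedged sketch only: merge_projected below is a toy stand-in and does not claim to reproduce the semantics of the real HashJoinRegionScanner.mergeProjectedValue(); the point is the pattern of feeding nullified (null) array elements through a merge and asserting they survive:

```python
def merge_projected(dest, src):
    # Toy merge: keep dest's value where present, fall back to src
    # otherwise. A nullified element survives only if both sides are
    # null, which is the edge case such a test should pin down.
    return [s if d is None else d for d, s in zip(dest, src)]

def test_merge_preserves_nullified_elements():
    # Element 2 was nullified on both sides: it must stay null.
    assert merge_projected([1, None, 3], [None, None, None]) == [1, None, 3]
    # A null on one side falls back to the other side's value.
    assert merge_projected([None, 2], [5, 9]) == [5, 2]

test_merge_preserves_nullified_elements()
```

The real test would drive the same scenario through a hash join over a table with an array column, so that the coprocessor's merge path actually executes.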
[jira] [Updated] (PHOENIX-4917) ClassCastException when projecting array elements in hash join
[ https://issues.apache.org/jira/browse/PHOENIX-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4917:

Description:

This bug was introduced in the fix for PHOENIX-4791. When projecting array elements in hash join, we now generate both ProjectedValueTuple and MultiKeyValueTuple. Before the fix for PHOENIX-4791, hash join was only generating ProjectedValueTuple, and there were two lines of code with class casts that reflected this assumption. The fix is to handle both ProjectedValueTuple and MultiKeyValueTuple, while continuing to propagate the array cell as in PHOENIX-4791.

The stack trace with the ClassCastException:

Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException): org.apache.hadoop.hbase.DoNotRetryIOException: GENO_DOSE,,1537598769044.1a6cb8853b036c59e7515d8e876e28c5.: org.apache.phoenix.schema.tuple.MultiKeyValueTuple cannot be cast to org.apache.phoenix.execute.TupleProjector$ProjectedValueTuple
  at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:96)
  at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
  at org.apache.phoenix.coprocessor.HashJoinRegionScanner.nextRaw(HashJoinRegionScanner.java:300)
  at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:82)
  at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:82)
  at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:294)
  at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2633)
  at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2837)
  at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
Caused by: java.lang.ClassCastException: org.apache.phoenix.schema.tuple.MultiKeyValueTuple cannot be cast to org.apache.phoenix.execute.TupleProjector$ProjectedValueTuple
  at org.apache.phoenix.coprocessor.HashJoinRegionScanner.processResults(HashJoinRegionScanner.java:220)
  at org.apache.phoenix.coprocessor.HashJoinRegionScanner.nextRaw(HashJoinRegionScanner.java:294)
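The fix described above (handle both tuple types instead of casting unconditionally) can be illustrated with a Python analogy. The class names mirror the Phoenix classes, but this is a sketch of the dispatch pattern, not the actual Java change:

```python
class ProjectedValueTuple:
    """Stand-in for org.apache.phoenix.execute.TupleProjector$ProjectedValueTuple."""
    pass

class MultiKeyValueTuple:
    """Stand-in for org.apache.phoenix.schema.tuple.MultiKeyValueTuple."""
    pass

def process_result(tup):
    # Before the fix (by analogy): the code cast every tuple to
    # ProjectedValueTuple, which raised on a MultiKeyValueTuple,
    # the equivalent of the ClassCastException above.
    # After the fix: dispatch on the actual runtime type.
    if isinstance(tup, ProjectedValueTuple):
        return "projected"
    if isinstance(tup, MultiKeyValueTuple):
        return "multi-key"
    raise TypeError("unexpected tuple type: %s" % type(tup).__name__)
```

Either tuple kind now flows through without a cast failure, while an unknown type still fails loudly.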
[jira] [Created] (PHOENIX-4917) ClassCastException when projecting array elements in hash join
Gerald Sangudi created PHOENIX-4917: --- Summary: ClassCastException when projecting array elements in hash join Key: PHOENIX-4917 URL: https://issues.apache.org/jira/browse/PHOENIX-4917 Project: Phoenix Issue Type: Bug Affects Versions: 4.15.0, 4.14.1 Reporter: Gerald Sangudi Assignee: Gerald Sangudi Fix For: 4.15.0, 4.14.1

This bug was introduced in the fix for https://issues.apache.org/jira/browse/PHOENIX-4791. When projecting array elements in a hash join, we now generate both ProjectedValueTuple and MultiKeyValueTuple. Previously, we generated only ProjectedValueTuple, and two lines of code bake this assumption into class casts. The fix is to merge into the MultiKeyValueTuple while propagating the array cell, as in PHOENIX-4791. The stack trace with the ClassCastException:

Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException): org.apache.hadoop.hbase.DoNotRetryIOException: GENO_DOSE,,1537598769044.1a6cb8853b036c59e7515d8e876e28c5.: org.apache.phoenix.schema.tuple.MultiKeyValueTuple cannot be cast to org.apache.phoenix.execute.TupleProjector$ProjectedValueTuple
at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:96)
at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
at org.apache.phoenix.coprocessor.HashJoinRegionScanner.nextRaw(HashJoinRegionScanner.java:300)
at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:82)
at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:82)
at org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:294)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2633)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2837)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
Caused by: java.lang.ClassCastException: org.apache.phoenix.schema.tuple.MultiKeyValueTuple cannot be cast to org.apache.phoenix.execute.TupleProjector$ProjectedValueTuple
at org.apache.phoenix.coprocessor.HashJoinRegionScanner.processResults(HashJoinRegionScanner.java:220)
at org.apache.phoenix.coprocessor.HashJoinRegionScanner.nextRaw(HashJoinRegionScanner.java:294)
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
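The fix described above can be sketched in Python. This is illustrative only: the real code is Java in HashJoinRegionScanner.processResults, and the class and function names here merely mirror the Phoenix classes; they are not the actual implementation.

```python
# Minimal model of the bug: the scanner assumed every tuple was a
# ProjectedValueTuple and "cast" unconditionally. After PHOENIX-4791 a
# MultiKeyValueTuple can also arrive, so the code must handle both kinds
# and propagate the array cell instead of casting.

class Tuple:
    def __init__(self, cells):
        self.cells = cells

class ProjectedValueTuple(Tuple):
    pass

class MultiKeyValueTuple(Tuple):
    pass

def process_result_buggy(tup):
    # Pre-fix behavior: blind cast -- fails when a MultiKeyValueTuple arrives.
    if not isinstance(tup, ProjectedValueTuple):
        raise TypeError(
            "MultiKeyValueTuple cannot be cast to ProjectedValueTuple")
    return tup.cells

def process_result_fixed(tup, array_cell=None):
    # Post-fix behavior: merge cells from either tuple kind, carrying the
    # projected array cell along (as PHOENIX-4791 does).
    cells = list(tup.cells)
    if array_cell is not None:
        cells.append(array_cell)
    return cells
```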
Re: Salting based on partial rowkeys
In case the text formatting is lost below, I also added it as a comment in the JIRA ticket: https://issues.apache.org/jira/browse/PHOENIX-4757 On Thu, Sep 13, 2018 at 3:24 PM, Gerald Sangudi wrote:
> Sorry I missed Josh's reply; I've subscribed to the dev list now.
>
> Below is a copy-and-paste from our internal document. Thanks in advance
> for your review and additional feedback on this.
>
> Gerald
>
> Background:
> We make extensive use of multi-column rowkeys and salting
> <https://phoenix.apache.org/salted.html> in our different Apache Phoenix
> deployments. We frequently perform group-by aggregations on these data
> along a specific dimension that would benefit from predictably
> partitioning the data along that dimension.
>
> Proposal:
> We propose to add table metadata to allow schema designers to constrain
> salting to a subset of the rowkey, rather than the full rowkey as it is
> today. This will introduce a mechanism to partition data on a per-table
> basis along a single dimension, without application changes or much
> change to the Phoenix runtime logic. We expect this will result in
> substantially faster group-bys along the salted dimension and negligible
> penalties elsewhere. This feature has also been proposed in PHOENIX-4757
> <https://issues.apache.org/jira/browse/PHOENIX-4757>, where it was
> pointed out that partitioning and sorting data along different dimensions
> is a common pattern in other datastores as well. Theoretically, it could
> cause hotspotting when querying along the salted dimension without the
> leading rowkey; that would be an anti-pattern.
>
> Usage Example
>
> Current schema:
> CREATE TABLE relationship (
>   id_1 BIGINT NOT NULL,
>   id_2 BIGINT NOT NULL,
>   other_key BIGINT NOT NULL,
>   val SMALLINT,
>   CONSTRAINT pk PRIMARY KEY (id_1, id_2, other_key)
> ) SALT_BUCKETS=60;
>
> Query:
> SELECT id_2, SUM(val) FROM relationship WHERE id_1 IN (2,3) GROUP BY id_2;
>
> Explain:
> CLIENT 60-CHUNK PARALLEL 60-WAY SKIP SCAN ON 120 KEYS OVER RELATIONSHIP [0,2] - [59,3]
>     SERVER AGGREGATE INTO DISTINCT ROWS BY [ID_2]
> CLIENT MERGE SORT
> 3 rows selected (0.048 seconds)
>
> In this case, although the group by is performed on both the client and
> the regionservers, almost all of the actual grouping happens on the
> client, because the id_2's are randomly distributed across the
> regionservers. As a result, a lot of unnecessary data is serialized to
> the client and grouped serially there. This can become quite material
> with large result sets.
>
> Proposed schema:
> CREATE TABLE relationship (
>   id_1 BIGINT NOT NULL,
>   id_2 BIGINT NOT NULL,
>   other_key BIGINT NOT NULL,
>   val SMALLINT,
>   CONSTRAINT pk PRIMARY KEY (id_1, id_2, other_key)
> ) SALT_BUCKETS=60, SALT_COLUMN=id_2;
>
> Query (unchanged). Explain (unchanged). Because the data are partitioned
> so that all rows containing the same id_2 are on the same regionserver,
> the above query will perform almost all of the grouping in parallel on
> the regionservers. No special hint or changes to the query plan would be
> required to benefit. Tables would need to be re-salted to take advantage
> of the new functionality.
>
> Technical changes proposed to Phoenix:
> - Create a new piece of table-level metadata: SALT_COLUMN. SALT_COLUMN
> will instruct the salting logic to generate a salt byte based only on the
> specified column. If unspecified, it will behave as it does today and
> default to salting the entire rowkey. This metadata may be specified only
> when the table is created and may not be modified. The specified column
> must be part of the rowkey.
> - Modify all callers of getSaltingByte
> <https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/schema/SaltingUtil.java#L77>(byte[]
> value, int offset, int length, int bucketNum) to consistently leverage
> the new metadata.
> - Tests
> - Docs
>
> Design points:
> One salt column vs multiple salt columns: Based on the existing signature
> for getSaltingByte, it seems simpler to only support a single SALT_COLUMN
> rather than multiple arbi
Re: Salting based on partial rowkeys
unnecessary in cases where the SALT_COLUMN is part of the group-by expression.
- A query that contains a literal equality predicate for the SALT_COLUMN can be isolated to a single regionserver, rather than broadcast to all regionservers.
- A client-side merge-sort join based on the SALT_COLUMN could optimize the organization of merges.
- Similarly, a server-side hash join could distribute only the necessary portions of the hash table to each regionserver.
If additional advantages of these types come for free, that's great, but they can be follow-on enhancements from the initial commit. On Thu, Sep 13, 2018 at 9:33 AM, Thomas D'Silva wrote: > Gerald, > > I think you missed Josh's reply here : https://lists.apache.org/ > thread.html/c5145461805429622a410c23c1199d578e146a5c94511b2d5833438b@% > 3Cdev.phoenix.apache.org%3E > > Could you explain how using a subset of the pk columns to generate the > salt byte helps with partitioning, aggregations etc? > > Thanks, > Thomas > > On Thu, Sep 13, 2018 at 8:32 AM, Gerald Sangudi > wrote: > >> Hi folks, >> >> Any thoughts or feedback on this? >> >> Thanks, >> Gerald >> >> On Mon, Sep 10, 2018 at 1:56 PM, Gerald Sangudi >> wrote: > >> >>> Hello folks, >>> >>> We have a requirement for salting based on partial, rather than full, >>> rowkeys. My colleague Mike Polcari has identified the requirement and >>> proposed an approach. >>> >>> I found an already-open JIRA ticket for the same issue: >>> https://issues.apache.org/jira/browse/PHOENIX-4757. I can provide more >>> details from the proposal. >>> >>> The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N, whereas Mike >>> proposes SALT_COLUMN=col or SALT_COLUMNS=col, ... . >>> >>> The benefit at issue is that users gain more control over partitioning, >>> and this can be used to push some additional aggregations and hash joins >>> down to region servers. >>> >>> I would appreciate any go-ahead / thoughts / guidance / objections / >>> feedback.
I'd like to be sure that the concept at least is not >>> objectionable. We would like to work on this and submit a patch down the >>> road. I'll also add a note to the JIRA ticket. >>> >>> Thanks, >>> Gerald >>> >>> >> >
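The SALT_COLUMN proposal discussed in this thread can be sketched as follows. This is a hedged sketch, not Phoenix code: it assumes salting of the general form hash(bytes) % SALT_BUCKETS (as in SaltingUtil.getSaltingByte), uses a stand-in hash, and the names full_rowkey_salt and salt_column_salt are illustrative only.

```python
# Why SALT_COLUMN helps: salting the full rowkey scatters rows with the
# same id_2 across buckets, while salting on id_2 alone sends every row
# with a given id_2 to the same bucket (and thus the same regionserver),
# so a GROUP BY id_2 can aggregate fully in parallel on the servers.
import struct

SALT_BUCKETS = 60

def salt_byte(key_bytes: bytes, buckets: int) -> int:
    # Stand-in hash; Phoenix's real salting uses SaltingUtil's hash.
    h = 0
    for b in key_bytes:
        h = (h * 31 + b) & 0x7FFFFFFF
    return h % buckets

def full_rowkey_salt(id_1: int, id_2: int, other_key: int) -> int:
    # Today's behavior: salt over the entire rowkey.
    return salt_byte(struct.pack(">qqq", id_1, id_2, other_key), SALT_BUCKETS)

def salt_column_salt(id_2: int) -> int:
    # Proposed SALT_COLUMN=id_2: salt over id_2's bytes only, so the
    # bucket depends on nothing but id_2.
    return salt_byte(struct.pack(">q", id_2), SALT_BUCKETS)
```

The design consequence noted in the thread follows directly: with salt_column_salt, an equality predicate on id_2 identifies a single bucket, while full_rowkey_salt requires probing all buckets.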
Re: Salting based on partial rowkeys
Hi folks, Any thoughts or feedback on this? Thanks, Gerald On Mon, Sep 10, 2018 at 1:56 PM, Gerald Sangudi wrote: > Hello folks, > > We have a requirement for salting based on partial, rather than full, > rowkeys. My colleague Mike Polcari has identified the requirement and > proposed an approach. > > I found an already-open JIRA ticket for the same issue: > https://issues.apache.org/jira/browse/PHOENIX-4757. I can provide more > details from the proposal. > > The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N, whereas Mike > proposes SALT_COLUMN=col or SALT_COLUMNS=col, ... . > > The benefit at issue is that users gain more control over partitioning, > and this can be used to push some additional aggregations and hash joins > down to region servers. > > I would appreciate any go-ahead / thoughts / guidance / objections / > feedback. I'd like to be sure that the concept at least is not > objectionable. We would like to work on this and submit a patch down the > road. I'll also add a note to the JIRA ticket. > > Thanks, > Gerald > >
Salting based on partial rowkeys
Hello folks, We have a requirement for salting based on partial, rather than full, rowkeys. My colleague Mike Polcari has identified the requirement and proposed an approach. I found an already-open JIRA ticket for the same issue: https://issues.apache.org/jira/browse/PHOENIX-4757. I can provide more details from the proposal. The JIRA proposes a syntax of SALT_BUCKETS(col, ...) = N, whereas Mike proposes SALT_COLUMN=col or SALT_COLUMNS=col, ... . The benefit at issue is that users gain more control over partitioning, and this can be used to push some additional aggregations and hash joins down to region servers. I would appreciate any go-ahead / thoughts / guidance / objections / feedback. I'd like to be sure that the concept at least is not objectionable. We would like to work on this and submit a patch down the road. I'll also add a note to the JIRA ticket. Thanks, Gerald
[jira] [Updated] (PHOENIX-4791) Array elements are nullified with joins
[ https://issues.apache.org/jira/browse/PHOENIX-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4791: Attachment: (was: PHOENIX-4791.patch) > Array elements are nullified with joins > --- > > Key: PHOENIX-4791 > URL: https://issues.apache.org/jira/browse/PHOENIX-4791 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.11.0, 4.12.0, 4.13.0, 4.14.0, 5.0.0 >Reporter: Tulasi P > Assignee: Gerald Sangudi >Priority: Major > Fix For: 4.15.0, 5.1.0 > > > Returning elements of an array from a table that is part of a join causes > array elements to be nullified. > {noformat} > create table array_test_1 (id integer not null primary key, arr tinyint[5]); > upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]); > upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]); > upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]); > upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]); > upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]); > {noformat} > > {noformat} > create table test_table_1 (id integer not null primary key, val varchar); > upsert into test_table_1 values (1001, 'abc'); > upsert into test_table_1 values (1002, 'def'); > upsert into test_table_1 values (1003, 'ghi');{noformat} > {noformat} > 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2], > t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = t2.id; > ++-+-++---+ > | T1.ID | T2.VAL | ARRAY_ELEM(T1.ARR, 1) | ARRAY_ELEM(T1.ARR, 2) | > ARRAY_ELEM(T1.ARR, 3) | > ++-+-+-++ > | 1001 | abc | null | null | null | > | 1002 | def | null | null | null | > | 1003 | ghi | null | null | null | > ++-+++-+ > 3 rows selected (0.056 seconds) > {noformat} > However, directly selecting array elements from the array returns data > correctly. 
> {noformat}
> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3]
> from array_test_1 as t1;
> +-------+--------------------+--------------------+--------------------+
> |  ID   | ARRAY_ELEM(ARR, 1) | ARRAY_ELEM(ARR, 2) | ARRAY_ELEM(ARR, 3) |
> +-------+--------------------+--------------------+--------------------+
> | 1001  | 0                  | 0                  | 0                  |
> | 1002  | 0                  | 0                  | 0                  |
> | 1003  | 0                  | 0                  | 0                  |
> | 1004  | 0                  | 0                  | 1                  |
> | 1005  | 1                  | 1                  | 1                  |
> +-------+--------------------+--------------------+--------------------+
> 5 rows selected (0.044 seconds)
> {noformat}
[jira] [Updated] (PHOENIX-4791) Array elements are nullified with joins
[ https://issues.apache.org/jira/browse/PHOENIX-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4791: Attachment: (was: PHOENIX-4791.patch) > Array elements are nullified with joins > --- > > Key: PHOENIX-4791 > URL: https://issues.apache.org/jira/browse/PHOENIX-4791 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.11.0, 4.12.0, 4.13.0, 4.14.0, 5.0.0 >Reporter: Tulasi P > Assignee: Gerald Sangudi >Priority: Major > Fix For: 4.15.0, 5.1.0 > > Attachments: PHOENIX-4791.patch > > > Returning elements of an array from a table that is part of a join causes > array elements to be nullified. > {noformat} > create table array_test_1 (id integer not null primary key, arr tinyint[5]); > upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]); > upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]); > upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]); > upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]); > upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]); > {noformat} > > {noformat} > create table test_table_1 (id integer not null primary key, val varchar); > upsert into test_table_1 values (1001, 'abc'); > upsert into test_table_1 values (1002, 'def'); > upsert into test_table_1 values (1003, 'ghi');{noformat} > {noformat} > 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2], > t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = t2.id; > ++-+-++---+ > | T1.ID | T2.VAL | ARRAY_ELEM(T1.ARR, 1) | ARRAY_ELEM(T1.ARR, 2) | > ARRAY_ELEM(T1.ARR, 3) | > ++-+-+-++ > | 1001 | abc | null | null | null | > | 1002 | def | null | null | null | > | 1003 | ghi | null | null | null | > ++-+++-+ > 3 rows selected (0.056 seconds) > {noformat} > However, directly selecting array elements from the array returns data > correctly. 
> {noformat}
> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3]
> from array_test_1 as t1;
> +-------+--------------------+--------------------+--------------------+
> |  ID   | ARRAY_ELEM(ARR, 1) | ARRAY_ELEM(ARR, 2) | ARRAY_ELEM(ARR, 3) |
> +-------+--------------------+--------------------+--------------------+
> | 1001  | 0                  | 0                  | 0                  |
> | 1002  | 0                  | 0                  | 0                  |
> | 1003  | 0                  | 0                  | 0                  |
> | 1004  | 0                  | 0                  | 1                  |
> | 1005  | 1                  | 1                  | 1                  |
> +-------+--------------------+--------------------+--------------------+
> 5 rows selected (0.044 seconds)
> {noformat}
[jira] [Updated] (PHOENIX-4791) Array elements are nullified with joins
[ https://issues.apache.org/jira/browse/PHOENIX-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4791: Attachment: (was: PHOENIX-4791.patch) > Array elements are nullified with joins > --- > > Key: PHOENIX-4791 > URL: https://issues.apache.org/jira/browse/PHOENIX-4791 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.11.0, 4.12.0, 4.13.0, 4.14.0, 5.0.0 >Reporter: Tulasi P >Priority: Major > Fix For: 4.15.0, 5.1.0 > > Attachments: PHOENIX-4791.patch > > > Returning elements of an array from a table that is part of a join causes > array elements to be nullified. > {noformat} > create table array_test_1 (id integer not null primary key, arr tinyint[5]); > upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]); > upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]); > upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]); > upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]); > upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]); > {noformat} > > {noformat} > create table test_table_1 (id integer not null primary key, val varchar); > upsert into test_table_1 values (1001, 'abc'); > upsert into test_table_1 values (1002, 'def'); > upsert into test_table_1 values (1003, 'ghi');{noformat} > {noformat} > 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2], > t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = t2.id; > ++-+-++---+ > | T1.ID | T2.VAL | ARRAY_ELEM(T1.ARR, 1) | ARRAY_ELEM(T1.ARR, 2) | > ARRAY_ELEM(T1.ARR, 3) | > ++-+-+-++ > | 1001 | abc | null | null | null | > | 1002 | def | null | null | null | > | 1003 | ghi | null | null | null | > ++-+++-+ > 3 rows selected (0.056 seconds) > {noformat} > However, directly selecting array elements from the array returns data > correctly. 
> {noformat}
> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3]
> from array_test_1 as t1;
> +-------+--------------------+--------------------+--------------------+
> |  ID   | ARRAY_ELEM(ARR, 1) | ARRAY_ELEM(ARR, 2) | ARRAY_ELEM(ARR, 3) |
> +-------+--------------------+--------------------+--------------------+
> | 1001  | 0                  | 0                  | 0                  |
> | 1002  | 0                  | 0                  | 0                  |
> | 1003  | 0                  | 0                  | 0                  |
> | 1004  | 0                  | 0                  | 1                  |
> | 1005  | 1                  | 1                  | 1                  |
> +-------+--------------------+--------------------+--------------------+
> 5 rows selected (0.044 seconds)
> {noformat}
Re: Null array elements with joins
Hi folks, I submitted PR https://github.com/apache/phoenix/pull/335 Appreciate your review. Thanks, Gerald On Mon, Aug 13, 2018 at 1:10 PM, James Taylor wrote: > I commented on the JIRA you filed here: PHOENIX-4791. Best to keep > discussion there. > Thanks, > James > > On Mon, Aug 13, 2018 at 11:08 AM, Gerald Sangudi > wrote: > >> Hello all, >> >> Any suggestions or pointers on the issue below? >> >> Projecting array elements works when not using joins, and does not work >> when we use hash joins. Is there an issue with the ProjectionCompiler for >> joins? I have not been able to isolate the specific cause, and would >> appreciate any pointers or suggestions. >> >> Thanks, >> Gerald >> >> On Tue, Jun 19, 2018 at 10:02 AM, Tulasi Paradarami < >> tulasi.krishn...@gmail.com> wrote: >> >>> Hi, >>> >>> I'm running few tests against Phoenix array and running into this bug >>> where array elements return null values when a join is involved. Is this a >>> known issue/limitation of arrays? >>> >>> create table array_test_1 (id integer not null primary key, arr >>> tinyint[5]); >>> upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]); >>> upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]); >>> upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]); >>> upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]); >>> upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]); >>> >>> create table test_table_1 (id integer not null primary key, val varchar); >>> upsert into test_table_1 values (1001, 'abc'); >>> upsert into test_table_1 values (1002, 'def'); >>> upsert into test_table_1 values (1003, 'ghi'); >>> >>> 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2], >>> t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = >>> t2.id; >>> ++-++--- >>> -++ >>> | T1.ID | T2.VAL | ARRAY_ELEM(T1.ARR, 1) | ARRAY_ELEM(T1.ARR, 2) | >>> ARRAY_ELEM(T1.ARR, 3) | >>> ++-++--- >>> -++ >>> | 1001 | abc | null | null | >>> null | 
>>> | 1002  | def     | null                  | null                  | null                  |
>>> | 1003  | ghi     | null                  | null                  | null                  |
>>> +--------+---------+-----------------------+-----------------------+-----------------------+
>>> 3 rows selected (0.056 seconds)
>>>
>>> However, directly selecting array elements from the array returns data
>>> correctly.
>>> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2],
>>> t1.arr[3] from array_test_1 as t1;
>>> +-------+--------------------+--------------------+--------------------+
>>> |  ID   | ARRAY_ELEM(ARR, 1) | ARRAY_ELEM(ARR, 2) | ARRAY_ELEM(ARR, 3) |
>>> +-------+--------------------+--------------------+--------------------+
>>> | 1001  | 0                  | 0                  | 0                  |
>>> | 1002  | 0                  | 0                  | 0                  |
>>> | 1003  | 0                  | 0                  | 0                  |
>>> | 1004  | 0                  | 0                  | 1                  |
>>> | 1005  | 1                  | 1                  | 1                  |
>>> +-------+--------------------+--------------------+--------------------+
>>> 5 rows selected (0.044 seconds)
>>>
>>
>
Re: Null array elements with joins
Hello all, Any suggestions or pointers on the issue below? Projecting array elements works when not using joins, and does not work when we use hash joins. Is there an issue with the ProjectionCompiler for joins? I have not been able to isolate the specific cause, and would appreciate any pointers or suggestions. Thanks, Gerald On Tue, Jun 19, 2018 at 10:02 AM, Tulasi Paradarami < tulasi.krishn...@gmail.com> wrote: > Hi, > > I'm running few tests against Phoenix array and running into this bug > where array elements return null values when a join is involved. Is this a > known issue/limitation of arrays? > > create table array_test_1 (id integer not null primary key, arr > tinyint[5]); > upsert into array_test_1 values (1001, array[0, 0, 0, 0, 0]); > upsert into array_test_1 values (1002, array[0, 0, 0, 0, 1]); > upsert into array_test_1 values (1003, array[0, 0, 0, 1, 1]); > upsert into array_test_1 values (1004, array[0, 0, 1, 1, 1]); > upsert into array_test_1 values (1005, array[1, 1, 1, 1, 1]); > > create table test_table_1 (id integer not null primary key, val varchar); > upsert into test_table_1 values (1001, 'abc'); > upsert into test_table_1 values (1002, 'def'); > upsert into test_table_1 values (1003, 'ghi'); > > 0: jdbc:phoenix:localhost> select t1.id, t2.val, t1.arr[1], t1.arr[2], > t1.arr[3] from array_test_1 as t1 join test_table_1 as t2 on t1.id = t2.id > ; > ++-++--- > -++ > | T1.ID | T2.VAL | ARRAY_ELEM(T1.ARR, 1) | ARRAY_ELEM(T1.ARR, 2) | > ARRAY_ELEM(T1.ARR, 3) | > ++-++--- > -++ > | 1001 | abc | null | null | > null | > | 1002 | def | null | null | > null | > | 1003 | ghi | null | null | > null | > ++-++--- > -++ > 3 rows selected (0.056 seconds) > > However, directly selecting array elements from the array returns data > correctly. 
> 0: jdbc:phoenix:localhost> select t1.id, t1.arr[1], t1.arr[2], t1.arr[3]
> from array_test_1 as t1;
> +-------+--------------------+--------------------+--------------------+
> |  ID   | ARRAY_ELEM(ARR, 1) | ARRAY_ELEM(ARR, 2) | ARRAY_ELEM(ARR, 3) |
> +-------+--------------------+--------------------+--------------------+
> | 1001  | 0                  | 0                  | 0                  |
> | 1002  | 0                  | 0                  | 0                  |
> | 1003  | 0                  | 0                  | 0                  |
> | 1004  | 0                  | 0                  | 1                  |
> | 1005  | 1                  | 1                  | 1                  |
> +-------+--------------------+--------------------+--------------------+
> 5 rows selected (0.044 seconds)
>
[jira] [Updated] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4751: Attachment: 0001-PHOENIX-4751-Implement-client-side-hash-aggre.master.patch > Support client-side hash aggregation with SORT_MERGE_JOIN > - > > Key: PHOENIX-4751 > URL: https://issues.apache.org/jira/browse/PHOENIX-4751 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.14.0, 4.13.1 > Reporter: Gerald Sangudi > Assignee: Gerald Sangudi >Priority: Major > Attachments: > 0001-PHOENIX-4751-Add-HASH_AGGREGATE-hint.4.x-HBase-1.4.patch, > 0001-PHOENIX-4751-Implement-client-side-has.4.x-HBase-1.4.patch, > 0001-PHOENIX-4751-Implement-client-side-hash-aggre.master.patch, > 0002-PHOENIX-4751-Begin-implementation-of-c.4.x-HBase-1.4.patch, > 0003-PHOENIX-4751-Generated-aggregated-resu.4.x-HBase-1.4.patch, > 0004-PHOENIX-4751-Sort-results-of-client-ha.4.x-HBase-1.4.patch, > 0005-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch, > 0006-PHOENIX-4751-Fix-and-run-integration-t.4.x-HBase-1.4.patch, > 0007-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch, > 0008-PHOENIX-4751-Verify-EXPLAIN-plan-for-b.4.x-HBase-1.4.patch, > 0009-PHOENIX-4751-Standardize-null-checks-a.4.x-HBase-1.4.patch, > 0010-PHOENIX-4751-Abort-when-client-aggrega.4.x-HBase-1.4.patch, > 0011-PHOENIX-4751-Use-Phoenix-memory-mgmt-t.4.x-HBase-1.4.patch, > 0012-PHOENIX-4751-Remove-extra-memory-limit.4.x-HBase-1.4.patch, > 0013-PHOENIX-4751-Sort-only-when-necessary.4.x-HBase-1.4.patch, > 0014-PHOENIX-4751-Sort-only-when-necessary-.4.x-HBase-1.4.patch, > 0015-PHOENIX-4751-Show-client-hash-aggregat.4.x-HBase-1.4.patch, > 0016-PHOENIX-4751-Handle-reverse-sort-add-c.4.x-HBase-1.4.patch > > > A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash > aggregation in some cases, for improved performance. > When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash > aggregation. 
> It instead performs a CLIENT SORT followed by a CLIENT AGGREGATE. The
> performance can be improved if (a) the GROUP BY output does not need to
> be sorted, and (b) the GROUP BY input is large enough and has low
> cardinality.
> The hash aggregation can initially be a hint. Here is an example from
> Phoenix 4.13.1 that would benefit from hash aggregation if the GROUP BY
> input is large with low cardinality.
> CREATE TABLE unsalted (
>   keyA BIGINT NOT NULL,
>   keyB BIGINT NOT NULL,
>   val SMALLINT,
>   CONSTRAINT pk PRIMARY KEY (keyA, keyB)
> );
> EXPLAIN
> SELECT /*+ USE_SORT_MERGE_JOIN */
>   t1.val v1, t2.val v2, COUNT(*) c
> FROM unsalted t1 JOIN unsalted t2
>   ON (t1.keyA = t2.keyA)
> GROUP BY t1.val, t2.val;
>
> PLAN (EST_BYTES_READ and EST_ROWS_READ are null):
> SORT-MERGE-JOIN (INNER) TABLES
>     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED
> AND
>     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED
> CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]
> CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
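The trade-off described above can be sketched in Python. This is illustrative only: Phoenix's client aggregation operates on tuples through its own memory management, not a plain dict, and the function names here are hypothetical.

```python
# Two client-side GROUP BY strategies over rows streaming from a
# sort-merge join. Hash aggregation groups in one pass with a dict and
# needs no pre-sort; the current CLIENT SORT + CLIENT AGGREGATE path
# must first sort all rows by the group-by key.
from collections import defaultdict
from itertools import groupby

rows = [(1, 2), (3, 4), (1, 2), (3, 1), (1, 5)]  # (t1.val, t2.val) pairs

def hash_aggregate(rows):
    # One pass, no sort; wins when the input is large with low cardinality.
    counts = defaultdict(int)
    for key in rows:
        counts[key] += 1
    return dict(counts)

def sort_aggregate(rows):
    # Current behavior: sort first, then aggregate consecutive runs.
    ordered = sorted(rows)
    return {key: sum(1 for _ in run) for key, run in groupby(ordered)}
```

Both produce identical GROUP BY t1.val, t2.val counts; the difference is that sort_aggregate pays an O(n log n) sort even when the output order is never needed.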
[jira] [Updated] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4751: Attachment: 0001-PHOENIX-4751-Implement-client-side-has.4.x-HBase-1.4.patch > Support client-side hash aggregation with SORT_MERGE_JOIN > - > > Key: PHOENIX-4751 > URL: https://issues.apache.org/jira/browse/PHOENIX-4751 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.14.0, 4.13.1 > Reporter: Gerald Sangudi > Assignee: Gerald Sangudi >Priority: Major > Attachments: > 0001-PHOENIX-4751-Add-HASH_AGGREGATE-hint.4.x-HBase-1.4.patch, > 0001-PHOENIX-4751-Implement-client-side-has.4.x-HBase-1.4.patch, > 0002-PHOENIX-4751-Begin-implementation-of-c.4.x-HBase-1.4.patch, > 0003-PHOENIX-4751-Generated-aggregated-resu.4.x-HBase-1.4.patch, > 0004-PHOENIX-4751-Sort-results-of-client-ha.4.x-HBase-1.4.patch, > 0005-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch, > 0006-PHOENIX-4751-Fix-and-run-integration-t.4.x-HBase-1.4.patch, > 0007-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch, > 0008-PHOENIX-4751-Verify-EXPLAIN-plan-for-b.4.x-HBase-1.4.patch, > 0009-PHOENIX-4751-Standardize-null-checks-a.4.x-HBase-1.4.patch, > 0010-PHOENIX-4751-Abort-when-client-aggrega.4.x-HBase-1.4.patch, > 0011-PHOENIX-4751-Use-Phoenix-memory-mgmt-t.4.x-HBase-1.4.patch, > 0012-PHOENIX-4751-Remove-extra-memory-limit.4.x-HBase-1.4.patch, > 0013-PHOENIX-4751-Sort-only-when-necessary.4.x-HBase-1.4.patch, > 0014-PHOENIX-4751-Sort-only-when-necessary-.4.x-HBase-1.4.patch, > 0015-PHOENIX-4751-Show-client-hash-aggregat.4.x-HBase-1.4.patch, > 0016-PHOENIX-4751-Handle-reverse-sort-add-c.4.x-HBase-1.4.patch > > > A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash > aggregation in some cases, for improved performance. > When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash > aggregation. It instead performs a CLIENT SORT followed by a CLIENT > AGGREGATE. 
> The performance can be improved if (a) the GROUP BY output does not need
> to be sorted, and (b) the GROUP BY input is large enough and has low
> cardinality.
> The hash aggregation can initially be a hint. Here is an example from
> Phoenix 4.13.1 that would benefit from hash aggregation if the GROUP BY
> input is large with low cardinality.
> CREATE TABLE unsalted (
>   keyA BIGINT NOT NULL,
>   keyB BIGINT NOT NULL,
>   val SMALLINT,
>   CONSTRAINT pk PRIMARY KEY (keyA, keyB)
> );
> EXPLAIN
> SELECT /*+ USE_SORT_MERGE_JOIN */
>   t1.val v1, t2.val v2, COUNT(*) c
> FROM unsalted t1 JOIN unsalted t2
>   ON (t1.keyA = t2.keyA)
> GROUP BY t1.val, t2.val;
>
> PLAN (EST_BYTES_READ and EST_ROWS_READ are null):
> SORT-MERGE-JOIN (INNER) TABLES
>     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED
> AND
>     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED
> CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]
> CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]
[jira] [Updated] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4751: Attachment: 0016-PHOENIX-4751-Handle-reverse-sort-add-c.4.x-HBase-1.4.patch

> Support client-side hash aggregation with SORT_MERGE_JOIN
> ---------------------------------------------------------
>
> Key: PHOENIX-4751
> URL: https://issues.apache.org/jira/browse/PHOENIX-4751
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.14.0, 4.13.1
> Reporter: Gerald Sangudi
> Assignee: Gerald Sangudi
> Priority: Major
> Attachments: 0001-PHOENIX-4751-Add-HASH_AGGREGATE-hint.4.x-HBase-1.4.patch,
> 0002-PHOENIX-4751-Begin-implementation-of-c.4.x-HBase-1.4.patch,
> 0003-PHOENIX-4751-Generated-aggregated-resu.4.x-HBase-1.4.patch,
> 0004-PHOENIX-4751-Sort-results-of-client-ha.4.x-HBase-1.4.patch,
> 0005-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch,
> 0006-PHOENIX-4751-Fix-and-run-integration-t.4.x-HBase-1.4.patch,
> 0007-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch,
> 0008-PHOENIX-4751-Verify-EXPLAIN-plan-for-b.4.x-HBase-1.4.patch,
> 0009-PHOENIX-4751-Standardize-null-checks-a.4.x-HBase-1.4.patch,
> 0010-PHOENIX-4751-Abort-when-client-aggrega.4.x-HBase-1.4.patch,
> 0011-PHOENIX-4751-Use-Phoenix-memory-mgmt-t.4.x-HBase-1.4.patch,
> 0012-PHOENIX-4751-Remove-extra-memory-limit.4.x-HBase-1.4.patch,
> 0013-PHOENIX-4751-Sort-only-when-necessary.4.x-HBase-1.4.patch,
> 0014-PHOENIX-4751-Sort-only-when-necessary-.4.x-HBase-1.4.patch,
> 0015-PHOENIX-4751-Show-client-hash-aggregat.4.x-HBase-1.4.patch,
> 0016-PHOENIX-4751-Handle-reverse-sort-add-c.4.x-HBase-1.4.patch
>
> A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash aggregation in some cases, for improved performance.
> When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash aggregation. It instead performs a CLIENT SORT followed by a CLIENT AGGREGATE. The performance can be improved if (a) the GROUP BY output does not need to be sorted, and (b) the GROUP BY input is large enough and has low cardinality.
> The hash aggregation can initially be a hint.
[jira] [Updated] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4751: Attachment:
0015-PHOENIX-4751-Show-client-hash-aggregat.4.x-HBase-1.4.patch
0014-PHOENIX-4751-Sort-only-when-necessary-.4.x-HBase-1.4.patch
0013-PHOENIX-4751-Sort-only-when-necessary.4.x-HBase-1.4.patch
0012-PHOENIX-4751-Remove-extra-memory-limit.4.x-HBase-1.4.patch
0011-PHOENIX-4751-Use-Phoenix-memory-mgmt-t.4.x-HBase-1.4.patch
0010-PHOENIX-4751-Abort-when-client-aggrega.4.x-HBase-1.4.patch
0009-PHOENIX-4751-Standardize-null-checks-a.4.x-HBase-1.4.patch
0008-PHOENIX-4751-Verify-EXPLAIN-plan-for-b.4.x-HBase-1.4.patch
0007-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch
0006-PHOENIX-4751-Fix-and-run-integration-t.4.x-HBase-1.4.patch
0005-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch
0004-PHOENIX-4751-Sort-results-of-client-ha.4.x-HBase-1.4.patch
0003-PHOENIX-4751-Generated-aggregated-resu.4.x-HBase-1.4.patch
0002-PHOENIX-4751-Begin-implementation-of-c.4.x-HBase-1.4.patch
0001-PHOENIX-4751-Add-HASH_AGGREGATE-hint.4.x-HBase-1.4.patch
[jira] [Updated] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4751: Attachment: (was: PHOENIX-4751-4.x-HBase-1.4.patch)
[jira] [Updated] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4751: Attachment: (was: PHOENIX-4751.patch)
[jira] [Updated] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4751: Attachment: PHOENIX-4751-4.x-HBase-1.4.patch
[jira] [Updated] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4751: Attachment: PHOENIX-4751.patch
[jira] [Commented] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537785#comment-16537785 ] Gerald Sangudi commented on PHOENIX-4751: - [~sergey.soldatov] I removed those unnecessary commits using (git rebase -i + git push --force). Let me know if I need to do anything else. Thanks.
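The cleanup workflow mentioned in this comment can be sketched as follows. The repository, file names, and commit messages are made up for illustration, and `git rebase --onto` is used as a non-interactive stand-in for deleting a line from the `git rebase -i` todo list.

```shell
set -e

# Set up a throwaway repo with three commits, the middle one unwanted.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email "dev@example.com"
git config user.name "dev"

echo base  > base.txt;  git add base.txt;  git commit -qm "base"
echo noise > noise.txt; git add noise.txt; git commit -qm "unwanted upstream commit"
echo patch > patch.txt; git add patch.txt; git commit -qm "PHOENIX-4751 patch work"

# Replay the commits after HEAD~1 onto HEAD~2, skipping HEAD~1 entirely --
# the same effect as dropping that commit in `git rebase -i`.
git rebase -q --onto HEAD~2 HEAD~1

# Only "base" and "PHOENIX-4751 patch work" remain in history.
git rev-list --count HEAD

# History was rewritten, so the remote branch must be force-updated:
# git push --force origin <branch>   (no remote is configured in this sketch)
```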
[jira] [Commented] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16537722#comment-16537722 ] Gerald Sangudi commented on PHOENIX-4751: - Hi [~sergey.soldatov], how do I do that? Is there a link? I'll also check the docs and Google. Those commits came in because I wanted to keep the patch in sync with upstream.
[jira] [Comment Edited] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530425#comment-16530425 ] Gerald Sangudi edited comment on PHOENIX-4751 at 7/2/18 8:51 PM: - Ready for review. Pull request https://github.com/apache/phoenix/pull/308 The hashing is done in memory, instead of using SpillableGroupByCache. This is consistent with the current client-side in-memory sort, and avoids any dependencies on SpillableGroupByCache. was (Author: sangudi): Ready for review. Pull request https://github.com/apache/phoenix/pull/308
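The in-memory approach described in this comment can be sketched roughly as follows. This is illustrative Python, not Phoenix's Java; `HashAggregationError` and the distinct-key budget are made-up stand-ins for Phoenix's memory-manager accounting, and the abort-instead-of-spill behavior reflects the "Abort-when-client-aggrega" patch above.

```python
from collections import defaultdict

class HashAggregationError(Exception):
    """Raised when the aggregation exceeds its memory budget."""

def hash_aggregate_bounded(keys, max_distinct_keys):
    """Aggregate COUNT(*) per key entirely in client memory.

    Unlike a spillable cache, this never writes to disk: if the number of
    distinct groups exceeds the budget, the query aborts, so the approach
    pays off only when the GROUP BY input has low cardinality.
    """
    counts = defaultdict(int)
    for key in keys:
        counts[key] += 1
        if len(counts) > max_distinct_keys:
            raise HashAggregationError("group-by cardinality exceeds budget")
    return dict(counts)
```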
[jira] [Commented] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16530425#comment-16530425 ] Gerald Sangudi commented on PHOENIX-4751: - Ready for review. Pull request https://github.com/apache/phoenix/pull/308
[jira] [Updated] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4751: Description: A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash aggregation in some cases, for improved performance. When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash aggregation. It instead performs a CLIENT SORT followed by a CLIENT AGGREGATE. The performance can be improved if (a) the GROUP BY output does not need to be sorted, and (b) the GROUP BY input is large enough and has low cardinality. The hash aggregation can initially be a hint.
[jira] [Commented] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488387#comment-16488387 ] Gerald Sangudi commented on PHOENIX-4751: - [~jamestaylor], my understanding is that SORT-MERGE-JOIN takes place on the client. In the EXPLAIN plan, I see CLIENT SCANs as part of the SORT-MERGE-JOIN. If that is the case, the results of the SORT-MERGE-JOIN might be in client-side memory before performing the GROUP BY. (1) Depending on the data size, would the client be able to aggregate these faster than writing them back to a temp table on the region servers for aggregation? (2) Is SpillableGroupByCache currently used anywhere, e.g. in server-side hash aggregation?
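For illustration of why the join results are already on the client: a sort-merge join consumes two inputs that are each sorted on the join key and finds matches in one coordinated pass, without building a hash table. This is a minimal Python sketch, not Phoenix's implementation.

```python
def sort_merge_join(left, right):
    """Inner-join two lists of (key, value) pairs, each sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1            # left key too small: advance left
        elif lk > rk:
            j += 1            # right key too small: advance right
        else:
            # Keys match: emit this left row against the whole run of
            # right rows sharing the key, then advance the left side.
            j0 = j
            while j0 < len(right) and right[j0][0] == lk:
                out.append((lk, left[i][1], right[j0][1]))
                j0 += 1
            i += 1
    return out
```

The merged output stays in client memory, which is what makes a follow-on client-side hash aggregation (rather than a round trip to the region servers) attractive when the grouped cardinality is low.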
[jira] [Updated] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4751: Description: A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash aggregation in some cases, for improved performance. When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash aggregation. It instead performs a CLIENT SORT followed by a CLIENT AGGREGATE. The performance can be improved if (a) the GROUP BY output does not need to be sorted, and (b) the GROUP BY input is large enough and has low cardinality. The hash aggregation can initially be a hint. Here is an example from Phoenix 4.13.1 that would benefit from hash aggregation if the GROUP BY input is large with low cardinality. CREATE TABLE unsalted ( keyA BIGINT NOT NULL, keyB BIGINT NOT NULL, val SMALLINT, CONSTRAINT pk PRIMARY KEY (keyA, keyB) ); EXPLAIN SELECT /*+ USE_SORT_MERGE_JOIN */ t1.val v1, t2.val v2, COUNT(*) c FROM unsalted t1 JOIN unsalted t2 ON (t1.keyA = t2.keyA) GROUP BY t1.val, t2.val; +-+++--+ |PLAN|EST_BYTES_READ|EST_ROWS_READ| | +-+++--+ |SORT-MERGE-JOIN (INNER) TABLES|null|null| | | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED|null|null| | |AND|null|null| | | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED|null|null| | |CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]|null|null| | |CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]|null|null| | +-+++--+ was: A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash aggregation in some cases, for improved performance. When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash aggregation. It instead performs a CLIENT SORT followed by a CLIENT AGGREGATE. The performance can be improved if (a) the GROUP BY output does not need to be sorted, and (b) the GROUP BY input is large enough and has low cardinality. The hash aggregation can initially be a hint. 
> Support client-side hash aggregation with SORT_MERGE_JOIN
> ---------------------------------------------------------
>
>                 Key: PHOENIX-4751
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4751
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.14.0, 4.13.1
>            Reporter: Gerald Sangudi
>            Priority: Major
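The tradeoff the issue describes can be illustrated outside Phoenix: the current plan's CLIENT SORTED BY + CLIENT AGGREGATE INTO DISTINCT ROWS must order the entire join output before grouping, while hash aggregation keeps one accumulator per distinct grouping key and never sorts, which wins when the input is large and cardinality is low. A minimal standalone Python sketch of the two strategies (this is not Phoenix code; the function names are invented for illustration):

```python
from collections import defaultdict

def sort_aggregate(rows):
    """Sort-based GROUP BY ... COUNT(*): order the full input by the
    grouping key, then emit a result row each time the key changes.
    This mirrors CLIENT SORTED BY followed by CLIENT AGGREGATE."""
    out = []
    current, count = None, 0
    for key in sorted(rows):          # O(n log n) over the whole input
        if key != current:
            if current is not None:
                out.append((current, count))
            current, count = key, 0
        count += 1
    if current is not None:
        out.append((current, count))
    return out

def hash_aggregate(rows):
    """Hash-based GROUP BY ... COUNT(*): a single pass with one
    accumulator per distinct key. Output order is arbitrary, so this
    only applies when the GROUP BY output need not be sorted."""
    counts = defaultdict(int)
    for key in rows:                  # O(n) time, O(distinct keys) memory
        counts[key] += 1
    return list(counts.items())

# Low-cardinality input like the example: many rows, few (v1, v2) keys.
rows = [((i % 3), (i % 3) * 10) for i in range(100_000)]
assert sorted(hash_aggregate(rows)) == sort_aggregate(rows)
```

With only a handful of distinct (t1.val, t2.val) pairs, the hash table stays tiny regardless of input size, which is exactly the case (b) in the description where hash aggregation pays off.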
[jira] [Updated] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4751: Summary: Support client-side hash aggregation with SORT_MERGE_JOIN (was: Support hash aggregation with SORT_MERGE_JOIN)
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PHOENIX-4751) Support hash aggregation with SORT_MERGE_JOIN
[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerald Sangudi updated PHOENIX-4751: Description: (formatting cleanup only; same description text and example as above)
[jira] [Created] (PHOENIX-4751) Support hash aggregation with SORT_MERGE_JOIN
Gerald Sangudi created PHOENIX-4751:
---
            Summary: Support hash aggregation with SORT_MERGE_JOIN
                Key: PHOENIX-4751
                URL: https://issues.apache.org/jira/browse/PHOENIX-4751
            Project: Phoenix
         Issue Type: Improvement
   Affects Versions: 4.14.0, 4.13.1
           Reporter: Gerald Sangudi

A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash aggregation in some cases, for improved performance. When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash aggregation. It instead performs a CLIENT SORT followed by a CLIENT AGGREGATE. The performance can be improved if (a) the GROUP BY output does not need to be sorted, and (b) the GROUP BY input is large enough and has low cardinality. The hash aggregation can initially be a hint.