[jira] [Updated] (PHOENIX-4504) Subquery with ORDER BY on salted table gives wrong results

Sokolov Yura (JIRA) Wed, 27 Dec 2017 10:38:37 -0800

     [ https://issues.apache.org/jira/browse/PHOENIX-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sokolov Yura updated PHOENIX-4504:
----------------------------------
    Description: 
This may already be fixed, but a quick search did not turn up an exact 
description of the problem.

I have a table:
{code:sql}
create immutable table product_history_v3 (
        ts bigint not null,
        id varchar not null,
        product varchar,
        merchantid varchar,
        storeid varchar,
        constraint pk primary key (ts, id)
) compression=LZ4,max_filesize=150000000,memstore_flushsize=70000000,
        versions=1,update_cache_frequency=1000,append_only_schema=true,
        guid_posts_width=10000000,
SALT_BUCKETS=20;
create local index product_history_v3_id_ts on product_history_v3 (id, ts) 
compression=LZ4;
create local index product_history_v3_merchantid_ts on product_history_v3 
(merchantid, ts) include (id) compression=LZ4;
create local index product_history_v3_storeid_ts on product_history_v3 
(storeid, ts) include (id) compression=LZ4;
{code}
A simple select by merchantid, ordered by id, ts, returns correct results:
{code:sql}
0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts from 
product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' 
and ts < 1499472000000 and ts > 1498867200000 order by id, ts limit 30;
+--------------------------------------------------------------------------------+
| PLAN
+--------------------------------------------------------------------------------+
| CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000]
|     SERVER FILTER BY FIRST KEY ONLY
|     SERVER TOP 30 ROWS SORTED BY ["ID", "TS"]
| CLIENT MERGE SORT
| CLIENT LIMIT 30
+--------------------------------------------------------------------------------+
5 rows selected (0,019 seconds)
{code}
It runs very fast until I add {{product}} to the selected fields (because the 
average length of {{product}} is 10 KB).

So I tried fetching id, ts in a subquery and product in the outer query. It 
runs fast but returns incorrect results: the set of rows does not match the 
set of rows returned by the query above.
{code}
0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, 
substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts 
from product_history_v3 where merchantid = 
'1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 
1498867200000 order by id, ts limit 30) order by id, ts limit 30;
+--------------------------------------------------------------------------------+
| PLAN
+--------------------------------------------------------------------------------+
| CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3
|     SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS]
| CLIENT MERGE SORT
| CLIENT LIMIT 30
|     SKIP-SCAN-JOIN TABLE 0
|         CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000]
|             SERVER FILTER BY FIRST KEY ONLY
|             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] LIMIT 30 GROUPS
|         CLIENT MERGE SORT
|         CLIENT 30 ROW LIMIT
|     DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($470.$473, $470.$472))
+--------------------------------------------------------------------------------+
11 rows selected (0,021 seconds)
{code}
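
The mismatch between the two result sets can be checked mechanically rather than by eye. Below is a minimal Python sketch, assuming the (id, ts) pairs from both queries (run without EXPLAIN) have already been fetched into lists. The function name {{diff_row_sets}} and the sample keys are hypothetical and not taken from the real table.

```python
from typing import Dict, Iterable, Set, Tuple

# (id, ts): the two primary-key columns selected by both queries.
Key = Tuple[str, int]


def diff_row_sets(expected: Iterable[Key], actual: Iterable[Key]) -> Dict[str, Set[Key]]:
    """Return the keys present in only one of the two result sets."""
    expected_set = set(expected)
    actual_set = set(actual)
    return {
        "missing": expected_set - actual_set,      # in the plain query only
        "unexpected": actual_set - expected_set,   # in the subquery form only
    }


# Made-up sample keys, for illustration only.
plain_query_rows = [("a", 1), ("b", 2), ("c", 3)]
subquery_rows = [("a", 1), ("b", 2), ("z", 9)]

result = diff_row_sets(plain_query_rows, subquery_rows)
print(result)
```

If both sets come back empty, the two queries agree on the selected keys. Any non-empty set points at the rows that differ.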
However, if I change the ordering slightly, so that the planner is forced to 
re-sort, the set of rows matches the original query:
{code}
0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, 
substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts 
from product_history_v3 where merchantid = 
'1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 
1498867200000 order by id||'-', ts limit 30) order by id, ts limit 30;
+--------------------------------------------------------------------------------+
| PLAN
+--------------------------------------------------------------------------------+
| CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3
|     SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS]
| CLIENT MERGE SORT
| CLIENT LIMIT 30
|     SKIP-SCAN-JOIN TABLE 0
|         CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000]
|             SERVER FILTER BY FIRST KEY ONLY
|             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"]
|         CLIENT MERGE SORT
|         CLIENT TOP 30 ROWS SORTED BY [("ID" || '-'), "TS"]
|     DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($494.$497, $494.$496))
+--------------------------------------------------------------------------------+
11 rows selected (0,02 seconds)
{code}

The table certainly needs to contain a lot of rows to trigger this behaviour.
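
The two subquery plans differ in where the limit is applied. The failing plan limits on the server (LIMIT 30 GROUPS) before the client merge sort, while the working plan sorts and limits on the client (CLIENT TOP 30 ROWS SORTED BY). The report does not identify a root cause, so the Python sketch below is only a toy model of one possible explanation, not Phoenix's actual implementation. It assumes each salt bucket returns rows in index scan order (by ts) rather than in (id, ts) order. The bucket contents and function names are hypothetical.

```python
import heapq
from itertools import islice

# Hypothetical toy data: each "bucket" lists (id, ts) rows in scan order,
# i.e. ordered by ts, NOT by (id, ts). This ordering is an assumption.
buckets = [
    [("m", 1), ("z", 2), ("a", 3)],
    [("k", 1), ("b", 2), ("c", 3)],
]
LIMIT = 2


def limit_then_merge(buckets, limit):
    """Cut each bucket to `limit` rows in scan order, then merge-sort and cut again."""
    partial = [sorted(islice(rows, limit)) for rows in buckets]
    return list(islice(heapq.merge(*partial), limit))


def sort_then_limit(buckets, limit):
    """Collect every row, sort globally by (id, ts), then apply the limit."""
    all_rows = [row for rows in buckets for row in rows]
    return sorted(all_rows)[:limit]


early = limit_then_merge(buckets, LIMIT)
late = sort_then_limit(buckets, LIMIT)
print(early)  # [('b', 2), ('k', 1)]
print(late)   # [('a', 3), ('b', 2)]
```

In this model the early limit discards ("a", 3) before any sort by id sees it. The two strategies only disagree when a bucket holds more rows than the limit, which would be consistent with the behaviour needing a lot of rows to show up.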

  was:
This may already be fixed, but a quick search did not turn up an exact 
description of the problem.

I have a table:
{code:sql}
create immutable table product_history_v3 (
        ts bigint not null,
        id varchar not null,
        product varchar,
        merchantid varchar,
        storeid varchar,
        constraint pk primary key (ts, id)
) compression=LZ4,max_filesize=150000000,memstore_flushsize=70000000,
        versions=1,update_cache_frequency=1000,append_only_schema=true,
        guid_posts_width=10000000,
SALT_BUCKETS=20;
create local index product_history_v3_id_ts on product_history_v3 (id, ts) 
compression=LZ4;
create local index product_history_v3_merchantid_ts on product_history_v3 
(merchantid, ts) include (id) compression=LZ4;
create local index product_history_v3_storeid_ts on product_history_v3 
(storeid, ts) include (id) compression=LZ4;
{code}
A simple select by merchantid, ordered by id, ts, returns correct results:
{code:sql}
0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts from 
product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' 
and ts < 1499472000000 and ts > 1498867200000 order by id, ts limit 30;
+--------------------------------------------------------------------------------+-----------------+-+
| PLAN | EST_BYTES_READ | |
+--------------------------------------------------------------------------------+-----------------+-+
| CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] | null | |
|     SERVER FILTER BY FIRST KEY ONLY | null | |
|     SERVER TOP 30 ROWS SORTED BY ["ID", "TS"] | null | |
| CLIENT MERGE SORT | null | |
| CLIENT LIMIT 30 | null | |
+--------------------------------------------------------------------------------+-----------------+-+
5 rows selected (0,019 seconds)
{code}
It runs very fast until I add {{product}} to the selected fields (because the 
average length of {{product}} is 10 KB).

So I tried fetching id, ts in a subquery and product in the outer query. It 
runs fast but returns incorrect results: the set of rows does not match the 
set of rows returned by the query above.
{code}
0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, 
substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts 
from product_history_v3 where merchantid = 
'1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 
1498867200000 order by id, ts limit 30) order by id, ts limit 30;
+--------------------------------------------------------------------------------+-----------+
| PLAN | EST_BYTES |
+--------------------------------------------------------------------------------+-----------+
| CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3 | 0 |
|     SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS] | 0 |
| CLIENT MERGE SORT | 0 |
| CLIENT LIMIT 30 | 0 |
|     SKIP-SCAN-JOIN TABLE 0 | 0 |
|         CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] | 0 |
|             SERVER FILTER BY FIRST KEY ONLY | 0 |
|             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] LIMIT 30 GROUPS | 0 |
|         CLIENT MERGE SORT | 0 |
|         CLIENT 30 ROW LIMIT | 0 |
|     DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($470.$473, $470.$472)) | 0 |
+--------------------------------------------------------------------------------+-----------+
11 rows selected (0,021 seconds)
{code}
However, if I change the ordering slightly, so that the planner is forced to 
re-sort, the set of rows matches the original query:
{code}
0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, 
substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts 
from product_history_v3 where merchantid = 
'1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 
1498867200000 order by id||'-', ts limit 3000) order by id, ts limit 30 offset 
2970;
+--------------------------------------------------------------------------------+-----------+
| PLAN | EST_BYTES |
+--------------------------------------------------------------------------------+-----------+
| CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3 | 0 |
|     SERVER TOP 3000 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS] | 0 |
| CLIENT MERGE SORT | 0 |
| CLIENT OFFSET 2970 | 0 |
| CLIENT LIMIT 30 | 0 |
|     SKIP-SCAN-JOIN TABLE 0 | 0 |
|         CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] | 0 |
|             SERVER FILTER BY FIRST KEY ONLY | 0 |
|             SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] | 0 |
|         CLIENT MERGE SORT | 0 |
|         CLIENT TOP 3000 ROWS SORTED BY [("ID" || '-'), "TS"] | 0 |
|     DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($482.$485, $482.$484)) | 0 |
+--------------------------------------------------------------------------------+-----------+
12 rows selected (0,021 seconds)
{code}

The table certainly needs to contain a lot of rows to trigger this behaviour.


> Subquery with ORDER BY on salted table gives wrong results
> ----------------------------------------------------------
>
>                 Key: PHOENIX-4504
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4504
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.11.0
>         Environment: amazon emr phoenix 4.11.0 hbase 1.3
>            Reporter: Sokolov Yura



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
