[ https://issues.apache.org/jira/browse/PHOENIX-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sokolov Yura updated PHOENIX-4504: ---------------------------------- Description: Probably it is already fixed. Having a quick search I didn't find exact problem description. I have a table: {code:sql} create immutable table product_history_v3 ( ts bigint not null, id varchar not null, product varchar, merchantid varchar, storeid varchar, constraint pk primary key (ts, id) ) compression=LZ4,max_filesize=150000000,memstore_flushsize=70000000, versions=1,update_cache_frequency=1000,append_only_schema=true, guid_posts_width=10000000, SALT_BUCKETS=20; create local index product_history_v3_id_ts on product_history_v3 (id, ts) compression=LZ4; create local index product_history_v3_merchantid_ts on product_history_v3 (merchantid, ts) include (id) compression=LZ4; create local index product_history_v3_storeid_ts on product_history_v3 (storeid, ts) include (id) compression=LZ4; {code} Simple select by merchanid ordering by id,ts returns correct results: {code:sql} 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts from product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000 order by id, ts limit 30; +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | PLAN | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] | | SERVER FILTER BY FIRST KEY ONLY | | SERVER TOP 30 ROWS SORTED BY ["ID", "TS"] | | CLIENT MERGE SORT | | CLIENT LIMIT 30 | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 5 rows selected (0,019 seconds) {code} It runs very fast until I add {{product}} to selected fields (cause average length of {{product}} is 10kb). So I'm trying to fetch id,ts in subquery, and product in outer query. It runs fast, but returns incorrect results: set of rows doesn't match to set of rows returned by query above. {code} 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts from product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000 order by id, ts limit 30) order by id, ts limit 30; +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | PLAN | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3 | SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS] | CLIENT MERGE SORT | CLIENT LIMIT 30 | SKIP-SCAN-JOIN TABLE 0 | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] | SERVER FILTER BY FIRST KEY ONLY | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] LIMIT 30 GROUPS | CLIENT MERGE SORT | CLIENT 30 ROW LIMIT | DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($470.$473, $470.$472)) +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 11 rows selected (0,021 seconds) {code} However, if I change ordering a bit, so planner is forced for reordering, then set of rows is equal to original query: {code} 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts from product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000 order by id||'-', ts limit 30) order by id, ts limit 30; +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | PLAN +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3 | SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS] | CLIENT MERGE SORT | CLIENT LIMIT 30 | SKIP-SCAN-JOIN TABLE 0 | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] | SERVER FILTER BY FIRST KEY ONLY | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] | CLIENT MERGE SORT | CLIENT TOP 30 ROWS SORTED BY [("ID" || '-'), "TS"] | DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($494.$497, $494.$496)) +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ 11 rows selected (0,02 seconds) 12 rows selected (0,021 seconds) {code} There, certainly, should be a lot of rows to trigger this behaviour. was: Probably it is already fixed. Having a quick search I didn't find exact problem description. I have a table: {code:sql} create immutable table product_history_v3 ( ts bigint not null, id varchar not null, product varchar, merchantid varchar, storeid varchar, constraint pk primary key (ts, id) ) compression=LZ4,max_filesize=150000000,memstore_flushsize=70000000, versions=1,update_cache_frequency=1000,append_only_schema=true, guid_posts_width=10000000, SALT_BUCKETS=20; create local index product_history_v3_id_ts on product_history_v3 (id, ts) compression=LZ4; create local index product_history_v3_merchantid_ts on product_history_v3 (merchantid, ts) include (id) compression=LZ4; create local index product_history_v3_storeid_ts on product_history_v3 (storeid, ts) include (id) compression=LZ4; {code} Simple select by merchanid ordering by id,ts returns correct results: {code:sql} 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts from product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000 order by id, ts limit 30; +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-+ | PLAN | EST_BYTES_READ | | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-+ | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] | null | | | SERVER FILTER BY FIRST KEY ONLY | null | | | SERVER TOP 30 ROWS SORTED BY ["ID", "TS"] | null | | | CLIENT MERGE SORT | null | | | CLIENT LIMIT 30 | null | | +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+-+ 5 rows selected (0,019 seconds) {code} It runs very fast until I add {{product}} to selected fields (cause average length of {{product}} is 10kb). So I'm trying to fetch id,ts in subquery, and product in outer query. It runs fast, but returns incorrect results: set of rows doesn't match to set of rows returned by query above. {code} 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts from product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000 order by id, ts limit 30) order by id, ts limit 30; +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ | PLAN | EST_BYTES | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ | CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3 | 0 | | SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS] | 0 | | CLIENT MERGE SORT | 0 | | CLIENT LIMIT 30 | 0 | | SKIP-SCAN-JOIN TABLE 0 | 0 | | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] | 0 | | SERVER FILTER BY FIRST KEY ONLY | 0 | | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] LIMIT 30 GROUPS | 0 | | CLIENT MERGE SORT | 0 | | CLIENT 30 ROW LIMIT | 0 | | DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($470.$473, $470.$472)) | 0 | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ 11 rows selected (0,021 seconds) {code} However, if I change ordering a bit, so planner is forced for reordering, then set of rows is equal to original query: {code} 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts from product_history_v3 where merchantid = '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > 1498867200000 order by id||'-', ts limit 3000) order by id, ts limit 30 offset 2970; +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ | PLAN | EST_BYTES | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ | CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER PRODUCT_HISTORY_V3 | 0 | | SERVER TOP 3000 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, PRODUCT_HISTORY_V3.TS] | 0 | | CLIENT MERGE SORT | 0 | | CLIENT OFFSET 2970 | 0 | | CLIENT LIMIT 30 | 0 | | SKIP-SCAN-JOIN TABLE 0 | 0 | | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER PRODUCT_HISTORY_V3 [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - [2,'1479114284851799852-2-11-118-1577502676',1499472000000] | 0 | | SERVER FILTER BY FIRST KEY ONLY | 0 | | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] | 0 | | CLIENT MERGE SORT | 0 | | CLIENT TOP 3000 ROWS SORTED BY [("ID" || '-'), "TS"] | 0 | | DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) IN (($482.$485, $482.$484)) | 0 | +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ 12 rows selected (0,021 seconds) {code} There, certainly, should be a lot of rows to trigger this behaviour. > Subquery with ORDER BY on salted table gives wrong results > ---------------------------------------------------------- > > Key: PHOENIX-4504 > URL: https://issues.apache.org/jira/browse/PHOENIX-4504 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.11.0 > Environment: amazon emr phoenix 4.11.0 hbase 1.3 > Reporter: Sokolov Yura > > Probably it is already fixed. Having a quick search I didn't find exact > problem description. > I have a table: > {code:sql} > create immutable table product_history_v3 ( > ts bigint not null, > id varchar not null, > product varchar, > merchantid varchar, > storeid varchar, > constraint pk primary key (ts, id) > ) compression=LZ4,max_filesize=150000000,memstore_flushsize=70000000, > versions=1,update_cache_frequency=1000,append_only_schema=true, > guid_posts_width=10000000, > SALT_BUCKETS=20; > create local index product_history_v3_id_ts on product_history_v3 (id, ts) > compression=LZ4; > create local index product_history_v3_merchantid_ts on product_history_v3 > (merchantid, ts) include (id) compression=LZ4; > create local index product_history_v3_storeid_ts on product_history_v3 > (storeid, ts) include (id) compression=LZ4; > {code} > Simple select by merchanid ordering by id,ts returns correct results: > {code:sql} > 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts from > product_history_v3 where merchantid = > '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > > 1498867200000 order by id, ts limit 30; > +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > | > PLAN > | > +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER > PRODUCT_HISTORY_V3 > [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - > [2,'1479114284851799852-2-11-118-1577502676',1499472000000] | > | SERVER FILTER BY FIRST KEY ONLY > > | > | SERVER TOP 30 ROWS SORTED BY ["ID", "TS"] > > | > | CLIENT MERGE SORT > > | > | CLIENT LIMIT 30 > > | > +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > 5 rows selected (0,019 seconds) > {code} > It runs very fast until I add {{product}} to selected fields (cause average > length of {{product}} is 10kb). > So I'm trying to fetch id,ts in subquery, and product in outer query. It runs > fast, but returns incorrect results: set of rows doesn't match to set of rows > returned by query above. > {code} > 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, > substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts > from product_history_v3 where merchantid = > '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > > 1498867200000 order by id, ts limit 30) order by id, ts limit 30; > +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > | > PLAN > | > +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > | CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER > PRODUCT_HISTORY_V3 > | SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, > PRODUCT_HISTORY_V3.TS] > | CLIENT MERGE SORT > | CLIENT LIMIT 30 > | SKIP-SCAN-JOIN TABLE 0 > | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER > PRODUCT_HISTORY_V3 > [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - > [2,'1479114284851799852-2-11-118-1577502676',1499472000000] > | SERVER FILTER BY FIRST KEY ONLY > | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] > LIMIT 30 GROUPS > | CLIENT MERGE SORT > | CLIENT 30 ROW LIMIT > | DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) > IN (($470.$473, $470.$472)) > +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > 11 rows selected (0,021 seconds) > {code} > However, if I change ordering a bit, so planner is forced for reordering, > then set of rows is equal to original query: > {code} > 0: jdbc:phoenix:localhost:2181:/hbase> explain select id, ts, > substr(product,1,30) from product_history_v3 where (id, ts) in (select id, ts > from product_history_v3 where merchantid = > '1479114284851799852-2-11-118-1577502676' and ts < 1499472000000 and ts > > 1498867200000 order by id||'-', ts limit 30) order by id, ts limit 30; > +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > | > PLAN > +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > | CLIENT 40-CHUNK 915204 ROWS 6291521905 BYTES PARALLEL 40-WAY FULL SCAN OVER > PRODUCT_HISTORY_V3 > | SERVER TOP 30 ROWS SORTED BY [PRODUCT_HISTORY_V3.ID, > PRODUCT_HISTORY_V3.TS] > | CLIENT MERGE SORT > | CLIENT LIMIT 30 > | SKIP-SCAN-JOIN TABLE 0 > | CLIENT 20-CHUNK 0 ROWS 0 BYTES PARALLEL 20-WAY RANGE SCAN OVER > PRODUCT_HISTORY_V3 > [2,'1479114284851799852-2-11-118-1577502676',1498867200001] - > [2,'1479114284851799852-2-11-118-1577502676',1499472000000] > | SERVER FILTER BY FIRST KEY ONLY > | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ID", "TS"] > | CLIENT MERGE SORT > | CLIENT TOP 30 ROWS SORTED BY [("ID" || '-'), "TS"] > | DYNAMIC SERVER FILTER BY (PRODUCT_HISTORY_V3.TS, PRODUCT_HISTORY_V3.ID) > IN (($494.$497, $494.$496)) > +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+ > 11 rows selected (0,02 seconds) > 12 rows selected (0,021 seconds) > {code} > There, certainly, should be a lot of rows to trigger this behaviour. -- This message was sent by Atlassian JIRA (v6.4.14#64029)