subject:"\[jira\] \[Commented\] \(ASTERIXDB\-1186\) Index Join Hint on an index with composite keys causes query to return incorrect results"

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

2016-08-25 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437541#comment-15437541
 ] 

ASF subversion and git services commented on ASTERIXDB-1186:


Commit 52a9d992dfadf6bb3057968416bf217c9a42dffb in asterixdb's branch 
refs/heads/master from [~wangsaeu]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=52a9d99 ]

ASTERIXDB-1186: fixed a composite primary index search issue.

 - Providing only one field to a composite primary index search
   now works properly by broadcasting search predicates to all nodes.

Change-Id: I0f75229ea804a02e7cacd04f7269a4907668eab0
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1094
Sonar-Qube: Jenkins 
Tested-by: Jenkins 
Integration-Tests: Jenkins 
Reviewed-by: Taewoo Kim 


> Index Join Hint on an index with composite keys causes query to return 
> incorrect results
> 
>
> Key: ASTERIXDB-1186
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1186
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: AsterixDB, Optimizer
> Environment: AsterixDB 0.8.7-SNAPSHOT
>Reporter: Pouria
>Assignee: Taewoo Kim
>Priority: Critical
>  Labels: soon
>
> If one of the attributes in the join predicate is a *composite* PK (consists 
> of more than one attribute), if the Index Join hint is used the query plan 
> includes wrong exchange (partitioning) which results in wrong results (only a 
> subset of the correct total results will be returned).
> The root cause of it is that the records from the PK side (which is 
> composite) are hash-partitioned on a "combination" of hash values of all 
> attributes in the PK. As a result the records of the non-PK side need to be 
> sent to "all" partitions, but the plan contains the wrong exchange and the 
> runtime wont send them to all partitions. 
> Below is an example:
> > Here is DDL:
> create type LineItemType as {
>   l_orderkey: int64,
>   l_partkey: int64,
>   l_suppkey: int64,
>   l_linenumber: int32,
>   l_quantity: int32,
>   l_extendedprice: double,
>   l_discount: double,
>   l_tax: double,
>   l_returnflag: string,
>   l_linestatus: string,
>   l_shipdate: string,
>   l_commitdate: string,
>   l_receiptdate: string,
>   l_shipinstruct: string,
>   l_shipmode: string,
>   l_comment: string
> }
> create type OrderType as {
>   o_orderkey: int64,
>   o_custkey: int64,
>   o_orderstatus: string,
>   o_totalprice: double,
>   o_orderdate: string,
>   o_orderpriority: string,
>   o_clerk: string,
>   o_shippriority: int32,
>   o_comment: string
> }
> create dataset LineItem(LineItemType) primary key l_orderkey, l_linenumber;
> create dataset Orders(OrderType)  primary key o_orderkey;
> > The following query returns wrong (a subset) of results:
> for $o in dataset('Orders')
> for $l in dataset('LineItem')
> where 
> $l.l_orderkey  /*+ indexnl */ = $o.o_orderkey
> return{
> "o_orderkey": $o.o_orderkey,
> "l_orderkey": $l.l_orderkey
> }
> Here is the plan - As you can see the Orders record are One-on-One exchanged 
> (rather than broadcast). The issue is that the l_orderkey is not the 
> partitioning attribute of LineItem, and matching Orders for a specific 
> order_key can be in any partition (depending on their l_linenumber value - 
> which is the 2nd component of PK in LineItem)
> distribute result [%0->$$13]
> -- DISTRIBUTE_RESULT  |UNPARTITIONED|
>   exchange 
>   -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
> aggregate [$$13] <- [function-call: asterix:agg-sum, Args:[%0->$$19]]
> -- AGGREGATE  |UNPARTITIONED|
>   exchange 
>   -- RANDOM_MERGE_EXCHANGE  |PARTITIONED|
> aggregate [$$19] <- [function-call: asterix:agg-count, Args:[%0->$$9]]
> -- AGGREGATE  |PARTITIONED|
>   project ([$$9])
>   -- STREAM_PROJECT  |PARTITIONED|
> assign [$$9] <- [function-call: 
> asterix:closed-record-constructor, Args:[AString: {o_orderkey}, %0->$$16, 
> AString: {l_orderkey}, %0->$$17]]
> -- ASSIGN  |PARTITIONED|
>   project ([$$17, $$16])
>   -- STREAM_PROJECT  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   unnest-map [$$17, $$18, $$1] <- function-call: 
> asterix:index-search, Args:[AString: {LineItem}, AInt32: {0}, AString: 
> {dummy}, AString: {LineItem}, ABoolean: {true}, ABoolean: {false}, ABoolean: 
> {true}, AInt32: {1}, %0->$$16, AInt32: {1}, %0->$$16, TRUE, TRUE, TRUE]
>   -- BTREE_SEARCH  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

2016-08-19 Thread Taewoo Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429157#comment-15429157
 ] 

Taewoo Kim commented on ASTERIXDB-1186:
---

I have uploaded a patch set: https://asterix-gerrit.ics.uci.edu/#/c/1094/

> Index Join Hint on an index with composite keys causes query to return 
> incorrect results
> 
>
> Key: ASTERIXDB-1186
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1186
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: AsterixDB, Optimizer
> Environment: AsterixDB 0.8.7-SNAPSHOT
>Reporter: Pouria
>Assignee: Taewoo Kim
>Priority: Critical
>  Labels: soon
>
> If one of the attributes in the join predicate is a *composite* PK (consists 
> of more than one attribute), if the Index Join hint is used the query plan 
> includes wrong exchange (partitioning) which results in wrong results (only a 
> subset of the correct total results will be returned).
> The root cause of it is that the records from the PK side (which is 
> composite) are hash-partitioned on a "combination" of hash values of all 
> attributes in the PK. As a result the records of the non-PK side need to be 
> sent to "all" partitions, but the plan contains the wrong exchange and the 
> runtime wont send them to all partitions. 
> Below is an example:
> > Here is DDL:
> create type LineItemType as {
>   l_orderkey: int64,
>   l_partkey: int64,
>   l_suppkey: int64,
>   l_linenumber: int32,
>   l_quantity: int32,
>   l_extendedprice: double,
>   l_discount: double,
>   l_tax: double,
>   l_returnflag: string,
>   l_linestatus: string,
>   l_shipdate: string,
>   l_commitdate: string,
>   l_receiptdate: string,
>   l_shipinstruct: string,
>   l_shipmode: string,
>   l_comment: string
> }
> create type OrderType as {
>   o_orderkey: int64,
>   o_custkey: int64,
>   o_orderstatus: string,
>   o_totalprice: double,
>   o_orderdate: string,
>   o_orderpriority: string,
>   o_clerk: string,
>   o_shippriority: int32,
>   o_comment: string
> }
> create dataset LineItem(LineItemType) primary key l_orderkey, l_linenumber;
> create dataset Orders(OrderType)  primary key o_orderkey;
> > The following query returns wrong (a subset) of results:
> for $o in dataset('Orders')
> for $l in dataset('LineItem')
> where 
> $l.l_orderkey  /*+ indexnl */ = $o.o_orderkey
> return{
> "o_orderkey": $o.o_orderkey,
> "l_orderkey": $l.l_orderkey
> }
> Here is the plan - As you can see the Orders record are One-on-One exchanged 
> (rather than broadcast). The issue is that the l_orderkey is not the 
> partitioning attribute of LineItem, and matching Orders for a specific 
> order_key can be in any partition (depending on their l_linenumber value - 
> which is the 2nd component of PK in LineItem)
> distribute result [%0->$$13]
> -- DISTRIBUTE_RESULT  |UNPARTITIONED|
>   exchange 
>   -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
> aggregate [$$13] <- [function-call: asterix:agg-sum, Args:[%0->$$19]]
> -- AGGREGATE  |UNPARTITIONED|
>   exchange 
>   -- RANDOM_MERGE_EXCHANGE  |PARTITIONED|
> aggregate [$$19] <- [function-call: asterix:agg-count, Args:[%0->$$9]]
> -- AGGREGATE  |PARTITIONED|
>   project ([$$9])
>   -- STREAM_PROJECT  |PARTITIONED|
> assign [$$9] <- [function-call: 
> asterix:closed-record-constructor, Args:[AString: {o_orderkey}, %0->$$16, 
> AString: {l_orderkey}, %0->$$17]]
> -- ASSIGN  |PARTITIONED|
>   project ([$$17, $$16])
>   -- STREAM_PROJECT  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   unnest-map [$$17, $$18, $$1] <- function-call: 
> asterix:index-search, Args:[AString: {LineItem}, AInt32: {0}, AString: 
> {dummy}, AString: {LineItem}, ABoolean: {true}, ABoolean: {false}, ABoolean: 
> {true}, AInt32: {1}, %0->$$16, AInt32: {1}, %0->$$16, TRUE, TRUE, TRUE]
>   -- BTREE_SEARCH  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   project ([$$16])
>   -- STREAM_PROJECT  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   data-scan []<-[$$16, $$0] <- dummy:Orders
>   -- DATASOURCE_SCAN  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   empty-tuple-source
>   -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

2016-08-18 Thread Till (JIRA)


[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427588#comment-15427588
 ] 

Till commented on ASTERIXDB-1186:
-

Ok, got it now - basically 2 cases in which we don't deal correctly with 
composite keys. 

> Index Join Hint on an index with composite keys causes query to return 
> incorrect results
> 
>
> Key: ASTERIXDB-1186
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1186
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: AsterixDB, Optimizer
> Environment: AsterixDB 0.8.7-SNAPSHOT
>Reporter: Pouria
>Assignee: Taewoo Kim
>Priority: Critical
>  Labels: soon
>
> If one of the attributes in the join predicate is a *composite* PK (consists 
> of more than one attribute), if the Index Join hint is used the query plan 
> includes wrong exchange (partitioning) which results in wrong results (only a 
> subset of the correct total results will be returned).
> The root cause of it is that the records from the PK side (which is 
> composite) are hash-partitioned on a "combination" of hash values of all 
> attributes in the PK. As a result the records of the non-PK side need to be 
> sent to "all" partitions, but the plan contains the wrong exchange and the 
> runtime wont send them to all partitions. 
> Below is an example:
> > Here is DDL:
> create type LineItemType as {
>   l_orderkey: int64,
>   l_partkey: int64,
>   l_suppkey: int64,
>   l_linenumber: int32,
>   l_quantity: int32,
>   l_extendedprice: double,
>   l_discount: double,
>   l_tax: double,
>   l_returnflag: string,
>   l_linestatus: string,
>   l_shipdate: string,
>   l_commitdate: string,
>   l_receiptdate: string,
>   l_shipinstruct: string,
>   l_shipmode: string,
>   l_comment: string
> }
> create type OrderType as {
>   o_orderkey: int64,
>   o_custkey: int64,
>   o_orderstatus: string,
>   o_totalprice: double,
>   o_orderdate: string,
>   o_orderpriority: string,
>   o_clerk: string,
>   o_shippriority: int32,
>   o_comment: string
> }
> create dataset LineItem(LineItemType) primary key l_orderkey, l_linenumber;
> create dataset Orders(OrderType)  primary key o_orderkey;
> > The following query returns wrong (a subset) of results:
> for $o in dataset('Orders')
> for $l in dataset('LineItem')
> where 
> $l.l_orderkey  /*+ indexnl */ = $o.o_orderkey
> return{
> "o_orderkey": $o.o_orderkey,
> "l_orderkey": $l.l_orderkey
> }
> Here is the plan - As you can see the Orders record are One-on-One exchanged 
> (rather than broadcast). The issue is that the l_orderkey is not the 
> partitioning attribute of LineItem, and matching Orders for a specific 
> order_key can be in any partition (depending on their l_linenumber value - 
> which is the 2nd component of PK in LineItem)
> distribute result [%0->$$13]
> -- DISTRIBUTE_RESULT  |UNPARTITIONED|
>   exchange 
>   -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
> aggregate [$$13] <- [function-call: asterix:agg-sum, Args:[%0->$$19]]
> -- AGGREGATE  |UNPARTITIONED|
>   exchange 
>   -- RANDOM_MERGE_EXCHANGE  |PARTITIONED|
> aggregate [$$19] <- [function-call: asterix:agg-count, Args:[%0->$$9]]
> -- AGGREGATE  |PARTITIONED|
>   project ([$$9])
>   -- STREAM_PROJECT  |PARTITIONED|
> assign [$$9] <- [function-call: 
> asterix:closed-record-constructor, Args:[AString: {o_orderkey}, %0->$$16, 
> AString: {l_orderkey}, %0->$$17]]
> -- ASSIGN  |PARTITIONED|
>   project ([$$17, $$16])
>   -- STREAM_PROJECT  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   unnest-map [$$17, $$18, $$1] <- function-call: 
> asterix:index-search, Args:[AString: {LineItem}, AInt32: {0}, AString: 
> {dummy}, AString: {LineItem}, ABoolean: {true}, ABoolean: {false}, ABoolean: 
> {true}, AInt32: {1}, %0->$$16, AInt32: {1}, %0->$$16, TRUE, TRUE, TRUE]
>   -- BTREE_SEARCH  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   project ([$$16])
>   -- STREAM_PROJECT  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   data-scan []<-[$$16, $$0] <- dummy:Orders
>   -- DATASOURCE_SCAN  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   empty-tuple-source
>   -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

2016-08-18 Thread Taewoo Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427581#comment-15427581
 ] 

Taewoo Kim commented on ASTERIXDB-1186:
---

Sure. The common part is "composite PK". But, this is about broadcasting search 
predicates and the other is about generating false positive results during an 
index-search.

> Index Join Hint on an index with composite keys causes query to return 
> incorrect results
> 
>
> Key: ASTERIXDB-1186
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1186
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: AsterixDB, Optimizer
> Environment: AsterixDB 0.8.7-SNAPSHOT
>Reporter: Pouria
>Assignee: Taewoo Kim
>Priority: Critical
>  Labels: soon
>
> If one of the attributes in the join predicate is a *composite* PK (consists 
> of more than one attribute), if the Index Join hint is used the query plan 
> includes wrong exchange (partitioning) which results in wrong results (only a 
> subset of the correct total results will be returned).
> The root cause of it is that the records from the PK side (which is 
> composite) are hash-partitioned on a "combination" of hash values of all 
> attributes in the PK. As a result the records of the non-PK side need to be 
> sent to "all" partitions, but the plan contains the wrong exchange and the 
> runtime wont send them to all partitions. 
> Below is an example:
> > Here is DDL:
> create type LineItemType as {
>   l_orderkey: int64,
>   l_partkey: int64,
>   l_suppkey: int64,
>   l_linenumber: int32,
>   l_quantity: int32,
>   l_extendedprice: double,
>   l_discount: double,
>   l_tax: double,
>   l_returnflag: string,
>   l_linestatus: string,
>   l_shipdate: string,
>   l_commitdate: string,
>   l_receiptdate: string,
>   l_shipinstruct: string,
>   l_shipmode: string,
>   l_comment: string
> }
> create type OrderType as {
>   o_orderkey: int64,
>   o_custkey: int64,
>   o_orderstatus: string,
>   o_totalprice: double,
>   o_orderdate: string,
>   o_orderpriority: string,
>   o_clerk: string,
>   o_shippriority: int32,
>   o_comment: string
> }
> create dataset LineItem(LineItemType) primary key l_orderkey, l_linenumber;
> create dataset Orders(OrderType)  primary key o_orderkey;
> > The following query returns wrong (a subset) of results:
> for $o in dataset('Orders')
> for $l in dataset('LineItem')
> where 
> $l.l_orderkey  /*+ indexnl */ = $o.o_orderkey
> return{
> "o_orderkey": $o.o_orderkey,
> "l_orderkey": $l.l_orderkey
> }
> Here is the plan - As you can see the Orders record are One-on-One exchanged 
> (rather than broadcast). The issue is that the l_orderkey is not the 
> partitioning attribute of LineItem, and matching Orders for a specific 
> order_key can be in any partition (depending on their l_linenumber value - 
> which is the 2nd component of PK in LineItem)
> distribute result [%0->$$13]
> -- DISTRIBUTE_RESULT  |UNPARTITIONED|
>   exchange 
>   -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
> aggregate [$$13] <- [function-call: asterix:agg-sum, Args:[%0->$$19]]
> -- AGGREGATE  |UNPARTITIONED|
>   exchange 
>   -- RANDOM_MERGE_EXCHANGE  |PARTITIONED|
> aggregate [$$19] <- [function-call: asterix:agg-count, Args:[%0->$$9]]
> -- AGGREGATE  |PARTITIONED|
>   project ([$$9])
>   -- STREAM_PROJECT  |PARTITIONED|
> assign [$$9] <- [function-call: 
> asterix:closed-record-constructor, Args:[AString: {o_orderkey}, %0->$$16, 
> AString: {l_orderkey}, %0->$$17]]
> -- ASSIGN  |PARTITIONED|
>   project ([$$17, $$16])
>   -- STREAM_PROJECT  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   unnest-map [$$17, $$18, $$1] <- function-call: 
> asterix:index-search, Args:[AString: {LineItem}, AInt32: {0}, AString: 
> {dummy}, AString: {LineItem}, ABoolean: {true}, ABoolean: {false}, ABoolean: 
> {true}, AInt32: {1}, %0->$$16, AInt32: {1}, %0->$$16, TRUE, TRUE, TRUE]
>   -- BTREE_SEARCH  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   project ([$$16])
>   -- STREAM_PROJECT  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   data-scan []<-[$$16, $$0] <- dummy:Orders
>   -- DATASOURCE_SCAN  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   empty-tuple-source
>   -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
>  



--
This

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

2016-08-18 Thread Till (JIRA)


[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427577#comment-15427577
 ] 

Till commented on ASTERIXDB-1186:
-

But the difference between the plans here is that in one plan (the working 
plan) we do an index lookup on a primary key with a single attribute. And in 
the other (broken) plans we do a index lookup on one attribute of a primary key 
with a composite key. And ASTERIXDB-920 is about "Wrong results for a primary 
index with composite keys". That's why I thought that this might be related.


> Index Join Hint on an index with composite keys causes query to return 
> incorrect results
> 
>
> Key: ASTERIXDB-1186
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1186
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: AsterixDB, Optimizer
> Environment: AsterixDB 0.8.7-SNAPSHOT
>Reporter: Pouria
>Assignee: Taewoo Kim
>Priority: Critical
>  Labels: soon
>
> If one of the attributes in the join predicate is a *composite* PK (consists 
> of more than one attribute), if the Index Join hint is used the query plan 
> includes wrong exchange (partitioning) which results in wrong results (only a 
> subset of the correct total results will be returned).
> The root cause of it is that the records from the PK side (which is 
> composite) are hash-partitioned on a "combination" of hash values of all 
> attributes in the PK. As a result the records of the non-PK side need to be 
> sent to "all" partitions, but the plan contains the wrong exchange and the 
> runtime wont send them to all partitions. 
> Below is an example:
> > Here is DDL:
> create type LineItemType as {
>   l_orderkey: int64,
>   l_partkey: int64,
>   l_suppkey: int64,
>   l_linenumber: int32,
>   l_quantity: int32,
>   l_extendedprice: double,
>   l_discount: double,
>   l_tax: double,
>   l_returnflag: string,
>   l_linestatus: string,
>   l_shipdate: string,
>   l_commitdate: string,
>   l_receiptdate: string,
>   l_shipinstruct: string,
>   l_shipmode: string,
>   l_comment: string
> }
> create type OrderType as {
>   o_orderkey: int64,
>   o_custkey: int64,
>   o_orderstatus: string,
>   o_totalprice: double,
>   o_orderdate: string,
>   o_orderpriority: string,
>   o_clerk: string,
>   o_shippriority: int32,
>   o_comment: string
> }
> create dataset LineItem(LineItemType) primary key l_orderkey, l_linenumber;
> create dataset Orders(OrderType)  primary key o_orderkey;
> > The following query returns wrong (a subset) of results:
> for $o in dataset('Orders')
> for $l in dataset('LineItem')
> where 
> $l.l_orderkey  /*+ indexnl */ = $o.o_orderkey
> return{
> "o_orderkey": $o.o_orderkey,
> "l_orderkey": $l.l_orderkey
> }
> Here is the plan - As you can see the Orders record are One-on-One exchanged 
> (rather than broadcast). The issue is that the l_orderkey is not the 
> partitioning attribute of LineItem, and matching Orders for a specific 
> order_key can be in any partition (depending on their l_linenumber value - 
> which is the 2nd component of PK in LineItem)
> distribute result [%0->$$13]
> -- DISTRIBUTE_RESULT  |UNPARTITIONED|
>   exchange 
>   -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
> aggregate [$$13] <- [function-call: asterix:agg-sum, Args:[%0->$$19]]
> -- AGGREGATE  |UNPARTITIONED|
>   exchange 
>   -- RANDOM_MERGE_EXCHANGE  |PARTITIONED|
> aggregate [$$19] <- [function-call: asterix:agg-count, Args:[%0->$$9]]
> -- AGGREGATE  |PARTITIONED|
>   project ([$$9])
>   -- STREAM_PROJECT  |PARTITIONED|
> assign [$$9] <- [function-call: 
> asterix:closed-record-constructor, Args:[AString: {o_orderkey}, %0->$$16, 
> AString: {l_orderkey}, %0->$$17]]
> -- ASSIGN  |PARTITIONED|
>   project ([$$17, $$16])
>   -- STREAM_PROJECT  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   unnest-map [$$17, $$18, $$1] <- function-call: 
> asterix:index-search, Args:[AString: {LineItem}, AInt32: {0}, AString: 
> {dummy}, AString: {LineItem}, ABoolean: {true}, ABoolean: {false}, ABoolean: 
> {true}, AInt32: {1}, %0->$$16, AInt32: {1}, %0->$$16, TRUE, TRUE, TRUE]
>   -- BTREE_SEARCH  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   project ([$$16])
>   -- STREAM_PROJECT  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   data-scan []<-[$$16, $$0] <- dummy:Orders
>   -- DATASOURCE_SCAN  |PARTITIONED|
> exchange

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

2016-08-18 Thread Till (JIRA)


[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427437#comment-15427437
 ] 

Till commented on ASTERIXDB-1186:
-

Related to ASTERIXDB-920 ?

> Index Join Hint on an index with composite keys causes query to return 
> incorrect results
> 
>
> Key: ASTERIXDB-1186
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1186
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: AsterixDB, Optimizer
> Environment: AsterixDB 0.8.7-SNAPSHOT
>Reporter: Pouria
>Assignee: Taewoo Kim
>Priority: Critical
>  Labels: soon
>
> If one of the attributes in the join predicate is a *composite* PK (consists 
> of more than one attribute), if the Index Join hint is used the query plan 
> includes wrong exchange (partitioning) which results in wrong results (only a 
> subset of the correct total results will be returned).
> The root cause of it is that the records from the PK side (which is 
> composite) are hash-partitioned on a "combination" of hash values of all 
> attributes in the PK. As a result the records of the non-PK side need to be 
> sent to "all" partitions, but the plan contains the wrong exchange and the 
> runtime wont send them to all partitions. 
> Below is an example:
> > Here is DDL:
> create type LineItemType as {
>   l_orderkey: int64,
>   l_partkey: int64,
>   l_suppkey: int64,
>   l_linenumber: int32,
>   l_quantity: int32,
>   l_extendedprice: double,
>   l_discount: double,
>   l_tax: double,
>   l_returnflag: string,
>   l_linestatus: string,
>   l_shipdate: string,
>   l_commitdate: string,
>   l_receiptdate: string,
>   l_shipinstruct: string,
>   l_shipmode: string,
>   l_comment: string
> }
> create type OrderType as {
>   o_orderkey: int64,
>   o_custkey: int64,
>   o_orderstatus: string,
>   o_totalprice: double,
>   o_orderdate: string,
>   o_orderpriority: string,
>   o_clerk: string,
>   o_shippriority: int32,
>   o_comment: string
> }
> create dataset LineItem(LineItemType) primary key l_orderkey, l_linenumber;
> create dataset Orders(OrderType)  primary key o_orderkey;
> > The following query returns wrong (a subset) of results:
> for $o in dataset('Orders')
> for $l in dataset('LineItem')
> where 
> $l.l_orderkey  /*+ indexnl */ = $o.o_orderkey
> return{
> "o_orderkey": $o.o_orderkey,
> "l_orderkey": $l.l_orderkey
> }
> Here is the plan - As you can see the Orders record are One-on-One exchanged 
> (rather than broadcast). The issue is that the l_orderkey is not the 
> partitioning attribute of LineItem, and matching Orders for a specific 
> order_key can be in any partition (depending on their l_linenumber value - 
> which is the 2nd component of PK in LineItem)
> distribute result [%0->$$13]
> -- DISTRIBUTE_RESULT  |UNPARTITIONED|
>   exchange 
>   -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
> aggregate [$$13] <- [function-call: asterix:agg-sum, Args:[%0->$$19]]
> -- AGGREGATE  |UNPARTITIONED|
>   exchange 
>   -- RANDOM_MERGE_EXCHANGE  |PARTITIONED|
> aggregate [$$19] <- [function-call: asterix:agg-count, Args:[%0->$$9]]
> -- AGGREGATE  |PARTITIONED|
>   project ([$$9])
>   -- STREAM_PROJECT  |PARTITIONED|
> assign [$$9] <- [function-call: 
> asterix:closed-record-constructor, Args:[AString: {o_orderkey}, %0->$$16, 
> AString: {l_orderkey}, %0->$$17]]
> -- ASSIGN  |PARTITIONED|
>   project ([$$17, $$16])
>   -- STREAM_PROJECT  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   unnest-map [$$17, $$18, $$1] <- function-call: 
> asterix:index-search, Args:[AString: {LineItem}, AInt32: {0}, AString: 
> {dummy}, AString: {LineItem}, ABoolean: {true}, ABoolean: {false}, ABoolean: 
> {true}, AInt32: {1}, %0->$$16, AInt32: {1}, %0->$$16, TRUE, TRUE, TRUE]
>   -- BTREE_SEARCH  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   project ([$$16])
>   -- STREAM_PROJECT  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   data-scan []<-[$$16, $$0] <- dummy:Orders
>   -- DATASOURCE_SCAN  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   empty-tuple-source
>   -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

2016-08-18 Thread Taewoo Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427356#comment-15427356
 ] 

Taewoo Kim commented on ASTERIXDB-1186:
---

The result on the current master is Interesting: the first query generates the 
correct (LineItem - outer, Orders - inner) result while the second (LineItem - 
inner, Orders - outer) generates wrong result. In summary, if LineItem is 
outer, it correctly sends the data. The problem of exchange written here does 
not exist in the current master. However, I am not sure why the second one 
doesn't generate the correct result. Does anyone have an idea?

#1. LineItem - outer, Orders - inner
{code}
for $l in dataset('LineItem')
for $o in dataset('Orders')
where
$l.l_orderkey /*+ indexnl */ = $o.o_orderkey
return
{ "o_orderkey": $o.o_orderkey, "l_orderkey": $l.l_orderkey }

{code}

{code}
distribute result [%0->$$7]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
project ([$$7])
-- STREAM_PROJECT  |PARTITIONED|
  assign [$$7] <- [function-call: asterix:closed-record-constructor, 
Args:[AString: {o_orderkey}, %0->$$14, AString: {l_orderkey}, %0->$$12]]
  -- ASSIGN  |PARTITIONED|
project ([$$12, $$14])
-- STREAM_PROJECT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
unnest-map [$$14, $$1] <- function-call: asterix:index-search, 
Args:[AString: {Orders}, AInt32: {0}, AString: {test}, AString: {Orders}, 
ABoolean: {true}, ABoolean: {true}, AInt32: {1}, %0->$$12, AInt32: {1}, 
%0->$$12, TRUE, TRUE, TRUE]
-- BTREE_SEARCH  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
project ([$$12])
-- STREAM_PROJECT  |PARTITIONED|
  exchange
  -- HASH_PARTITION_MERGE_EXCHANGE MERGE:[$$12(ASC), $$13(ASC)] 
HASH:[$$12]  |PARTITIONED|
project ([$$12, $$13])
-- STREAM_PROJECT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
data-scan []<-[$$12, $$13, $$0] <- test:LineItem
-- DATASOURCE_SCAN  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE  |PARTITIONED|
{code}



#2. LineItem - inner, Orders - outer

{code}
for $o in dataset('Orders')
for $l in dataset('LineItem')
where
$l.l_orderkey /*+ indexnl */ = $o.o_orderkey
return
{ "o_orderkey": $o.o_orderkey, "l_orderkey": $l.l_orderkey }

{code}

{code}
distribute result [%0->$$7]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
project ([$$7])
-- STREAM_PROJECT  |PARTITIONED|
  assign [$$7] <- [function-call: asterix:closed-record-constructor, 
Args:[AString: {o_orderkey}, %0->$$12, AString: {l_orderkey}, %0->$$13]]
  -- ASSIGN  |PARTITIONED|
project ([$$12, $$13])
-- STREAM_PROJECT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
unnest-map [$$13, $$14, $$1] <- function-call: 
asterix:index-search, Args:[AString: {LineItem}, AInt32: {0}, AString: {test}, 
AString: {LineItem}, ABoolean: {true}, ABoolean: {true}, AInt32: {1}, %0->$$12, 
AInt32: {1}, %0->$$12, TRUE, TRUE, TRUE]
-- BTREE_SEARCH  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
project ([$$12])
-- STREAM_PROJECT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
data-scan []<-[$$12, $$0] <- test:Orders
-- DATASOURCE_SCAN  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE  |PARTITIONED|
{code}



> Index Join Hint on an index with composite keys causes query to return 
> incorrect results
> 
>
> Key: ASTERIXDB-1186
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1186
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: AsterixDB, Optimizer
> Environment: AsterixDB 0.8.7-SNAPSHOT
>Reporter: Pouria
>Assignee: Taewoo Kim
>Priority: Critical
>  Labels: soon
>
> If one of the attributes in the join predicate is a *composite* PK (consists 
> of more than one attribute), if the Index Join hint

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

2016-08-18 Thread Taewoo Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427237#comment-15427237
 ] 

Taewoo Kim commented on ASTERIXDB-1186:
---

https://issues.apache.org/jira/browse/ASTERIXDB-920 and this issue is 
different. So, we need to deal with this individually. I am going to check 
this. 

> Index Join Hint on an index with composite keys causes query to return 
> incorrect results
> 
>
> Key: ASTERIXDB-1186
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1186
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: AsterixDB, Optimizer
> Environment: AsterixDB 0.8.7-SNAPSHOT
>Reporter: Pouria
>Assignee: Taewoo Kim
>Priority: Critical
>  Labels: soon
>
> If one of the attributes in the join predicate is a *composite* PK (consists 
> of more than one attribute), if the Index Join hint is used the query plan 
> includes wrong exchange (partitioning) which results in wrong results (only a 
> subset of the correct total results will be returned).
> The root cause of it is that the records from the PK side (which is 
> composite) are hash-partitioned on a "combination" of hash values of all 
> attributes in the PK. As a result the records of the non-PK side need to be 
> sent to "all" partitions, but the plan contains the wrong exchange and the 
> runtime wont send them to all partitions. 
> Below is an example:
> > Here is DDL:
> create type LineItemType as {
>   l_orderkey: int64,
>   l_partkey: int64,
>   l_suppkey: int64,
>   l_linenumber: int32,
>   l_quantity: int32,
>   l_extendedprice: double,
>   l_discount: double,
>   l_tax: double,
>   l_returnflag: string,
>   l_linestatus: string,
>   l_shipdate: string,
>   l_commitdate: string,
>   l_receiptdate: string,
>   l_shipinstruct: string,
>   l_shipmode: string,
>   l_comment: string
> }
> create type OrderType as {
>   o_orderkey: int64,
>   o_custkey: int64,
>   o_orderstatus: string,
>   o_totalprice: double,
>   o_orderdate: string,
>   o_orderpriority: string,
>   o_clerk: string,
>   o_shippriority: int32,
>   o_comment: string
> }
> create dataset LineItem(LineItemType) primary key l_orderkey, l_linenumber;
> create dataset Orders(OrderType)  primary key o_orderkey;
> > The following query returns wrong (a subset) of results:
> for $o in dataset('Orders')
> for $l in dataset('LineItem')
> where 
> $l.l_orderkey  /*+ indexnl */ = $o.o_orderkey
> return{
> "o_orderkey": $o.o_orderkey,
> "l_orderkey": $l.l_orderkey
> }
> Here is the plan - As you can see the Orders record are One-on-One exchanged 
> (rather than broadcast). The issue is that the l_orderkey is not the 
> partitioning attribute of LineItem, and matching Orders for a specific 
> order_key can be in any partition (depending on their l_linenumber value - 
> which is the 2nd component of PK in LineItem)
> distribute result [%0->$$13]
> -- DISTRIBUTE_RESULT  |UNPARTITIONED|
>   exchange 
>   -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
> aggregate [$$13] <- [function-call: asterix:agg-sum, Args:[%0->$$19]]
> -- AGGREGATE  |UNPARTITIONED|
>   exchange 
>   -- RANDOM_MERGE_EXCHANGE  |PARTITIONED|
> aggregate [$$19] <- [function-call: asterix:agg-count, Args:[%0->$$9]]
> -- AGGREGATE  |PARTITIONED|
>   project ([$$9])
>   -- STREAM_PROJECT  |PARTITIONED|
> assign [$$9] <- [function-call: 
> asterix:closed-record-constructor, Args:[AString: {o_orderkey}, %0->$$16, 
> AString: {l_orderkey}, %0->$$17]]
> -- ASSIGN  |PARTITIONED|
>   project ([$$17, $$16])
>   -- STREAM_PROJECT  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   unnest-map [$$17, $$18, $$1] <- function-call: 
> asterix:index-search, Args:[AString: {LineItem}, AInt32: {0}, AString: 
> {dummy}, AString: {LineItem}, ABoolean: {true}, ABoolean: {false}, ABoolean: 
> {true}, AInt32: {1}, %0->$$16, AInt32: {1}, %0->$$16, TRUE, TRUE, TRUE]
>   -- BTREE_SEARCH  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   project ([$$16])
>   -- STREAM_PROJECT  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   data-scan []<-[$$16, $$0] <- dummy:Orders
>   -- DATASOURCE_SCAN  |PARTITIONED|
> exchange 
> -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
>   empty-tuple-source
>   -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
>  



--
This message was sent by

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

[jira] [Commented] (ASTERIXDB-1186) Index Join Hint on an index with composite keys causes query to return incorrect results

8 matches

Site Navigation

Mail list logo

Footer information