[jira] [Commented] (CALCITE-5740) Support for AggToSemiJoinRule

2023-06-18 Thread Rong Rong (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733935#comment-17733935
 ] 

Rong Rong commented on CALCITE-5740:


Sounds good. Filed CALCITE-5787 for {{RelNode.getInputFieldsUsed}}. I will
polish the current implementation of AggToSemiJoinRule and file a PR. Thank you!

> Support for AggToSemiJoinRule
> -
>
> Key: CALCITE-5740
> URL: https://issues.apache.org/jira/browse/CALCITE-5740
> Project: Calcite
> Issue Type: New Feature
> Reporter: Rong Rong
> Priority: Major
>
> **Description**
> Currently we only have the JoinToSemiJoin and ProjectToSemiJoin rules. The
> latter checks, within the rule itself, whether the project accesses columns
> from the RHS result.
> This can be extended to Aggregate as well. Experimental code:
> https://github.com/walterddr/calcite/pull/1/files
> **Alternative**
> An alternative is to add a project/calc between the join and the aggregate to
> activate the project-to-semi-join rule. Please share any other alternative
> that I haven't considered.
> Thanks



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-5740) Support for AggToSemiJoinRule

2023-06-16 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17733603#comment-17733603
 ] 

Julian Hyde commented on CALCITE-5740:
--

I think {{getInputFieldsUsed}} would be more useful on {{RelNode}} (with a 
default returning the empty {{ImmutableBitSet}}) than as static methods in 
{{RelOptUtil}}. Can you log a Jira case for it?

It could be a separate commit in this PR if you like. (I do most of my best 
work when refactoring for other issues.) 
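
To make the shape of that suggestion concrete, here is a minimal sketch of a node interface whose default implementation reports that no input fields are used, with one subtype overriding it. Everything in it is hypothetical: `Node`, `Leaf` and `ProjectLike` are toy stand-ins rather than Calcite classes, and `java.util.BitSet` stands in for Calcite's `ImmutableBitSet`.

```java
import java.util.BitSet;

/** Toy model only: these types are hypothetical stand-ins, not Calcite classes. */
class InputFieldsUsedDemo {
  /** Stand-in for a relational node; the default reports no input fields used. */
  interface Node {
    default BitSet getInputFieldsUsed() {
      return new BitSet();
    }
  }

  /** A leaf-like node that simply keeps the default. */
  static final class Leaf implements Node {
  }

  /** A project-like node that overrides the default with the ordinals it reads. */
  static final class ProjectLike implements Node {
    private final int[] inputOrdinals;

    ProjectLike(int... inputOrdinals) {
      this.inputOrdinals = inputOrdinals.clone();
    }

    @Override public BitSet getInputFieldsUsed() {
      BitSet used = new BitSet();
      for (int ordinal : inputOrdinals) {
        used.set(ordinal);
      }
      return used;
    }
  }

  public static void main(String[] args) {
    System.out.println(new Leaf().getInputFieldsUsed());            // prints {}
    System.out.println(new ProjectLike(0, 2).getInputFieldsUsed()); // prints {0, 2}
  }
}
```

A default on the interface means existing node types keep compiling unchanged, and only the types a rule cares about need an override.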



[jira] [Commented] (CALCITE-5740) Support for AggToSemiJoinRule

2023-06-14 Thread Rong Rong (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732811#comment-17732811
 ] 

Rong Rong commented on CALCITE-5740:


Actually, I think it would be super useful to add getInputFieldsUsed(),
either as a method on {{RelNode}}, or as
{code:java}
RelOptUtil.getInputFieldsUsed(relNode);
{code}
I can quickly do a POC if that's desirable.



[jira] [Commented] (CALCITE-5740) Support for AggToSemiJoinRule

2023-06-13 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732263#comment-17732263
 ] 

Julian Hyde commented on CALCITE-5740:
--

Ha! Well done finding {{getAllFields}} and {{bits}}. Yes, they do the same 
thing. (Though I'd prefer if both returned an {{ImmutableBitSet}}.) It's a 
shame there isn't even a javadoc link tying them together. I wonder whether
there should be a method on {{RelNode}}:
{code}
ImmutableBitSet getInputFieldsUsed();
{code}



[jira] [Commented] (CALCITE-5740) Support for AggToSemiJoinRule

2023-06-11 Thread Rong Rong (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17731376#comment-17731376
 ] 

Rong Rong commented on CALCITE-5740:


[~libenchao]

I don't have a generic mechanism in mind, but the idea is to detect whether the
accessed fields come only from the LHS of the join:

- for Aggregate there's {{RelOptUtil.getAllFields(aggregateNode)}} to detect
all the bits accessed.
- for Project the original method in this class is
{{RelOptUtil.InputFinder.bits(projectNode.getProjects())}}; we can basically do
the same for Aggregate.

We could even create a util to detect all the input field access bits for any
node, but that would require some work (or maybe it already exists :-P ).

For now, though, I don't know whether there's any additional benefit in
supporting other nodes (maybe one more: CalcNode?). Let me know what you think:
should we (1) go generic, create a bit finder, and change the rule to match any
RelNode, or (2) add one for Aggregate only?
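
As a rough illustration of the check described above, the sketch below tests whether a set of used field ordinals stays entirely within the left input of a join. It assumes the join's output row lists the left input's fields first, followed by the right input's fields. The class and method names are hypothetical, and plain `java.util.BitSet` is used in place of Calcite's `ImmutableBitSet`.

```java
import java.util.BitSet;

/** Toy model of the "fields used" check; hypothetical helper, not Calcite code. */
class LeftOnlyFieldCheck {
  /**
   * Returns whether every used field ordinal falls in [0, leftFieldCount),
   * i.e. the node above the join reads nothing from the right input.
   */
  static boolean usesOnlyLeftFields(BitSet fieldsUsed, int leftFieldCount) {
    // nextSetBit returns -1 when no bit is set at or after the given index.
    return fieldsUsed.nextSetBit(leftFieldCount) < 0;
  }

  public static void main(String[] args) {
    int leftFieldCount = 2;  // join output: fields 0-1 from the left, 2+ from the right

    BitSet groupByLeftOnly = new BitSet();
    groupByLeftOnly.set(1);

    BitSet touchesRight = new BitSet();
    touchesRight.set(1);
    touchesRight.set(3);

    System.out.println(usesOnlyLeftFields(groupByLeftOnly, leftFieldCount)); // prints true
    System.out.println(usesOnlyLeftFields(touchesRight, leftFieldCount));    // prints false
  }
}
```

Whichever node sits on top of the join, the rule only needs this one bit set from it, which is why a generic "input fields used" accessor would let a single rule cover Project, Aggregate and other nodes.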




[jira] [Commented] (CALCITE-5740) Support for AggToSemiJoinRule

2023-06-10 Thread Benchao Li (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17731212#comment-17731212
 ] 

Benchao Li commented on CALCITE-5740:
-

[~rongr] I agree that your last case can be improved by introducing another
rule which matches any node on top of {{Join}}. Do you have an idea of how to
analyze the input fields of that "any node"?



[jira] [Commented] (CALCITE-5740) Support for AggToSemiJoinRule

2023-06-07 Thread Rong Rong (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17730328#comment-17730328
 ] 

Rong Rong commented on CALCITE-5740:


I see, yeah, that was a poorly chosen example. You are correct: COUNT(*) results
can be different if b.key is not unique. However (and this might have been my
own configuration issue), running the following query through the planner
SELECT col, COUNT(*) FROM a WHERE a.key IN (SELECT key FROM b WHERE val BETWEEN 0
AND 10) GROUP BY col
with the JOIN_TO_SEMI_JOIN rule still results in an inner JOIN, with the RHS
table being the result of
SELECT DISTINCT key FROM b WHERE val BETWEEN 0 AND 10
i.e. it is still not generating a SEMI-JOIN.

Is there
 # some other rule I can use to configure the planner to generate a SEMI-JOIN?
 # some default configuration that will directly generate a SEMI-JOIN when
going through the SqlToRelConverter?

Thanks!



[jira] [Commented] (CALCITE-5740) Support for AggToSemiJoinRule

2023-06-03 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728993#comment-17728993
 ] 

Julian Hyde commented on CALCITE-5740:
--

Your example makes things clearer. I see how the Aggregate allows you to deduce
that no columns are used from the right side of the join.

Your example is only valid if b.key is unique. Otherwise COUNT will return 
different results before and after the transformation. So, no columns being 
used is a necessary but not sufficient condition to apply the rule. You should 
describe what those conditions are (including which join types are allowed).

If you require uniqueness on the right hand side then I’m not sure there’s a
point converting the join to a semijoin. Or rather, I think there’s an existing
rule that removes an unnecessary Aggregate if its input is already unique.

Can you come up with an example (perhaps using EMP and DEPT, because we know
their PK and FK constraints) where this rule can achieve something that no
other rule(s) can?
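
The uniqueness point can be checked with a small, self-contained sketch. It models each table as a list of join keys (a simplification made only for this illustration, not Calcite code) and compares the row count of an inner join with that of a semi-join, first with unique keys on the right and then with a duplicate.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model: tables are lists of integer join keys; not Calcite code. */
class SemiJoinCountDemo {
  /** COUNT(*) over an inner join: a JOIN b ON a.key = b.key. */
  static long innerJoinCount(List<Integer> aKeys, List<Integer> bKeys) {
    long count = 0;
    for (int a : aKeys) {
      for (int b : bKeys) {
        if (a == b) {
          count++;  // one output row per matching (a, b) pair
        }
      }
    }
    return count;
  }

  /** COUNT(*) over a semi-join: a WHERE a.key IN (SELECT key FROM b). */
  static long semiJoinCount(List<Integer> aKeys, List<Integer> bKeys) {
    Set<Integer> distinctB = new HashSet<>(bKeys);
    long count = 0;
    for (int a : aKeys) {
      if (distinctB.contains(a)) {
        count++;  // each row of a is emitted at most once
      }
    }
    return count;
  }

  public static void main(String[] args) {
    List<Integer> a = List.of(1, 2, 3);
    List<Integer> uniqueB = List.of(1, 2);
    List<Integer> duplicateB = List.of(1, 1, 2);

    // Unique right-hand keys: both plans count the same rows.
    System.out.println(innerJoinCount(a, uniqueB) + " vs " + semiJoinCount(a, uniqueB));       // prints 2 vs 2
    // A duplicate right-hand key inflates the inner join but not the semi-join.
    System.out.println(innerJoinCount(a, duplicateB) + " vs " + semiJoinCount(a, duplicateB)); // prints 3 vs 2
  }
}
```

With the duplicate key the inner join produces three rows and the semi-join two, so "no right-side columns are used" alone does not make the rewrite safe for COUNT.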



[jira] [Commented] (CALCITE-5740) Support for AggToSemiJoinRule

2023-06-02 Thread Rong Rong (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728894#comment-17728894
 ] 

Rong Rong commented on CALCITE-5740:


forgot to post an example

As an example:
```
SELECT
  a.col, COUNT(*)
FROM
  a JOIN b ON a.key = b.key
WHERE
  b.val BETWEEN 0 AND 10
GROUP BY
  a.col
```
can be converted to
```
SELECT
  col, COUNT(*)
FROM
  a
WHERE
  a.key IN (SELECT key FROM b WHERE val BETWEEN 0 AND 10)
GROUP BY
  col
```



[jira] [Commented] (CALCITE-5740) Support for AggToSemiJoinRule

2023-06-02 Thread Rong Rong (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728893#comment-17728893
 ] 

Rong Rong commented on CALCITE-5740:


Hmm. IIUC, PROJECT_TO_SEMI_JOIN doesn't actually touch the PROJECT node
above.

The only difference between PROJECT_TO_SEMI_JOIN and JOIN_TO_SEMI_JOIN is
basically whether the "PROJECT" above the JOIN actually touches any fields from
the right-side aggregate:
- if the PROJECT doesn't exist, it uses the `isEmptyAggregate(aggregate)`
check to determine whether to convert the JOIN to a SEMI_JOIN. See:
https://github.com/walterddr/calcite/blob/3817b0e42c07a6b185f3c1b921f648ff28e8a3b7/core/src/main/java/org/apache/calcite/rel/rules/SemiJoinRule.java#L84-L87
- if the PROJECT exists, it checks whether the project field bits intersect
with the right aggregate bits. See:
https://github.com/walterddr/calcite/blob/3817b0e42c07a6b185f3c1b921f648ff28e8a3b7/core/src/main/java/org/apache/calcite/rel/rules/SemiJoinRule.java#L80-L82
  - after the rule has been applied to convert the join to a semi-join, the
project seems to be put back?
https://github.com/walterddr/calcite/blob/3817b0e42c07a6b185f3c1b921f648ff28e8a3b7/core/src/main/java/org/apache/calcite/rel/rules/SemiJoinRule.java#L126-L128C1

With this logic I don't think PROJECT is all that special. It is simply used to
verify that the logic above the JOIN doesn't touch the RHS, so that we can
safely convert the JOIN to a SEMI-JOIN. Was my understanding incorrect here?

If what I understood is correct, I think adding AGG is probably not that
special either: it can technically be any RelNode on top of the JOIN, as long
as there's a way to extract the bit references. I can even make this more
generic; I just can't think of a reason to do so.




[jira] [Commented] (CALCITE-5740) Support for AggToSemiJoinRule

2023-06-02 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-5740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728600#comment-17728600
 ] 

Julian Hyde commented on CALCITE-5740:
--

I don’t understand. When does it make sense to convert an Aggregate to a 
SemiJoin? Can you give a query as an example?

I wouldn't abbreviate Aggregate to Agg. It saves a few characters but is
inconsistent with our naming convention.
