[jira] [Created] (CALCITE-3764) AggregateCaseToFilterRule handles NULL values incorrectly

2020-01-31 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-3764:


 Summary: AggregateCaseToFilterRule handles NULL values incorrectly
 Key: CALCITE-3764
 URL: https://issues.apache.org/jira/browse/CALCITE-3764
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


{{AggregateCaseToFilterRule}} handles NULL values incorrectly. It converts

{code:sql}
SELECT COUNT(CASE WHEN b THEN NULL ELSE 1 END) FROM t
{code}
to
{code:sql}
SELECT COUNT(*) FILTER (WHERE b IS FALSE) FROM t
{code}
which fails to count rows where {{b}} is UNKNOWN. The rule should instead convert it to
{code:sql}
SELECT COUNT(*) FILTER (WHERE b IS NOT TRUE) FROM t
{code}
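
The difference between the two filters can be checked with a small simulation. This is only a sketch, not Calcite code: it models SQL's UNKNOWN as Python's {{None}}, and the function names and sample rows are made up for illustration.

```python
# Illustrative sketch only: SQL UNKNOWN (NULL) is modelled as None.
# The sample values of column b are hypothetical.
rows = [True, False, None, False, None]


def count_case(values):
    """COUNT(CASE WHEN b THEN NULL ELSE 1 END).

    The CASE yields NULL only when b is TRUE; for FALSE and UNKNOWN the
    ELSE branch yields 1. COUNT then counts the non-NULL results.
    """
    results = [None if b is True else 1 for b in values]
    return sum(1 for r in results if r is not None)


def count_filter_is_false(values):
    """COUNT(*) FILTER (WHERE b IS FALSE): skips UNKNOWN rows."""
    return sum(1 for b in values if b is False)


def count_filter_is_not_true(values):
    """COUNT(*) FILTER (WHERE b IS NOT TRUE): keeps FALSE and UNKNOWN rows."""
    return sum(1 for b in values if b is not True)


print(count_case(rows), count_filter_is_false(rows), count_filter_is_not_true(rows))
```

With these sample rows, only the {{IS NOT TRUE}} form agrees with the original {{CASE}} expression, because the rows where {{b}} is UNKNOWN fall into the {{ELSE}} branch.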




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (CALCITE-3763) RelBuilder.aggregate should prune unused fields from the input, if the input is a Project

2020-01-31 Thread Julian Hyde (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julian Hyde reassigned CALCITE-3763:


Assignee: Julian Hyde  (was: Jin Xing)

> RelBuilder.aggregate should prune unused fields from the input, if the input 
> is a Project
> -
>
> Key: CALCITE-3763
> URL: https://issues.apache.org/jira/browse/CALCITE-3763
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Julian Hyde
>Priority: Major
>
> {{RelBuilder.aggregate}} should prune unused fields from the input, if the 
> input is a {{Project}}.
> Pruning fields during the planning process is desirable, but we often cannot 
> do it: either we are applying a {{RelOptRule}} that has to return the same 
> fields, or we don't want to add an extra Project to do the pruning. But when 
> we are in {{RelBuilder.aggregate}} and the input is a Project, neither of 
> those limitations applies. We already have a Project and are just making it 
> narrower, and we know what fields the {{Aggregate}} will produce.
> For example,
> {code:sql}
> SELECT deptno, SUM(sal) FILTER (WHERE b)
> FROM (
>   SELECT deptno, empno + 10, sal, job = 'CLERK' AS b
>   FROM emp)
> GROUP BY deptno
> {code}
> becomes
> {code:sql}
> SELECT deptno, SUM(sal) FILTER (WHERE b)
> FROM (
>   SELECT deptno, sal, job = 'CLERK' AS b
>   FROM emp)
> GROUP BY deptno
> {code}
> If there are no fields used, remove the {{Project}}. (A {{RelNode}} with no 
> fields is not allowed.)
> {code:sql}
> SELECT COUNT(*) AS C
> FROM (
>  SELECT deptno, empno + 10, sal, job = 'CLERK' AS b
>  FROM emp)
> {code}
> becomes
> {code:sql}
> SELECT COUNT(*) AS c
> FROM emp
> {code}
> Add an option {{RelBuilder.Config.pruneInputOfAggregate}}, default true, so 
> that people can disable this rewrite if it causes problems.





[jira] [Assigned] (CALCITE-3763) RelBuilder.aggregate should prune unused fields from the input, if the input is a Project

2020-01-31 Thread Jin Xing (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jin Xing reassigned CALCITE-3763:
-

Assignee: Jin Xing

> RelBuilder.aggregate should prune unused fields from the input, if the input 
> is a Project
> -
>
> Key: CALCITE-3763
> URL: https://issues.apache.org/jira/browse/CALCITE-3763
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Jin Xing
>Priority: Major
>
> {{RelBuilder.aggregate}} should prune unused fields from the input, if the 
> input is a {{Project}}.
> Pruning fields during the planning process is desirable, but we often cannot 
> do it: either we are applying a {{RelOptRule}} that has to return the same 
> fields, or we don't want to add an extra Project to do the pruning. But when 
> we are in {{RelBuilder.aggregate}} and the input is a Project, neither of 
> those limitations applies. We already have a Project and are just making it 
> narrower, and we know what fields the {{Aggregate}} will produce.
> For example,
> {code:sql}
> SELECT deptno, SUM(sal) FILTER (WHERE b)
> FROM (
>   SELECT deptno, empno + 10, sal, job = 'CLERK' AS b
>   FROM emp)
> GROUP BY deptno
> {code}
> becomes
> {code:sql}
> SELECT deptno, SUM(sal) FILTER (WHERE b)
> FROM (
>   SELECT deptno, sal, job = 'CLERK' AS b
>   FROM emp)
> GROUP BY deptno
> {code}
> If there are no fields used, remove the {{Project}}. (A {{RelNode}} with no 
> fields is not allowed.)
> {code:sql}
> SELECT COUNT(*) AS C
> FROM (
>  SELECT deptno, empno + 10, sal, job = 'CLERK' AS b
>  FROM emp)
> {code}
> becomes
> {code:sql}
> SELECT COUNT(*) AS c
> FROM emp
> {code}
> Add an option {{RelBuilder.Config.pruneInputOfAggregate}}, default true, so 
> that people can disable this rewrite if it causes problems.





[jira] [Created] (CALCITE-3763) RelBuilder.aggregate should prune unused fields from the input, if the input is a Project

2020-01-31 Thread Julian Hyde (Jira)
Julian Hyde created CALCITE-3763:


 Summary: RelBuilder.aggregate should prune unused fields from the 
input, if the input is a Project
 Key: CALCITE-3763
 URL: https://issues.apache.org/jira/browse/CALCITE-3763
 Project: Calcite
  Issue Type: Bug
Reporter: Julian Hyde


{{RelBuilder.aggregate}} should prune unused fields from the input, if the 
input is a {{Project}}.

Pruning fields during the planning process is desirable, but we often cannot do 
it: either we are applying a {{RelOptRule}} that has to return the same fields, 
or we don't want to add an extra Project to do the pruning. But when we are in 
{{RelBuilder.aggregate}} and the input is a Project, neither of those 
limitations applies. We already have a Project and are just making it narrower, 
and we know what fields the {{Aggregate}} will produce.

For example,
{code:sql}
SELECT deptno, SUM(sal) FILTER (WHERE b)
FROM (
  SELECT deptno, empno + 10, sal, job = 'CLERK' AS b
  FROM emp)
GROUP BY deptno
{code}
becomes
{code:sql}
SELECT deptno, SUM(sal) FILTER (WHERE b)
FROM (
  SELECT deptno, sal, job = 'CLERK' AS b
  FROM emp)
GROUP BY deptno
{code}

If there are no fields used, remove the {{Project}}. (A {{RelNode}} with no 
fields is not allowed.)
{code:sql}
SELECT COUNT(*) AS C
FROM (
 SELECT deptno, empno + 10, sal, job = 'CLERK' AS b
 FROM emp)
{code}
becomes
{code:sql}
SELECT COUNT(*) AS c
FROM emp
{code}

Add an option {{RelBuilder.Config.pruneInputOfAggregate}}, default true, so 
that people can disable this rewrite if it causes problems.
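
The pruning decision described above can be sketched as follows. This is not the Calcite implementation: the representation of a Project as (name, expression) pairs and the helper {{prune_project}} are assumptions made for illustration.

```python
# Illustrative sketch only; prune_project and the (name, expression) pairs
# are hypothetical, not part of Calcite's API.
def prune_project(project_fields, used_names):
    """Narrow a Project to the fields the Aggregate actually uses.

    project_fields: list of (name, expression) pairs, in output order.
    used_names: names referenced by group keys, aggregate arguments and filters.
    Returns the narrowed list, or None when no field is used, meaning the
    Project should be removed (a node with no fields is not allowed).
    """
    kept = [(name, expr) for (name, expr) in project_fields if name in used_names]
    if not kept:
        return None
    return kept


project = [
    ("deptno", "deptno"),
    ("expr1", "empno + 10"),
    ("sal", "sal"),
    ("b", "job = 'CLERK'"),
]

# GROUP BY deptno, SUM(sal) FILTER (WHERE b): the unused expression is dropped.
pruned = prune_project(project, {"deptno", "sal", "b"})

# COUNT(*) uses no input fields, so the Project is removed entirely.
removed = prune_project(project, set())
```

The kept fields stay in their original relative order, so the Aggregate only needs its field references renumbered.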





[jira] [Updated] (CALCITE-3762) AggregateProjectPullUpConstantsRule causes fields to be out of order

2020-01-31 Thread hezhang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated CALCITE-3762:
-
Attachment: calcite-3762.patch

> AggregateProjectPullUpConstantsRule causes fields to be out of order
> 
>
> Key: CALCITE-3762
> URL: https://issues.apache.org/jira/browse/CALCITE-3762
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.0, 1.16.0
>Reporter: hezhang
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: calcite-3762.patch, image-2020-02-01-01-29-49-479.png, 
> image-2020-02-01-01-33-54-111.png
>
>
> the sql:
> {code:java}
> select * FROM( SELECT plat, category, rid, populary_num FROM 
> panda_com.crawler_anchor WHERE
>  par_date = '20180819'
>  AND plat = 'huya'
>  AND rid = 'meijiao'
> ) a
>  JOIN
>  (
>  SELECT DISTINCT
>  'huya' plat ,
>  edwin.privatehost ,
>  edwin.profileroom
>  FROM
>  panda_com.ol_huya_isOnline edwin
>  WHERE
>  par_date = '20180819' ) m9
>  ON
>  a.rid= m9.privatehost
>  AND a.plat = m9.plat{code}
> the result:
>  
> {code:java}
> huya yule meijiao 30 huya 10001242 meijiao
> {code}
>  
> but the desired result is:
>  
> {code:java}
> huya yule meijiao 30 huya meijiao 10001242  
> {code}
>  
> *cause:*
> hepPlanner uses AggregateProjectPullUpConstantsRule:
> !image-2020-02-01-01-29-49-479.png!
> after applying the fix patch:
> !image-2020-02-01-01-33-54-111.png!





[jira] [Updated] (CALCITE-3762) AggregateProjectPullUpConstantsRule causes fields to be out of order

2020-01-31 Thread hezhang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated CALCITE-3762:
-
Description: 
the sql:
{code:java}
select * FROM( SELECT plat, category, rid, populary_num FROM 
panda_com.crawler_anchor WHERE
 par_date = '20180819'
 AND plat = 'huya'
 AND rid = 'meijiao'
) a
 JOIN
 (
 SELECT DISTINCT
 'huya' plat ,
 edwin.privatehost ,
 edwin.profileroom
 FROM
 panda_com.ol_huya_isOnline edwin
 WHERE
 par_date = '20180819' ) m9
 ON
 a.rid= m9.privatehost
 AND a.plat = m9.plat{code}
the result:

 
{code:java}
huya yule meijiao 30 huya 10001242 meijiao

{code}
 

but the desired result is:

 
{code:java}
huya yule meijiao 30 huya meijiao 10001242  

{code}
 

*cause:*

hepPlanner uses AggregateProjectPullUpConstantsRule:

!image-2020-02-01-01-29-49-479.png!
after applying the fix patch:

!image-2020-02-01-01-33-54-111.png!

  was:
the sql:
{code:java}
select * FROM( SELECT plat, category, rid, populary_num FROM 
panda_com.crawler_anchor WHERE
 par_date = '20180819'
 AND plat = 'huya'
 AND rid = 'meijiao'
) a
 JOIN
 (
 SELECT DISTINCT
 'huya' plat ,
 edwin.privatehost ,
 edwin.profileroom
 FROM
 panda_com.ol_huya_isOnline edwin
 WHERE
 par_date = '20180819' ) m9
 ON
 a.rid= m9.privatehost
 AND a.plat = m9.plat{code}
the result:

 
{code:java}
huya yule meijiao 30 huya 10001242 meijiao

{code}
 

but the desired result is:

 
{code:java}
huya yule meijiao 30 huya meijiao 10001242  

{code}
 

*cause:*

hepPlanner uses AggregateProjectPullUpConstantsRule:

!image-2020-02-01-01-29-49-479.png!
after applying the fix patch:

!image-2020-02-01-01-32-32-485.png!


> AggregateProjectPullUpConstantsRule causes fields to be out of order
> 
>
> Key: CALCITE-3762
> URL: https://issues.apache.org/jira/browse/CALCITE-3762
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.0, 1.16.0
>Reporter: hezhang
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: image-2020-02-01-01-29-49-479.png, 
> image-2020-02-01-01-33-54-111.png
>
>
> the sql:
> {code:java}
> select * FROM( SELECT plat, category, rid, populary_num FROM 
> panda_com.crawler_anchor WHERE
>  par_date = '20180819'
>  AND plat = 'huya'
>  AND rid = 'meijiao'
> ) a
>  JOIN
>  (
>  SELECT DISTINCT
>  'huya' plat ,
>  edwin.privatehost ,
>  edwin.profileroom
>  FROM
>  panda_com.ol_huya_isOnline edwin
>  WHERE
>  par_date = '20180819' ) m9
>  ON
>  a.rid= m9.privatehost
>  AND a.plat = m9.plat{code}
> the result:
>  
> {code:java}
> huya yule meijiao 30 huya 10001242 meijiao
> {code}
>  
> but the desired result is:
>  
> {code:java}
> huya yule meijiao 30 huya meijiao 10001242  
> {code}
>  
> *cause:*
> hepPlanner uses AggregateProjectPullUpConstantsRule:
> !image-2020-02-01-01-29-49-479.png!
> after applying the fix patch:
> !image-2020-02-01-01-33-54-111.png!





[jira] [Updated] (CALCITE-3762) AggregateProjectPullUpConstantsRule causes fields to be out of order

2020-01-31 Thread hezhang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated CALCITE-3762:
-
Attachment: (was: image-2020-02-01-01-32-32-485.png)

> AggregateProjectPullUpConstantsRule causes fields to be out of order
> 
>
> Key: CALCITE-3762
> URL: https://issues.apache.org/jira/browse/CALCITE-3762
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.0, 1.16.0
>Reporter: hezhang
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: image-2020-02-01-01-29-49-479.png, 
> image-2020-02-01-01-33-54-111.png
>
>
> the sql:
> {code:java}
> select * FROM( SELECT plat, category, rid, populary_num FROM 
> panda_com.crawler_anchor WHERE
>  par_date = '20180819'
>  AND plat = 'huya'
>  AND rid = 'meijiao'
> ) a
>  JOIN
>  (
>  SELECT DISTINCT
>  'huya' plat ,
>  edwin.privatehost ,
>  edwin.profileroom
>  FROM
>  panda_com.ol_huya_isOnline edwin
>  WHERE
>  par_date = '20180819' ) m9
>  ON
>  a.rid= m9.privatehost
>  AND a.plat = m9.plat{code}
> the result:
>  
> {code:java}
> huya yule meijiao 30 huya 10001242 meijiao
> {code}
>  
> but the desired result is:
>  
> {code:java}
> huya yule meijiao 30 huya meijiao 10001242  
> {code}
>  
> *cause:*
> hepPlanner uses AggregateProjectPullUpConstantsRule:
> !image-2020-02-01-01-29-49-479.png!
> after applying the fix patch:
> !image-2020-02-01-01-33-54-111.png!





[jira] [Updated] (CALCITE-3762) AggregateProjectPullUpConstantsRule causes fields to be out of order

2020-01-31 Thread hezhang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated CALCITE-3762:
-
Attachment: image-2020-02-01-01-33-54-111.png

> AggregateProjectPullUpConstantsRule causes fields to be out of order
> 
>
> Key: CALCITE-3762
> URL: https://issues.apache.org/jira/browse/CALCITE-3762
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.0, 1.16.0
>Reporter: hezhang
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: image-2020-02-01-01-29-49-479.png, 
> image-2020-02-01-01-33-54-111.png
>
>
> the sql:
> {code:java}
> select * FROM( SELECT plat, category, rid, populary_num FROM 
> panda_com.crawler_anchor WHERE
>  par_date = '20180819'
>  AND plat = 'huya'
>  AND rid = 'meijiao'
> ) a
>  JOIN
>  (
>  SELECT DISTINCT
>  'huya' plat ,
>  edwin.privatehost ,
>  edwin.profileroom
>  FROM
>  panda_com.ol_huya_isOnline edwin
>  WHERE
>  par_date = '20180819' ) m9
>  ON
>  a.rid= m9.privatehost
>  AND a.plat = m9.plat{code}
> the result:
>  
> {code:java}
> huya yule meijiao 30 huya 10001242 meijiao
> {code}
>  
> but the desired result is:
>  
> {code:java}
> huya yule meijiao 30 huya meijiao 10001242  
> {code}
>  
> *cause:*
> hepPlanner uses AggregateProjectPullUpConstantsRule:
> !image-2020-02-01-01-29-49-479.png!
> after applying the fix patch:
> !image-2020-02-01-01-32-32-485.png!





[jira] [Updated] (CALCITE-3762) AggregateProjectPullUpConstantsRule causes fields to be out of order

2020-01-31 Thread hezhang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated CALCITE-3762:
-
Attachment: image-2020-02-01-01-32-32-485.png

> AggregateProjectPullUpConstantsRule causes fields to be out of order
> 
>
> Key: CALCITE-3762
> URL: https://issues.apache.org/jira/browse/CALCITE-3762
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.0, 1.16.0
>Reporter: hezhang
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: image-2020-02-01-01-29-49-479.png, 
> image-2020-02-01-01-32-32-485.png
>
>
> the sql:
> {code:java}
> select * FROM( SELECT plat, category, rid, populary_num FROM 
> panda_com.crawler_anchor WHERE
>  par_date = '20180819'
>  AND plat = 'huya'
>  AND rid = 'meijiao'
> ) a
>  JOIN
>  (
>  SELECT DISTINCT
>  'huya' plat ,
>  edwin.privatehost ,
>  edwin.profileroom
>  FROM
>  panda_com.ol_huya_isOnline edwin
>  WHERE
>  par_date = '20180819' ) m9
>  ON
>  a.rid= m9.privatehost
>  AND a.plat = m9.plat{code}
> the result:
>  
> {code:java}
> huya yule meijiao 30 huya 10001242 meijiao
> {code}
>  
> but the desired result is:
>  
> {code:java}
> huya yule meijiao 30 huya meijiao 10001242  
> {code}
>  
> *cause:*
> hepPlanner uses AggregateProjectPullUpConstantsRule:
> !image-2020-02-01-01-29-49-479.png!
> after applying the fix patch:
> !image-2020-02-01-01-32-32-485.png!





[jira] [Updated] (CALCITE-3762) AggregateProjectPullUpConstantsRule causes fields to be out of order

2020-01-31 Thread hezhang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated CALCITE-3762:
-
Description: 
the sql:
{code:java}
select * FROM( SELECT plat, category, rid, populary_num FROM 
panda_com.crawler_anchor WHERE
 par_date = '20180819'
 AND plat = 'huya'
 AND rid = 'meijiao'
) a
 JOIN
 (
 SELECT DISTINCT
 'huya' plat ,
 edwin.privatehost ,
 edwin.profileroom
 FROM
 panda_com.ol_huya_isOnline edwin
 WHERE
 par_date = '20180819' ) m9
 ON
 a.rid= m9.privatehost
 AND a.plat = m9.plat{code}
the result:

 
{code:java}
huya yule meijiao 30 huya 10001242 meijiao

{code}
 

but the desired result is:

 
{code:java}
huya yule meijiao 30 huya meijiao 10001242  

{code}
 

*cause:*

hepPlanner uses AggregateProjectPullUpConstantsRule:

!image-2020-02-01-01-29-49-479.png!
after applying the fix patch:

!image-2020-02-01-01-32-32-485.png!

  was:
the sql:
{code:java}
select * FROM( SELECT plat, category, rid, populary_num FROM 
panda_com.crawler_anchor WHERE
 par_date = '20180819'
 AND plat = 'huya'
 AND rid = 'meijiao'
) a
 JOIN
 (
 SELECT DISTINCT
 'huya' plat ,
 edwin.privatehost ,
 edwin.profileroom
 FROM
 panda_com.ol_huya_isOnline edwin
 WHERE
 par_date = '20180819' ) m9
 ON
 a.rid= m9.privatehost
 AND a.plat = m9.plat{code}
the result:

 
{code:java}
huya yule meijiao 30 huya 10001242 meijiao

{code}
 

but the desired result is:

 
{code:java}
huya yule meijiao 30 huya meijiao 10001242  

{code}
 

*cause:*

hepPlanner uses AggregateProjectPullUpConstantsRule:

!image-2020-02-01-01-29-49-479.png!


> AggregateProjectPullUpConstantsRule causes fields to be out of order
> 
>
> Key: CALCITE-3762
> URL: https://issues.apache.org/jira/browse/CALCITE-3762
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.0, 1.16.0
>Reporter: hezhang
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: image-2020-02-01-01-29-49-479.png, 
> image-2020-02-01-01-32-32-485.png
>
>
> the sql:
> {code:java}
> select * FROM( SELECT plat, category, rid, populary_num FROM 
> panda_com.crawler_anchor WHERE
>  par_date = '20180819'
>  AND plat = 'huya'
>  AND rid = 'meijiao'
> ) a
>  JOIN
>  (
>  SELECT DISTINCT
>  'huya' plat ,
>  edwin.privatehost ,
>  edwin.profileroom
>  FROM
>  panda_com.ol_huya_isOnline edwin
>  WHERE
>  par_date = '20180819' ) m9
>  ON
>  a.rid= m9.privatehost
>  AND a.plat = m9.plat{code}
> the result:
>  
> {code:java}
> huya yule meijiao 30 huya 10001242 meijiao
> {code}
>  
> but the desired result is:
>  
> {code:java}
> huya yule meijiao 30 huya meijiao 10001242  
> {code}
>  
> *cause:*
> hepPlanner uses AggregateProjectPullUpConstantsRule:
> !image-2020-02-01-01-29-49-479.png!
> after applying the fix patch:
> !image-2020-02-01-01-32-32-485.png!





[jira] [Updated] (CALCITE-3762) AggregateProjectPullUpConstantsRule causes fields to be out of order

2020-01-31 Thread hezhang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated CALCITE-3762:
-
Description: 
the sql:
{code:java}
select * FROM( SELECT plat, category, rid, populary_num FROM 
panda_com.crawler_anchor WHERE
 par_date = '20180819'
 AND plat = 'huya'
 AND rid = 'meijiao'
) a
 JOIN
 (
 SELECT DISTINCT
 'huya' plat ,
 edwin.privatehost ,
 edwin.profileroom
 FROM
 panda_com.ol_huya_isOnline edwin
 WHERE
 par_date = '20180819' ) m9
 ON
 a.rid= m9.privatehost
 AND a.plat = m9.plat{code}
the result:

 
{code:java}
huya yule meijiao 30 huya 10001242 meijiao

{code}
 

but the desired result is:

 
{code:java}
huya yule meijiao 30 huya meijiao 10001242  

{code}
 

*cause:*

hepPlanner uses AggregateProjectPullUpConstantsRule:

!image-2020-02-01-01-29-49-479.png!

  was:
the sql:
{code:java}
select * FROM( SELECT plat, category, rid, populary_num FROM 
panda_com.crawler_anchor WHERE
 par_date = '20180819'
 AND plat = 'huya'
 AND rid = 'meijiao'
) a
 JOIN
 (
 SELECT DISTINCT
 'huya' plat ,
 edwin.privatehost ,
 edwin.profileroom
 FROM
 panda_com.ol_huya_isOnline edwin
 WHERE
 par_date = '20180819' ) m9
 ON
 a.rid= m9.privatehost
 AND a.plat = m9.plat{code}
the result:

 
{code:java}
huya yule meijiao 30 huya 10001242 meijiao
{code}
 

but the desired result is:

 
{code:java}
huya yule meijiao 30 huya meijiao 10001242  
{code}
 

*cause:*

hepPlanner uses AggregateProjectPullUpConstantsRule:
{code:java}
HiveProject(plat=[$0], category=[$1], rid=[$2], populary_num=[$3], plat1=[$4], 
privatehost=[$5], profileroom=[$6]) HiveJoin(condition=[true], 
joinType=[inner], algorithm=[none], cost=[not available]) 
HiveProject(plat=[CAST(_UTF-16LE'huya'):VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE" COLLATE "ISO-8859-1$en_US$primary"], category=[$3], 
rid=[CAST(_UTF-16LE'meijiao'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
COLLATE "ISO-8859-1$en_US$primary"], populary_num=[$5]) 
HiveFilter(condition=[AND(=($12, _UTF-16LE'20180819'), =($7, _UTF-16LE'huya'), 
=($0, _UTF-16LE'meijiao'))]) HiveTableScan(table=[[panda_com.crawler_anchor]], 
table:alias=[crawler_anchor]) HiveProject(plat=[_UTF-16LE'huya'], 
privatehost=[$1], profileroom=[$2]) HiveProject($f0=[_UTF-16LE'huya'], 
$f1=[$0], $f2=[$1]) HiveProject($f1=[$0], 
$f2=[CAST(_UTF-16LE'meijiao'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
COLLATE "ISO-8859-1$en_US$primary"]) HiveAggregate(group=[{2}]) 
HiveProject($f0=[_UTF-16LE'huya'], 
$f1=[CAST(_UTF-16LE'meijiao'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
COLLATE "ISO-8859-1$en_US$primary"], $f2=[$4]) HiveFilter(condition=[AND(=($19, 
_UTF-16LE'20180819'), =(CAST(_UTF-16LE'meijiao'):VARCHAR(2147483647) CHARACTER 
SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", $8))]) 
HiveTableScan(table=[[panda_com.ol_huya_isonline]], table:alias=[edwin])
{code}


> AggregateProjectPullUpConstantsRule causes fields to be out of order
> 
>
> Key: CALCITE-3762
> URL: https://issues.apache.org/jira/browse/CALCITE-3762
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.0, 1.16.0
>Reporter: hezhang
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: image-2020-02-01-01-29-49-479.png
>
>
> the sql:
> {code:java}
> select * FROM( SELECT plat, category, rid, populary_num FROM 
> panda_com.crawler_anchor WHERE
>  par_date = '20180819'
> 

[jira] [Updated] (CALCITE-3762) AggregateProjectPullUpConstantsRule causes fields to be out of order

2020-01-31 Thread hezhang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated CALCITE-3762:
-
Attachment: image-2020-02-01-01-29-49-479.png

> AggregateProjectPullUpConstantsRule causes fields to be out of order
> 
>
> Key: CALCITE-3762
> URL: https://issues.apache.org/jira/browse/CALCITE-3762
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.0, 1.16.0
>Reporter: hezhang
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: image-2020-02-01-01-29-49-479.png
>
>
> the sql:
> {code:java}
> select * FROM( SELECT plat, category, rid, populary_num FROM 
> panda_com.crawler_anchor WHERE
>  par_date = '20180819'
>  AND plat = 'huya'
>  AND rid = 'meijiao'
> ) a
>  JOIN
>  (
>  SELECT DISTINCT
>  'huya' plat ,
>  edwin.privatehost ,
>  edwin.profileroom
>  FROM
>  panda_com.ol_huya_isOnline edwin
>  WHERE
>  par_date = '20180819' ) m9
>  ON
>  a.rid= m9.privatehost
>  AND a.plat = m9.plat{code}
> the result:
>  
> {code:java}
> huya yule meijiao 30 huya 10001242 meijiao
> {code}
>  
> but the desired result is:
>  
> {code:java}
> huya yule meijiao 30 huya meijiao 10001242  
> {code}
>  
> *cause:*
> hepPlanner uses AggregateProjectPullUpConstantsRule:
> {code:java}
> HiveProject(plat=[$0], category=[$1], rid=[$2], populary_num=[$3], 
> plat1=[$4], privatehost=[$5], profileroom=[$6]) HiveJoin(condition=[true], 
> joinType=[inner], algorithm=[none], cost=[not available]) 
> HiveProject(plat=[CAST(_UTF-16LE'huya'):VARCHAR(2147483647) CHARACTER SET 
> "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary"], category=[$3], 
> rid=[CAST(_UTF-16LE'meijiao'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary"], populary_num=[$5]) 
> HiveFilter(condition=[AND(=($12, _UTF-16LE'20180819'), =($7, 
> _UTF-16LE'huya'), =($0, _UTF-16LE'meijiao'))]) 
> HiveTableScan(table=[[panda_com.crawler_anchor]], 
> table:alias=[crawler_anchor]) HiveProject(plat=[_UTF-16LE'huya'], 
> privatehost=[$1], profileroom=[$2]) HiveProject($f0=[_UTF-16LE'huya'], 
> $f1=[$0], $f2=[$1]) HiveProject($f1=[$0], 
> $f2=[CAST(_UTF-16LE'meijiao'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary"]) HiveAggregate(group=[{2}]) 
> HiveProject($f0=[_UTF-16LE'huya'], 
> $f1=[CAST(_UTF-16LE'meijiao'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary"], $f2=[$4]) 
> HiveFilter(condition=[AND(=($19, _UTF-16LE'20180819'), 
> =(CAST(_UTF-16LE'meijiao'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
> COLLATE "ISO-8859-1$en_US$primary", $8))]) 
> HiveTableScan(table=[[panda_com.ol_huya_isonline]], table:alias=[edwin])
> {code}





[jira] [Updated] (CALCITE-3762) AggregateProjectPullUpConstantsRule causes fields to be out of order

2020-01-31 Thread hezhang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated CALCITE-3762:
-
Description: 
the sql:
{code:java}
select * FROM( SELECT plat, category, rid, populary_num FROM 
panda_com.crawler_anchor WHERE
 par_date = '20180819'
 AND plat = 'huya'
 AND rid = 'meijiao'
) a
 JOIN
 (
 SELECT DISTINCT
 'huya' plat ,
 edwin.privatehost ,
 edwin.profileroom
 FROM
 panda_com.ol_huya_isOnline edwin
 WHERE
 par_date = '20180819' ) m9
 ON
 a.rid= m9.privatehost
 AND a.plat = m9.plat{code}
the result:

 
{code:java}
huya yule meijiao 30 huya 10001242 meijiao
{code}
 

but the desired result is:

 
{code:java}
huya yule meijiao 30 huya meijiao 10001242  
{code}
 

*cause:*

hepPlanner uses AggregateProjectPullUpConstantsRule:
{code:java}
HiveProject(plat=[$0], category=[$1], rid=[$2], populary_num=[$3], plat1=[$4], 
privatehost=[$5], profileroom=[$6]) HiveJoin(condition=[true], 
joinType=[inner], algorithm=[none], cost=[not available]) 
HiveProject(plat=[CAST(_UTF-16LE'huya'):VARCHAR(2147483647) CHARACTER SET 
"UTF-16LE" COLLATE "ISO-8859-1$en_US$primary"], category=[$3], 
rid=[CAST(_UTF-16LE'meijiao'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
COLLATE "ISO-8859-1$en_US$primary"], populary_num=[$5]) 
HiveFilter(condition=[AND(=($12, _UTF-16LE'20180819'), =($7, _UTF-16LE'huya'), 
=($0, _UTF-16LE'meijiao'))]) HiveTableScan(table=[[panda_com.crawler_anchor]], 
table:alias=[crawler_anchor]) HiveProject(plat=[_UTF-16LE'huya'], 
privatehost=[$1], profileroom=[$2]) HiveProject($f0=[_UTF-16LE'huya'], 
$f1=[$0], $f2=[$1]) HiveProject($f1=[$0], 
$f2=[CAST(_UTF-16LE'meijiao'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
COLLATE "ISO-8859-1$en_US$primary"]) HiveAggregate(group=[{2}]) 
HiveProject($f0=[_UTF-16LE'huya'], 
$f1=[CAST(_UTF-16LE'meijiao'):VARCHAR(2147483647) CHARACTER SET "UTF-16LE" 
COLLATE "ISO-8859-1$en_US$primary"], $f2=[$4]) HiveFilter(condition=[AND(=($19, 
_UTF-16LE'20180819'), =(CAST(_UTF-16LE'meijiao'):VARCHAR(2147483647) CHARACTER 
SET "UTF-16LE" COLLATE "ISO-8859-1$en_US$primary", $8))]) 
HiveTableScan(table=[[panda_com.ol_huya_isonline]], table:alias=[edwin])
{code}

  was:
the sql:
{code:java}
select * FROM( SELECT plat, category, rid, populary_num FROM 
panda_com.crawler_anchor WHERE
 par_date = '20180819'
 AND plat = 'huya'
 AND rid = 'meijiao'
) a
 JOIN
 (
 SELECT DISTINCT
 'huya' plat ,
 edwin.privatehost ,
 edwin.profileroom
 FROM
 panda_com.ol_huya_isOnline edwin
 WHERE
 par_date = '20180819' ) m9
 ON
 a.rid= m9.privatehost
 AND a.plat = m9.plat{code}
then hepPlanner use 


> AggregateProjectPullUpConstantsRule causes fields to be out of order
> 
>
> Key: CALCITE-3762
> URL: https://issues.apache.org/jira/browse/CALCITE-3762
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.0, 1.16.0
>Reporter: hezhang
>Priority: Major
> Fix For: 1.16.0
>
>
> the sql:
> {code:java}
> select * FROM( SELECT plat, category, rid, populary_num FROM 
> panda_com.crawler_anchor WHERE
>  par_date = '20180819'
>  AND plat = 'huya'
>  AND rid = 'meijiao'
> ) a
>  JOIN
>  (
>  SELECT DISTINCT
>  'huya' plat ,
>  edwin.privatehost ,
>  edwin.profileroom
>  FROM
>  panda_com.ol_huya_isOnline edwin
>  WHERE
>  par_date = '20180819' ) m9
>  ON
>  a.rid= m9.privatehost
>  AND a.plat = m9.plat{code}
> the result:
>  
> {code:java}
> huya yule meijiao 30 huya 10001242 meijiao
> {code}
>  
> but the desired result is:
>  
> {code:java}
> huya yule meijiao 30 huya meijiao 10001242  
> {code}
>  
> *cause:*
> hepPlanner uses AggregateProjectPullUpConstantsRule:
> {code:

[jira] [Updated] (CALCITE-3762) AggregateProjectPullUpConstantsRule causes fields to be out of order

2020-01-31 Thread hezhang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated CALCITE-3762:
-
Description: 
the sql:
{code:java}
select * FROM( SELECT plat, category, rid, populary_num FROM 
panda_com.crawler_anchor WHERE
 par_date = '20180819'
 AND plat = 'huya'
 AND rid = 'meijiao'
) a
 JOIN
 (
 SELECT DISTINCT
 'huya' plat ,
 edwin.privatehost ,
 edwin.profileroom
 FROM
 panda_com.ol_huya_isOnline edwin
 WHERE
 par_date = '20180819' ) m9
 ON
 a.rid= m9.privatehost
 AND a.plat = m9.plat{code}
then hepPlanner use 

  was:
the sql:
{code:java}
select * FROM( SELECT plat, category, rid, populary_num FROM 
panda_com.crawler_anchor WHERE
 par_date = '20180819'
 AND plat = 'huya'
 AND rid = 'meijiao'
) a
 JOIN
 (
 SELECT DISTINCT
 'huya' plat ,
 edwin.privatehost ,
 edwin.profileroom
 FROM
 panda_com.ol_huya_isOnline edwin
 WHERE
 par_date = '20180819' ) m9
 ON
 a.rid= m9.privatehost
 AND a.plat = m9.plat{code}


> AggregateProjectPullUpConstantsRule causes fields to be out of order
> 
>
> Key: CALCITE-3762
> URL: https://issues.apache.org/jira/browse/CALCITE-3762
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.0, 1.16.0
>Reporter: hezhang
>Priority: Major
> Fix For: 1.16.0
>
>
> The SQL:
> {code:sql}
> SELECT *
> FROM (
>   SELECT plat, category, rid, populary_num
>   FROM panda_com.crawler_anchor
>   WHERE par_date = '20180819'
>     AND plat = 'huya'
>     AND rid = 'meijiao'
> ) a
> JOIN (
>   SELECT DISTINCT
>     'huya' plat,
>     edwin.privatehost,
>     edwin.profileroom
>   FROM panda_com.ol_huya_isOnline edwin
>   WHERE par_date = '20180819'
> ) m9
> ON a.rid = m9.privatehost
>   AND a.plat = m9.plat
> {code}
> then hepPlanner use 





[jira] [Updated] (CALCITE-3762) AggregateProjectPullUpConstantsRule causes fields to be out of order

2020-01-31 Thread hezhang (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hezhang updated CALCITE-3762:
-
Description: 
The SQL:
{code:sql}
SELECT *
FROM (
  SELECT plat, category, rid, populary_num
  FROM panda_com.crawler_anchor
  WHERE par_date = '20180819'
    AND plat = 'huya'
    AND rid = 'meijiao'
) a
JOIN (
  SELECT DISTINCT
    'huya' plat,
    edwin.privatehost,
    edwin.profileroom
  FROM panda_com.ol_huya_isOnline edwin
  WHERE par_date = '20180819'
) m9
ON a.rid = m9.privatehost
  AND a.plat = m9.plat
{code}

> AggregateProjectPullUpConstantsRule causes fields to be out of order
> 
>
> Key: CALCITE-3762
> URL: https://issues.apache.org/jira/browse/CALCITE-3762
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.10.0, 1.16.0
>Reporter: hezhang
>Priority: Major
> Fix For: 1.16.0
>
>
> The SQL:
> {code:sql}
> SELECT *
> FROM (
>   SELECT plat, category, rid, populary_num
>   FROM panda_com.crawler_anchor
>   WHERE par_date = '20180819'
>     AND plat = 'huya'
>     AND rid = 'meijiao'
> ) a
> JOIN (
>   SELECT DISTINCT
>     'huya' plat,
>     edwin.privatehost,
>     edwin.profileroom
>   FROM panda_com.ol_huya_isOnline edwin
>   WHERE par_date = '20180819'
> ) m9
> ON a.rid = m9.privatehost
>   AND a.plat = m9.plat
> {code}





[jira] [Created] (CALCITE-3762) AggregateProjectPullUpConstantsRule causes fields to be out of order

2020-01-31 Thread hezhang (Jira)
hezhang created CALCITE-3762:


 Summary: AggregateProjectPullUpConstantsRule causes fields to be 
out of order
 Key: CALCITE-3762
 URL: https://issues.apache.org/jira/browse/CALCITE-3762
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.16.0, 1.10.0
Reporter: hezhang
 Fix For: 1.16.0








[jira] [Commented] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

2020-01-31 Thread Jin Xing (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027518#comment-17027518
 ] 

Jin Xing commented on CALCITE-3760:
---

{quote}As we discussed recently, it would be illegal to merge those Projects 
because of the UDF.
{quote}
Yes. In addition to the rewriting of SqlNode, the determinism of an operator 
also affects plan optimization. I think that is what CALCITE-2348 tries to 
fix. Firing certain rules on non-deterministic operators as usual might fail 
to guarantee plan equivalence.

> Rewriting non-deterministic function can break query semantics
> --
>
> Key: CALCITE-3760
> URL: https://issues.apache.org/jira/browse/CALCITE-3760
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Jin Xing
>Assignee: Jin Xing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Calcite rewrites some *SqlFunctions* during validation, but does not 
> consider whether the function is deterministic. For a non-deterministic 
> operator, the rewriting can break semantics. Additionally, there is no 
> interface for users to specify the determinism of a UDF/UDAF. 
> Say I have a non-deterministic UDF and UDAF and run SQL like the following:
> {code:sql}
> select coalesce(udf(col0), 100) from foo;
> select nullif(udaf(col0), 1024) from foo;
> {code}
> They will be rewritten as
> {code:sql}
> select case when udf(col0) is not null then udf(col0) else 100 end
> from foo;
> select case when udaf(col0) = 1024 then null else udaf(col0) end
> from foo
> {code}
> As we can see, the non-deterministic UDF and UDAF are called multiple times 
> after rewriting, so the condition in the WHEN clause might no longer hold 
> when the result expression is evaluated.
> We need to provide an interface for users to specify the determinism of a 
> UDF/UDAF, and consider whether a SqlNode is deterministic when rewriting.
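
The double evaluation described in the issue can be reproduced outside Calcite. The following is a minimal Java sketch, not Calcite code: the class NonDeterministicRewriteSketch, its methods, and the scripted supplier that stands in for a non-deterministic UDF are all hypothetical names introduced for illustration. It compares COALESCE evaluated as written (one call) with the CASE form produced by the rewrite (two calls).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names exist in Calcite.
final class NonDeterministicRewriteSketch {

  // Stand-in for a non-deterministic UDF: each call returns the next scripted value.
  static Supplier<Integer> scripted(Integer... values) {
    Iterator<Integer> it = Arrays.asList(values).iterator();
    return it::next;
  }

  // COALESCE(udf(), fallback) as the user wrote it: udf is evaluated once.
  static Integer coalesceAsWritten(Supplier<Integer> udf, Integer fallback) {
    Integer v = udf.get();
    return v != null ? v : fallback;
  }

  // CASE WHEN udf() IS NOT NULL THEN udf() ELSE fallback END:
  // udf appears twice in the expression, so it may be evaluated twice.
  static Integer coalesceAsRewritten(Supplier<Integer> udf, Integer fallback) {
    return udf.get() != null ? udf.get() : fallback;
  }

  public static void main(String[] args) {
    // The scripted "UDF" returns 7 on its first call and null on its second.
    System.out.println(coalesceAsWritten(scripted(7, null), 100));   // prints 7
    System.out.println(coalesceAsRewritten(scripted(7, null), 100)); // prints null
  }
}
```

In the rewritten form the WHEN condition sees 7 but the THEN branch sees null, so the expression returns null even though a fallback of 100 was supplied. That is a result COALESCE with a non-null fallback should never produce.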





[jira] [Comment Edited] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

2020-01-31 Thread Jin Xing (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027507#comment-17027507
 ] 

Jin Xing edited comment on CALCITE-3760 at 1/31/20 1:50 PM:


Thanks a lot for your kind help, [~julianhyde].

In our production environment, we work around this issue by asking users to 
follow a contract: a non-deterministic UDF/UDAF must always be called on its 
own and must not be nested inside another SqlCall. Legal and illegal cases 
look like this:
{code:sql}
-- legal
select coalesce(udf_col, 100) from 
  (select udf(col) udf_col from foo)

-- illegal
select coalesce(udf(col), 100) from foo
{code}
{quote}So, maybe the best way is to use a Project on a Project
{quote}
I understand this comment as a suggestion to add a Project for the 
non-deterministic UDF/UDAF, so that the number of times it is evaluated is 
guaranteed. I think that would work correctly and preserve the query 
semantics. But in the current implementation, the rewriting of a SqlNode 
works only on a SqlCall [1] during 
SqlValidatorImpl#performUnconditionalRewrites. If a SqlFunction wants to 
customize its rewriting logic, it only defines how it is itself transformed; 
it does not need to care where it is located. If we want the rewrite to add a 
Project, the custom rewriting logic will need to touch the enclosing 
SqlSelect, which I think will complicate the code.

Additionally, SqlValidatorImpl#performUnconditionalRewrites runs before nodes 
are expanded and fully resolved, so the rewriting might be hard. Consider the 
following example:
{code:sql}
select *, coalesce(udf(col), 100) from foo
{code}
If we simply rewrite it as
{code:sql}
select *, case when x is not null then x else 100 end
from (
  select *, udf(col) as x
  from foo)
{code}
the semantics will be changed, because the outer {{*}} now also returns 
{{x}}.

I still propose not to rewrite when the operator is non-deterministic, and to 
keep the current behavior for deterministic operators. AFAIK, the reason 
Calcite rewrites functions like COALESCE and NULLIF is to get a better and 
simpler implementation (NULLIF doesn't have its own implementation yet). Are 
there other reasons that I missed?

Thanks again for your help!

[1][https://github.com/apache/calcite/blob/c416c31fc376868bdd672afd84ec06dc75d56575/core/src/main/java/org/apache/calcite/sql/SqlOperator.java#L316]

> Rewriting non-deterministic function can break query semantics
> --
>
> Key: CALCITE-3760
> URL: https://issues.apache.org/jira/browse/CALCITE-3760
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Jin Xing
>Assignee: Jin Xing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Calcit

[jira] [Commented] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

2020-01-31 Thread Jin Xing (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027507#comment-17027507
 ] 

Jin Xing commented on CALCITE-3760:
---

Thanks a lot for your kind help, [~julianhyde].

In our production environment, we work around this issue by asking users to 
follow a contract: a non-deterministic UDF/UDAF must always be called on its 
own and must not be nested inside another SqlCall. Legal and illegal cases 
look like this:
{code:sql}
-- legal
select coalesce(udf_col, 100) from 
  (select udf(col) udf_col from foo)

-- illegal
select coalesce(udf(col), 100) from foo
{code}
{quote}So, maybe the best way is to use a Project on a Project
{quote}
I understand this comment as a suggestion to add a Project for the 
non-deterministic UDF/UDAF, so that the number of times it is evaluated is 
guaranteed. I think that would work correctly and preserve the query 
semantics. But in the current implementation, the rewriting of a SqlNode 
works only on a SqlCall [1] during 
SqlValidatorImpl#performUnconditionalRewrites. If a SqlFunction wants to 
customize its rewriting logic, it only defines how it is itself transformed; 
it does not need to care where it is located. If we want the rewrite to add a 
Project, the custom rewriting logic will need to touch the enclosing 
SqlSelect, which I think will complicate the code.

Additionally, SqlValidatorImpl#performUnconditionalRewrites runs before nodes 
are expanded and fully resolved, so the rewriting might be hard. Consider the 
following example:
{code:sql}
select *, coalesce(udf(col), 100) from foo
{code}
If we simply rewrite it as
{code:sql}
select *, case when x is not null then x else 100 end
from (
  select *, udf(col) as x
  from foo)
{code}
the semantics will be changed, because the outer {{*}} now also returns 
{{x}}.

I still propose not to rewrite when the operator is non-deterministic, and to 
keep the current behavior for deterministic operators. AFAIK, the reason 
Calcite rewrites functions like COALESCE and NULLIF is to get a better and 
simpler implementation (NULLIF doesn't have its own implementation yet). Are 
there other reasons that I missed?

[1][https://github.com/apache/calcite/blob/c416c31fc376868bdd672afd84ec06dc75d56575/core/src/main/java/org/apache/calcite/sql/SqlOperator.java#L316]

> Rewriting non-deterministic function can break query semantics
> --
>
> Key: CALCITE-3760
> URL: https://issues.apache.org/jira/browse/CALCITE-3760
> Project: Calcite
>  Issue Type: Bug
>  Components: core
>Reporter: Jin Xing
>Assignee: Jin Xing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Calcite rewrites some *SqlFunctions* during validation, but does not 
> consider whether the function is deterministic. For a non-deterministic 
> operator, the rewriting can break semantics. Additionally, there is no 
> interface for users to specify the determinism of a UDF/UDAF. 
> Say I have a non-deterministic UDF and UDAF and run SQL like the following:
> {code:sql}
> select coalesce(udf(col0), 100) from foo;
> select nullif(udaf(col0), 1024) from foo;
> {code}
> They will be rewritten as
> {code:sql}
> select case when udf(col0) is not null then udf(col0) else 100 end
> from foo;
> select case when udaf(col0) = 1024 then null else udaf(col0) end
> from foo
> {code}
> As we can see, the non-deterministic UDF and UDAF are called multiple times 
> after rewriting, so the condition in the WHEN clause might no longer hold 
> when the result expression is evaluated.
> We need to provide an interface for users to specify the determinism of a 
> UDF/UDAF, and consider whether a SqlNode is deterministic when rewriting.





[jira] [Comment Edited] (CALCITE-3716) ResultSetMetaData.getTableName should return empty string, not null, when column does not map to a table

2020-01-31 Thread Jin Xing (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027285#comment-17027285
 ] 

Jin Xing edited comment on CALCITE-3716 at 1/31/20 8:45 AM:


Thanks a lot for the review, [~julianhyde].
{quote}Does {{getColumnName}} need to be fixed also?
{quote}
 * I could not find anything in the 
[JDBC spec|https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSetMetaData.html#getColumnName-int-] 
specifying that *columnName* should be an empty string rather than null when 
not applicable.
 * The constructor of ColumnMetaData [1] sets *columnName* to *label* (the 
field name in the row type) if the parameter passed in is null.

I updated the test and replaced _assertEquals_ with _assertThat_.

[1][https://github.com/apache/calcite-avatica/blob/dd65a2b18b8c35cfccf1c47b6be87ea7db3ad658/core/src/main/java/org/apache/calcite/avatica/ColumnMetaData.java#L122]


> ResultSetMetaData.getTableName should return empty string, not null, when 
> column does not map to a table
> 
>
> Key: CALCITE-3716
> URL: https://issues.apache.org/jira/browse/CALCITE-3716
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Jin Xing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Per the [JDBC 
> spec|https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSetMetaData.html#getTableName-int-],
>  {{ResultSetMetaData.getTableName}} should return empty string, not null, 
> when column does not map to a table. Similarly getCatalogName, getSchemaName, 
> getColumnName.
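
The convention described in the issue can be illustrated with a small Java sketch. This is not the Avatica implementation: TableNameSketch and its nested Column class are hypothetical names introduced only to show the null-to-empty-string normalization, with a 1-based column index as in ResultSetMetaData.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not part of Calcite or Avatica.
final class TableNameSketch {

  // Hypothetical column descriptor; tableName is null when the column
  // (for example a computed expression) does not map to a table.
  static final class Column {
    final String label;
    final String tableName;

    Column(String label, String tableName) {
      this.label = label;
      this.tableName = tableName;
    }
  }

  private final List<Column> columns;

  TableNameSketch(List<Column> columns) {
    this.columns = columns;
  }

  // Follows the JDBC convention: 1-based index, "" rather than null
  // when no table name applies.
  String getTableName(int column) {
    String name = columns.get(column - 1).tableName;
    return name == null ? "" : name;
  }

  public static void main(String[] args) {
    TableNameSketch metaData = new TableNameSketch(Arrays.asList(
        new Column("EMPNO", "EMP"),
        new Column("EXPR$1", null)));
    System.out.println("[" + metaData.getTableName(1) + "]"); // prints [EMP]
    System.out.println("[" + metaData.getTableName(2) + "]"); // prints []
  }
}
```

Callers can then compare the result with isEmpty() instead of guarding against null.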





[jira] [Comment Edited] (CALCITE-3716) ResultSetMetaData.getTableName should return empty string, not null, when column does not map to a table

2020-01-31 Thread Jin Xing (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027285#comment-17027285
 ] 

Jin Xing edited comment on CALCITE-3716 at 1/31/20 8:39 AM:


Thanks a lot for the review, [~julianhyde].
 * I could not find anything in the 
[JDBC spec|https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSetMetaData.html#getColumnName-int-] 
specifying that *columnName* should be an empty string rather than null when 
not applicable.
 * The constructor of ColumnMetaData [1] sets *columnName* to *label* if the 
parameter passed in is null.
 * I updated the test and replaced _assertEquals_ with _assertThat_.

[1][https://github.com/apache/calcite-avatica/blob/dd65a2b18b8c35cfccf1c47b6be87ea7db3ad658/core/src/main/java/org/apache/calcite/avatica/ColumnMetaData.java#L122]


> ResultSetMetaData.getTableName should return empty string, not null, when 
> column does not map to a table
> 
>
> Key: CALCITE-3716
> URL: https://issues.apache.org/jira/browse/CALCITE-3716
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Jin Xing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Per the [JDBC 
> spec|https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSetMetaData.html#getTableName-int-],
>  {{ResultSetMetaData.getTableName}} should return empty string, not null, 
> when column does not map to a table. Similarly getCatalogName, getSchemaName, 
> getColumnName.





[jira] [Comment Edited] (CALCITE-3716) ResultSetMetaData.getTableName should return empty string, not null, when column does not map to a table

2020-01-31 Thread Jin Xing (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027285#comment-17027285
 ] 

Jin Xing edited comment on CALCITE-3716 at 1/31/20 8:39 AM:


Thanks a lot for the review, [~julianhyde].
 * I could not find anything in the 
[JDBC spec|https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSetMetaData.html#getColumnName-int-] 
specifying that *columnName* should be an empty string rather than null when 
not applicable.
 * The constructor of ColumnMetaData [1] sets *columnName* to *label* (the 
field name in the row type) if the parameter passed in is null.
 * I updated the test and replaced _assertEquals_ with _assertThat_.

[1][https://github.com/apache/calcite-avatica/blob/dd65a2b18b8c35cfccf1c47b6be87ea7db3ad658/core/src/main/java/org/apache/calcite/avatica/ColumnMetaData.java#L122]


> ResultSetMetaData.getTableName should return empty string, not null, when 
> column does not map to a table
> 
>
> Key: CALCITE-3716
> URL: https://issues.apache.org/jira/browse/CALCITE-3716
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Jin Xing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Per the [JDBC 
> spec|https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSetMetaData.html#getTableName-int-],
>  {{ResultSetMetaData.getTableName}} should return empty string, not null, 
> when column does not map to a table. Similarly getCatalogName, getSchemaName, 
> getColumnName.





[jira] [Commented] (CALCITE-3716) ResultSetMetaData.getTableName should return empty string, not null, when column does not map to a table

2020-01-31 Thread Jin Xing (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-3716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17027285#comment-17027285
 ] 

Jin Xing commented on CALCITE-3716:
---

Thanks a lot for the review, [~julianhyde].
 * I could not find anything in the 
[JDBC spec|https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSetMetaData.html#getColumnName-int-] 
specifying that *columnName* should be an empty string rather than null.
 * The constructor of ColumnMetaData [1] sets *columnName* to *label* if the 
parameter passed in is null.
 * I updated the test and replaced _assertEquals_ with _assertThat_.

[1][https://github.com/apache/calcite-avatica/blob/dd65a2b18b8c35cfccf1c47b6be87ea7db3ad658/core/src/main/java/org/apache/calcite/avatica/ColumnMetaData.java#L122]

> ResultSetMetaData.getTableName should return empty string, not null, when 
> column does not map to a table
> 
>
> Key: CALCITE-3716
> URL: https://issues.apache.org/jira/browse/CALCITE-3716
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Jin Xing
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Per the [JDBC 
> spec|https://docs.oracle.com/javase/8/docs/api/java/sql/ResultSetMetaData.html#getTableName-int-],
>  {{ResultSetMetaData.getTableName}} should return empty string, not null, 
> when column does not map to a table. Similarly getCatalogName, getSchemaName, 
> getColumnName.


