[jira] [Commented] (CALCITE-1317) Rewrite "MAX" as "ORDER BY ... LIMIT ..."

2024-02-03 Thread Julian Hyde (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814003#comment-17814003
 ] 

Julian Hyde commented on CALCITE-1317:
--

[~jiajunbernoulli], I agree that those optimizations make sense. Note that 
converting "{{WHERE row_number()}} < constant" to "{{LIMIT}} constant" is the 
topic of CALCITE-3181 (and what you have described here is the special case 
where the {{GROUP BY}} key is empty).

> Rewrite "MAX" as "ORDER BY ... LIMIT ..."
> -
>
> Key: CALCITE-1317
> URL: https://issues.apache.org/jira/browse/CALCITE-1317
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> We could optimize 
> {code}
> SELECT *
> FROM emp
> WHERE empno = (SELECT max(empno) FROM emp)
> {code}
> to
> {code}
> SELECT *
> FROM emp
> ORDER BY empno DESC LIMIT 1
> {code}
> (using the fact that {{empno}} is unique and non-NULL). Similarly, we can 
> rewrite
> {code}
> SELECT max(sal)
> FROM emp
> {code}
> to
> {code}
> SELECT sal
> FROM emp
> ORDER BY sal DESC LIMIT 1
> {code}
> (not making any assumptions about whether {{sal}} is unique or allows NULL 
> values) and we can rewrite a query to find the highest paid employee(s) in 
> each department
> {code}
> SELECT *
> FROM emp AS e
> WHERE sal = (
>   SELECT max(sal)
>   FROM emp AS e2
>   WHERE e2.deptno = e.deptno)
> {code}
> as
> {code}
> SELECT deptno, empno, sal
> FROM (
>   SELECT deptno, empno, sal, FIRST_VALUE(sal) OVER w AS topSal
>   FROM emp
>   WINDOW w AS (PARTITION BY deptno ORDER BY sal DESC))
> WHERE sal = topSal
> {code}
> We might benefit from a generalized {{Sort(limit)}} operator that can find 
> the top N within any prefix of the sort key, not just the top N overall.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-1317) Rewrite "MAX" as "ORDER BY ... LIMIT ..."

2024-02-03 Thread Jiajun Xie (Jira)


[ 
https://issues.apache.org/jira/browse/CALCITE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17813895#comment-17813895
 ] 

Jiajun Xie commented on CALCITE-1317:
-

Oh, I am thinking about similar optimizations recently:
 - Empty OVER
{code:java}
SELECT orderkey, orderstatus FROM ( 
   SELECT row_number() OVER () rn, orderkey, orderstatus
   FROM orders
) WHERE rn <= 5{code}
to
{code:java}
SELECT orderkey, orderstatus 
FROM ORDERS 
LIMIT 5 {code}

 - OVER with ORDER BY
{code:java}
SELECT orderkey, orderstatus FROM ( 
SELECT row_number() OVER (ORDER BY orderstatus) rn, orderkey, orderstatus 
FROM orders ) 
WHERE rn <= 5 {code}
to
{code:java}
SELECT orderkey, orderstatus 
FROM orders 
ORDER BY orderkey 
LIMIT 5{code}

> Rewrite "MAX" as "ORDER BY ... LIMIT ..."
> -
>
> Key: CALCITE-1317
> URL: https://issues.apache.org/jira/browse/CALCITE-1317
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Priority: Major
>
> We could optimize 
> {code}
> SELECT *
> FROM emp
> WHERE empno = (SELECT max(empno) FROM emp)
> {code}
> to
> {code}
> SELECT *
> FROM emp
> ORDER BY empno DESC LIMIT 1
> {code}
> (using the fact that {{empno}} is unique and non-NULL). Similarly, we can 
> rewrite
> {code}
> SELECT max(sal)
> FROM emp
> {code}
> to
> {code}
> SELECT sal
> FROM emp
> ORDER BY sal DESC LIMIT 1
> {code}
> (not making any assumptions about whether {{sal}} is unique or allows NULL 
> values) and we can rewrite a query to find the highest paid employee(s) in 
> each department
> {code}
> SELECT *
> FROM emp AS e
> WHERE sal = (
>   SELECT max(sal)
>   FROM emp AS e2
>   WHERE e2.deptno = e.deptno)
> {code}
> as
> {code}
> SELECT deptno, empno, sal
> FROM (
>   SELECT deptno, empno, sal, FIRST_VALUE(sal) OVER w AS topSal
>   FROM emp
>   WINDOW w AS (PARTITION BY deptno ORDER BY sal DESC))
> WHERE sal = topSal
> {code}
> We might benefit from a generalized {{Sort(limit)}} operator that can find 
> the top N within any prefix of the sort key, not just the top N overall.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (CALCITE-1317) Rewrite "MAX" as "ORDER BY ... LIMIT ..."

2018-01-11 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323515#comment-16323515
 ] 

Julian Hyde commented on CALCITE-1317:
--

This is similar to the paper [WinMagic: Subquery Elimination Using Window 
Aggregation (Zuarte et al, 
2003)|https://pdfs.semanticscholar.org/0bfa/e505ad588d00d4b204acf8ba4b5646eac244.pdf],
 which converts MAX in a correlated sub-query into OVER. For example it rewrites

{code}SELECT SUM(l_extendedprice) / 7.0 AS avg_yearly
FROM   tpcd.lineitem, tpcd.part
WHERE p_partkey = l_partkey AND
  p_brand = 'Brand#23' AND
  p_container = 'MED BOX' AND
  l_quantity<(SELECT 0.2*avg(l_quantity)
  FROM tpcd.lineitem
  WHERE l_partkey = p_partkey);{code} to {code}WITH WinMagic AS
  (SELECT l_extendedprice, l_quantity,
avg(l_quantity)over(partition by p_partkey)
  AS avg_l_quantity
   FROM  tpcd.lineitem, tpcd.part
   WHERE p_partkey = l_partkey and
 p_brand = 'Brand#23' and
 p_container =  'MED BOX' )
 SELECT SUM(l_extendedprice) / 7.0 as avg_yearly
 FROM WinMagic
 WHERE l_quantity < 0.2 * avg_l_quantity;{code}

> Rewrite "MAX" as "ORDER BY ... LIMIT ..."
> -
>
> Key: CALCITE-1317
> URL: https://issues.apache.org/jira/browse/CALCITE-1317
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Julian Hyde
>
> We could optimize 
> {code}
> SELECT *
> FROM emp
> WHERE empno = (SELECT max(empno) FROM emp)
> {code}
> to
> {code}
> SELECT *
> FROM emp
> ORDER BY empno DESC LIMIT 1
> {code}
> (using the fact that {{empno}} is unique and non-NULL). Similarly, we can 
> rewrite
> {code}
> SELECT max(sal)
> FROM emp
> {code}
> to
> {code}
> SELECT sal
> FROM emp
> ORDER BY sal DESC LIMIT 1
> {code}
> (not making any assumptions about whether {{sal}} is unique or allows NULL 
> values) and we can rewrite a query to find the highest paid employee(s) in 
> each department
> {code}
> SELECT *
> FROM emp AS e
> WHERE sal = (
>   SELECT max(sal)
>   FROM emp AS e2
>   WHERE e2.deptno = e.deptno)
> {code}
> as
> {code}
> SELECT deptno, empno, sal
> FROM (
>   SELECT deptno, empno, sal, FIRST_VALUE(sal) OVER w AS topSal
>   FROM emp
>   WINDOW w AS (PARTITION BY deptno ORDER BY sal DESC))
> WHERE sal = topSal
> {code}
> We might benefit from a generalized {{Sort(limit)}} operator that can find 
> the top N within any prefix of the sort key, not just the top N overall.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CALCITE-1317) Rewrite "MAX" as "ORDER BY ... LIMIT ..."

2016-07-15 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/CALCITE-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380366#comment-15380366
 ] 

Julian Hyde commented on CALCITE-1317:
--

[~maryannxue], [~jamestaylor], You might find this optimization useful, since 
ordered scans of tables' and secondary indexes' primary keys are easy in 
Phoenix.

> Rewrite "MAX" as "ORDER BY ... LIMIT ..."
> -
>
> Key: CALCITE-1317
> URL: https://issues.apache.org/jira/browse/CALCITE-1317
> Project: Calcite
>  Issue Type: Bug
>Reporter: Julian Hyde
>Assignee: Julian Hyde
>
> We could optimize 
> {code}
> SELECT *
> FROM emp
> WHERE empno = (SELECT max(empno) FROM emp)
> {code}
> to
> {code}
> SELECT *
> FROM emp
> ORDER BY empno DESC LIMIT 1
> {code}
> (using the fact that {{empno}} is unique and non-NULL). Similarly, we can 
> rewrite
> {code}
> SELECT max(sal)
> FROM emp
> {code}
> to
> {code}
> SELECT sal
> FROM emp
> ORDER BY sal DESC LIMIT 1
> {code}
> (not making any assumptions about whether {{sal}} is unique or allows NULL 
> values) and we can rewrite a query to find the highest paid employee(s) in 
> each department
> {code}
> SELECT *
> FROM emp AS e
> WHERE sal = (
>   SELECT max(sal)
>   FROM emp AS e2
>   WHERE e2.deptno = e.deptno)
> {code}
> as
> {code}
> SELECT deptno, empno, sal
> FROM (
>   SELECT deptno, empno, sal, FIRST_VALUE(sal) OVER w AS topSal
>   FROM emp
>   WINDOW w AS (PARTITION BY deptno ORDER BY sal DESC))
> WHERE sal = topSal
> {code}
> We might benefit from a generalized {{Sort(limit)}} operator that can find 
> the top N within any prefix of the sort key, not just the top N overall.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)