Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/14452
  
    Q47 is a typical query that can benefit from this PR:
    
        with v1 as(
        select i_category, i_brand,
               s_store_name, s_company_name,
               d_year, d_moy,
               sum(ss_sales_price) sum_sales,
               avg(sum(ss_sales_price)) over
                 (partition by i_category, i_brand,
                            s_store_name, s_company_name, d_year)
                 avg_monthly_sales,
               rank() over
                 (partition by i_category, i_brand,
                            s_store_name, s_company_name
                  order by d_year, d_moy) rn
        from item, store_sales, date_dim, store
        where ss_item_sk = i_item_sk and
              ss_sold_date_sk = d_date_sk and
              ss_store_sk = s_store_sk and
              (
                d_year = 1999 or
                ( d_year = 1999-1 and d_moy =12) or
                ( d_year = 1999+1 and d_moy =1)
        group by i_category, i_brand,
                 s_store_name, s_company_name,
                 d_year, d_moy),
        v2 as(
        select v1.i_category, v1.i_brand, v1.s_store_name, v1.s_company_name, 
v1.d_year,
               v1.d_moy, v1.avg_monthly_sales ,v1.sum_sales, v1_lag.sum_sales 
psum,
               v1_lead.sum_sales nsum
        from v1, v1 v1_lag, v1 v1_lead
        where v1.i_category = v1_lag.i_category and
              v1.i_category = v1_lead.i_category and
              v1.i_brand = v1_lag.i_brand and
              v1.i_brand = v1_lead.i_brand and
              v1.s_store_name = v1_lag.s_store_name and
              v1.s_store_name = v1_lead.s_store_name and
              v1.s_company_name = v1_lag.s_company_name and
              v1.s_company_name = v1_lead.s_company_name and
              v1.rn = v1_lag.rn + 1 and
              v1.rn = v1_lead.rn - 1)
        select * from v2
        where  d_year = 1999 and
               avg_monthly_sales > 0 and
               case when avg_monthly_sales > 0 then abs(sum_sales - 
avg_monthly_sales) / avg_monthly_sales else null end > 0.1
        order by sum_sales - avg_monthly_sales, 3
        limit 100
    
    It has 2 CTEs, `v1` and `v2`. `v2` joins three duplicated `v1` plans. There 
are no disjunctive predicates needed to be pushed down to those `v1` plans. So 
the three duplicated `v1` plans have the same output data in the end. The 
physical plan of `v1` looks complicated.
    
    Obviously without this PR we will run three physical plans of `v1`. This PR 
runs the physical plan of `v1` for only one time and cache its output data.
    
    
    
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to