Till Rohrmann created FLINK-1915:
------------------------------------

             Summary: Faulty plan selection by optimizer
                 Key: FLINK-1915
                 URL: https://issues.apache.org/jira/browse/FLINK-1915
             Project: Flink
          Issue Type: Bug
            Reporter: Till Rohrmann
            Priority: Minor


The optimizer selects for certain jobs a sub-optimal execution plan. 

For example, the {{WebLogAnalysis}} example job contains a coGroup input which 
consists of a {{Filter}} and a subsequent {{Projection}}. The optimizer inserts 
a hash partitioning between the filter and the mapper (projection) and a 
sorting after the projection. It would be more efficient if the hash 
partitioning would have been done after the projection, because the data is 
smaller at this stage.

I could observe a similar behaviour for a larger job, where the hash 
partitioning was executed before a filter operation which was then used as 
input for a join operator. I suspect that the optimizer considers the two plans 
(hash partitioning before the filter and after the filter) as equivalent in the 
absence of proper size estimates. However, executing the hash partitioning 
after the filter should always be more efficient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to