Till Rohrmann created FLINK-1915: ------------------------------------ Summary: Faulty plan selection by optimizer Key: FLINK-1915 URL: https://issues.apache.org/jira/browse/FLINK-1915 Project: Flink Issue Type: Bug Reporter: Till Rohrmann Priority: Minor
The optimizer selects for certain jobs a sub-optimal execution plan. For example, the {{WebLogAnalysis}} example job contains a coGroup input which consists of a {{Filter}} and a subsequent {{Projection}}. The optimizer inserts a hash partitioning between the filter and the mapper (projection) and a sorting after the projection. It would be more efficient if the hash partitioning would have been done after the projection, because the data is smaller at this stage. I could observe a similar behaviour for a larger job, where the hash partitioning was executed before a filter operation which was then used as input for a join operator. I suspect that the optimizer considers the two plans (hash partitioning before the filter and after the filter) as equivalent in the absence of proper size estimates. However, executing the hash partitioning after the filter should always be more efficient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)