alamb opened a new issue, #4967: URL: https://github.com/apache/arrow-datafusion/issues/4967
**Describe the bug**

We previously had a plan like the one below, where a `RepartitionExec` was added before a filter to increase parallelism. However, after upgrading DataFusion, the `RepartitionExec` is no longer there. I think this is a slightly worse plan, because the filter now cannot be evaluated in parallel.

```
FilterExec: tag@2 = A
  RepartitionExec: partitioning=RoundRobinBatch(4)   <--- This RepartitionExec has been removed
    DeduplicateExec: [tag@2 ASC,time@3 ASC]
      SortPreservingMergeExec: [tag@2 ASC,time@3 ASC]
        UnionExec
          ParquetExec: limit=None, partitions={1 group: [[1/1/1/1/00000000-0000-0000-0000-000000000000.parquet]]}, predicate=tag = Dictionary(Int32, Utf8("A")), pruning_predicate=tag_min@0 <= A AND A <= tag_max@1, output_ordering=[tag@2 ASC, time@3 ASC], projection=[bar, foo, tag, time]
          SortExec: [tag@2 ASC,time@3 ASC]
            RecordBatchesExec: batches_groups=1 batches=1
```

**To Reproduce**

I am working on a reproducer.

**Expected behavior**

A `RepartitionExec` should be added if it will increase parallelism for filtering.

**Additional context**

We found this while upgrading IOx: https://github.com/influxdata/influxdb_iox/pull/6603 -- see https://github.com/influxdata/influxdb_iox/pull/6603/files#r1072606494
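To illustrate why the removed `RepartitionExec` matters for the filter, here is a toy model in plain Rust. It is only a sketch under stated assumptions: `Batch`, `round_robin`, and `parallel_filter` are hypothetical names, vectors of tag strings stand in for Arrow `RecordBatch`es, and none of this is DataFusion's API. It shows the idea behind `RoundRobinBatch(4)`: whole batches are dealt out to 4 partitions in turn, so a filter such as `tag = A` can run on each partition concurrently instead of on a single stream.

```rust
/// Hypothetical stand-in for a RecordBatch: just the `tag` column values.
type Batch = Vec<String>;

/// Deal whole batches out to `n` partitions in turn (batch i goes to partition i % n),
/// which is the idea behind round-robin batch repartitioning.
fn round_robin(batches: Vec<Batch>, n: usize) -> Vec<Vec<Batch>> {
    assert!(n > 0, "need at least one output partition");
    let mut parts: Vec<Vec<Batch>> = vec![Vec::new(); n];
    for (i, batch) in batches.into_iter().enumerate() {
        parts[i % n].push(batch);
    }
    parts
}

/// Evaluate `tag = target` on every partition, one scoped thread per partition.
/// Without repartitioning there is only one partition, hence only one thread of work.
fn parallel_filter(parts: Vec<Vec<Batch>>, target: &str) -> Vec<Vec<Batch>> {
    std::thread::scope(|s| {
        let handles: Vec<_> = parts
            .into_iter()
            .map(|part| {
                s.spawn(move || {
                    // Keep only the rows whose tag equals `target`, batch by batch.
                    part.into_iter()
                        .map(|b| b.into_iter().filter(|t| t == target).collect::<Batch>())
                        .collect::<Vec<Batch>>()
                })
            })
            .collect();
        // Collect the per-partition results in partition order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Dealing out whole batches rather than individual rows keeps the repartitioning cheap, at the cost of possibly uneven partitions when there are only a few batches.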