Jungtaek Lim created SPARK-45080:
------------------------------------

Summary: Kafka DSv2 streaming source implementation calls planInputPartitions 4 times per microbatch
Key: SPARK-45080
URL: https://issues.apache.org/jira/browse/SPARK-45080
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: Jungtaek Lim

I was tracing the method calls for the DSv2 streaming source and found that planInputPartitions is called 4 times per microbatch.

The repeated calls originate from `DataSourceV2ScanExecBase.supportsColumnar`, even though it reaches planInputPartitions through `MicroBatchScanExec.inputPartitions`, which is defined as lazy and should therefore be evaluated only once.

The behavior seems to be coupled with Catalyst, and the root cause is very hard to pin down, but with SPARK-44505 we can at least fix this in each data source individually.
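
To make the puzzle concrete, here is a toy model in Python; it is not Spark code. It illustrates one possible explanation, which is an assumption and not something confirmed in this report: a lazily cached value is cached per object, so if the plan node is copied, each copy evaluates it again. All class and function names below (`ToyScan`, `ToyScanExec`, `ToyScanExecWithOverride`, `run_demo`) are hypothetical, and `cached_property` stands in for a Scala lazy val. The last class sketches the general shape of a per-source fix, where the columnar question is answered without planning partitions; how SPARK-44505 actually exposes this is not shown here.

```python
from functools import cached_property


class ToyScan:
    """Hypothetical stand-in for a streaming source; counts planning calls."""

    def __init__(self):
        self.plan_calls = 0

    def plan_input_partitions(self):
        self.plan_calls += 1
        return ["partition-0", "partition-1"]


class ToyScanExec:
    """Hypothetical stand-in for a physical plan node wrapping the scan."""

    def __init__(self, scan):
        self.scan = scan

    @cached_property
    def input_partitions(self):
        # Cached per instance, similar in spirit to a Scala lazy val.
        return self.scan.plan_input_partitions()

    @property
    def supports_columnar(self):
        # Answering this question forces partition planning.
        return all(p.endswith("-columnar") for p in self.input_partitions)

    def copy(self):
        # The copy shares the scan but starts with an empty cache (assumption).
        return type(self)(self.scan)


class ToyScanExecWithOverride(ToyScanExec):
    """Assumed shape of a per-source fix: no planning needed to answer."""

    @property
    def supports_columnar(self):
        return False


def run_demo(node_cls, num_copies):
    """Return (planning calls on one node, planning calls after copying)."""
    scan = ToyScan()
    node = node_cls(scan)
    node.supports_columnar
    node.supports_columnar  # second access on the same node hits the cache
    calls_on_one_node = scan.plan_calls
    for _ in range(num_copies):
        node = node.copy()
        node.supports_columnar
    return calls_on_one_node, scan.plan_calls
```

In this model, `run_demo(ToyScanExec, 3)` plans once on the original node and once more on each of the three copies, while `run_demo(ToyScanExecWithOverride, 3)` never plans while answering the columnar question. Whether node copying is what actually happens inside Catalyst is exactly the part the report leaves open.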