[ https://issues.apache.org/jira/browse/SPARK-45080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim reassigned SPARK-45080:
------------------------------------

    Assignee: Jungtaek Lim

> Kafka DSv2 streaming source implementation calls planInputPartitions 4 times per microbatch
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-45080
>                 URL: https://issues.apache.org/jira/browse/SPARK-45080
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 4.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>
> I was tracing the method calls for the DSv2 streaming source and found that
> planInputPartitions is called 4 times per microbatch.
> It turned out that the repeated calls of planInputPartitions come from
> `DataSourceV2ScanExecBase.supportsColumnar`, even though that path goes through
> `MicroBatchScanExec.inputPartitions`, which is defined as lazy, so repeated
> calls should not happen.
> The behavior seems to be coupled with catalyst, and it is very hard to figure
> out why it occurs, but with SPARK-44505 we can at least fix this for each data
> source.
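
For readers unfamiliar with why a `lazy` definition might still be evaluated more than once: a Scala `lazy val` is memoized per object instance, not globally. The sketch below is a minimal illustration of that language behavior only. `PlanCounter` and `ScanNode` are hypothetical stand-ins, not Spark classes, and the report above does not establish that instance copying is the actual cause in this issue.

```scala
// Hypothetical stand-in for an expensive planning call; not Spark's API.
object PlanCounter {
  var calls = 0 // counts how often the planning step actually runs
  def planInputPartitions(): Seq[Int] = {
    calls += 1
    Seq(0, 1, 2)
  }
}

// Hypothetical stand-in for a physical scan node.
case class ScanNode(id: Int) {
  // Evaluated at most once per ScanNode instance.
  lazy val inputPartitions: Seq[Int] = PlanCounter.planInputPartitions()
}

object LazyValDemo {
  def main(args: Array[String]): Unit = {
    val node = ScanNode(1)
    node.inputPartitions
    node.inputPartitions
    println(PlanCounter.calls) // 1: the same instance plans only once

    // A copy is a new instance with a fresh, unevaluated lazy val.
    val copied = node.copy()
    copied.inputPartitions
    println(PlanCounter.calls) // 2: the copy triggers planning again
  }
}
```

If something similar were happening here, the number of planning calls would follow the number of distinct node instances whose lazy field gets forced, which is consistent with (but not proof of) the observation in the report.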