[ 
https://issues.apache.org/jira/browse/SPARK-45080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-45080:
------------------------------------

    Assignee: Jungtaek Lim

> Kafka DSv2 streaming source implementation calls planInputPartitions 4 times 
> per microbatch
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-45080
>                 URL: https://issues.apache.org/jira/browse/SPARK-45080
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 4.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>
> While tracing method calls for the DSv2 streaming source, I found that 
> planInputPartitions is called 4 times per microbatch.
> The repeated calls come from `DataSourceV2ScanExecBase.supportsColumnar`. 
> That path reaches planInputPartitions through 
> `MicroBatchScanExec.inputPartitions`, which is defined as lazy, so the 
> repeated evaluation should not happen.
> The behavior seems to be coupled with Catalyst, and the root cause is very 
> hard to pin down, but with SPARK-44505 we can at least fix this in each 
> data source individually.
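
For readers wondering how a lazy member can be evaluated more than once at all: Scala caches a lazy val per object instance, so any copy of the owning object starts uninitialized again. The sketch below is a toy model of that language behavior only, not Spark code. Every name in it (`ToyScanExec`, `Counter`, `runDemo`) is hypothetical, and whether plan-node copying is what actually happens in this issue is an assumption; the description above leaves the root cause open.

```scala
// Toy model, not Spark code. All names are hypothetical and exist only to
// illustrate that a Scala lazy val is cached per instance, not per value.
object LazyPerInstanceDemo {

  // Counts how many times the expensive planning step runs.
  final class Counter { var calls: Int = 0 }

  // Stand-in for a physical scan node that plans partitions lazily.
  final case class ToyScanExec(counter: Counter) {

    // Evaluated at most once for THIS instance.
    lazy val inputPartitions: Seq[String] = {
      counter.calls += 1
      Seq("partition-0", "partition-1")
    }

    // Stand-in for a capability check that has to inspect the partitions.
    def supportsColumnar: Boolean =
      inputPartitions.forall(_.endsWith(".columnar"))
  }

  // Returns the number of times the planning step ran.
  def runDemo(): Int = {
    val counter = new Counter
    val original = ToyScanExec(counter)

    original.supportsColumnar
    original.supportsColumnar // same instance: cached, no second planning call

    // A copy is a new object, so its lazy val has not been initialized yet.
    val copied = original.copy()
    copied.supportsColumnar

    counter.calls
  }

  def main(args: Array[String]): Unit =
    println(s"planning calls: ${runDemo()}") // prints "planning calls: 2"
}
```

If something like this is at play, a source that declares its columnar support up front would not need to plan partitions for that check, which is the kind of per-source fix the description refers to.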



