Jungtaek Lim created SPARK-45080:
------------------------------------

             Summary: Kafka DSv2 streaming source implementation calls planInputPartitions 4 times per microbatch
                 Key: SPARK-45080
                 URL: https://issues.apache.org/jira/browse/SPARK-45080
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 4.0.0
            Reporter: Jungtaek Lim


I was tracing the method calls of a DSv2 streaming source and found that 
planInputPartitions is called 4 times per microbatch.

It turned out that the repeated calls to planInputPartitions come from 
`DataSourceV2ScanExecBase.supportsColumnar`. That is surprising, because it 
reaches planInputPartitions through `MicroBatchScanExec.inputPartitions`, 
which is defined as lazy, so the repeated calls shouldn't happen.
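
To illustrate why a lazy field does not by itself guarantee a single call, here is a minimal, self-contained Java sketch. It is only one possible explanation and is not confirmed by this report: memoization is per instance, so if the node holding the memoized value is copied, each copy recomputes it. `CountingStream`, `ScanNode` and `copy()` are hypothetical stand-ins, not Spark classes.

```java
import java.util.List;

public class LazyPerInstanceDemo {
    // Hypothetical stand-in for a streaming source; counts planning calls.
    static final class CountingStream {
        int planCalls = 0;

        List<String> planInputPartitions() {
            planCalls++;
            return List.of("partition-0", "partition-1");
        }
    }

    // Hypothetical stand-in for a plan node with a memoized field,
    // analogous to a Scala `lazy val`.
    static final class ScanNode {
        private final CountingStream stream;
        private List<String> cached; // null until first access

        ScanNode(CountingStream stream) {
            this.stream = stream;
        }

        List<String> inputPartitions() {
            if (cached == null) {
                cached = stream.planInputPartitions();
            }
            return cached;
        }

        // A copy shares the stream but NOT the memoized value.
        ScanNode copy() {
            return new ScanNode(stream);
        }
    }

    // Accessing the memoized field repeatedly on one instance.
    static int callsOnSameInstance(int accesses) {
        CountingStream stream = new CountingStream();
        ScanNode node = new ScanNode(stream);
        for (int i = 0; i < accesses; i++) {
            node.inputPartitions();
        }
        return stream.planCalls;
    }

    // Accessing the memoized field once on each of several copies.
    static int callsAcrossCopies(int copies) {
        CountingStream stream = new CountingStream();
        ScanNode node = new ScanNode(stream);
        for (int i = 0; i < copies; i++) {
            node.inputPartitions();
            node = node.copy();
        }
        return stream.planCalls;
    }

    public static void main(String[] args) {
        System.out.println(callsOnSameInstance(4)); // prints 1
        System.out.println(callsAcrossCopies(4));   // prints 4
    }
}
```

Whether node copying is what actually happens here is an open question; the sketch only shows that "defined as lazy" and "called once per microbatch" are not the same guarantee.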

The behavior seems to be coupled with Catalyst, and it is very hard to figure 
out why, but with SPARK-44505 we can at least fix this in each data source.
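
A rough sketch of what a per-data-source fix could look like, assuming SPARK-44505 lets a scan declare its columnar support up front so that the columnar check no longer needs to plan partitions. The types below (`Scan`, `ColumnarSupportMode`, `supportsColumnar`, `DefaultScan`, `RowOnlyScan`) are simplified stand-ins written for this example and are not Spark's actual classes or signatures.

```java
import java.util.List;

public class ColumnarModeSketch {
    // Assumed shape of the declaration a scan can make up front.
    enum ColumnarSupportMode { PARTITION_DEFINED, SUPPORTED, UNSUPPORTED }

    // Simplified stand-in for a DSv2 scan.
    interface Scan {
        List<String> planInputPartitions();

        // Default: the answer depends on the planned partitions.
        default ColumnarSupportMode columnarSupportMode() {
            return ColumnarSupportMode.PARTITION_DEFINED;
        }
    }

    // Stand-in for the columnar check: it plans partitions only when
    // the scan has not declared its columnar support up front.
    static boolean supportsColumnar(Scan scan) {
        switch (scan.columnarSupportMode()) {
            case SUPPORTED:
                return true;
            case UNSUPPORTED:
                return false;
            default:
                // Placeholder for asking a reader factory about each partition.
                return scan.planInputPartitions().stream()
                        .allMatch(p -> p.endsWith(".columnar"));
        }
    }

    // A scan that keeps the default mode and counts planning calls.
    static class DefaultScan implements Scan {
        int planCalls = 0;

        @Override
        public List<String> planInputPartitions() {
            planCalls++;
            return List.of("partition-0", "partition-1");
        }
    }

    // A row-based scan that declares itself non-columnar.
    static class RowOnlyScan extends DefaultScan {
        @Override
        public ColumnarSupportMode columnarSupportMode() {
            return ColumnarSupportMode.UNSUPPORTED;
        }
    }

    // Runs the columnar check several times and reports planning calls.
    static int planCallsFor(DefaultScan scan, int checks) {
        for (int i = 0; i < checks; i++) {
            supportsColumnar(scan);
        }
        return scan.planCalls;
    }

    public static void main(String[] args) {
        System.out.println(planCallsFor(new DefaultScan(), 4)); // prints 4
        System.out.println(planCallsFor(new RowOnlyScan(), 4)); // prints 0
    }
}
```

This does not explain why the check runs several times; it only removes the planning cost from that check for a source that knows it is row-based.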



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
