[Spark Streaming]: Why is planInputPartitions called multiple times for each micro-batch in Spark 3?

2022-04-13 Thread Hussain, Saghir
Hi All

While upgrading our custom streaming data source from Spark 2.4.5 to Spark
3.2.1, we observed that the planInputPartitions() method in MicroBatchStream is
called multiple times (4 in our case) for each micro-batch in Spark 3.

The Apache Spark documentation also says:
"The method planInputPartitions will be called multiple times, to launch one
Spark job for each micro-batch in this data stream."

What is the reason for this?

Thanks & Regards,
Saghir Hussain


Why is planInputPartitions called multiple times in a micro-batch?

2021-07-12 Thread kineret M
Hi,

I'm developing a new Spark connector using the Data Source V2 API (Spark 3.1.1).
I noticed that the planInputPartitions method (in MicroBatchStream) is
called twice in every micro-batch.

What is the motivation/reason for this?

Thanks,
Kineret