Github user kunalkhamar commented on a diff in the pull request:
https://github.com/apache/spark/pull/17216#discussion_r106285230
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -380,7 +387,27 @@ class StreamExecution(
logInfo(s"Resuming streaming query, starting with batch $batchId")
currentBatchId = batchId
availableOffsets = nextOffsets.toStreamProgress(sources)
- offsetSeqMetadata = nextOffsets.metadata.getOrElse(OffsetSeqMetadata())
+
+ // initialize metadata
+ val shufflePartitionsSparkSession: Int = sparkSession.conf.get(SQLConf.SHUFFLE_PARTITIONS)
+ offsetSeqMetadata = {
+   if (nextOffsets.metadata.isEmpty) {
+     OffsetSeqMetadata(0, 0,
+       Map(SQLConf.SHUFFLE_PARTITIONS.key -> shufflePartitionsSparkSession.toString))
+   } else {
+     val metadata = nextOffsets.metadata.get
+     val shufflePartitionsToUse = metadata.conf.getOrElse(SQLConf.SHUFFLE_PARTITIONS.key, {
+       // For backward compatibility, if # partitions was not recorded in the offset log,
+       // then ensure it is not missing. The new value is picked up from the conf.
+       logDebug("Number of shuffle partitions from previous run not found in checkpoint. "
--- End diff --
Changed to log a warning.
Rechecked the semantics: it works as expected, and the warning is printed only
on the first restart after the upgrade.
Once we restart a query from a v2.1 checkpoint and then stop it, any new
offsets written out will contain the number of shuffle partitions. Any future
restart will read these new offsets in
`StreamExecution.populateStartOffsets` -> `offsetLog.getLatest` and pick up the
recorded number of shuffle partitions.
For future reference: we do not rewrite the old offset files to include the
number of shuffle partitions. The semantics are still correct because of the
call to `offsetLog.getLatest`, which reads only the most recent entry.
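A minimal sketch of the restart semantics described above, for illustration
only. It does not use Spark's classes: `Metadata`, `resolve`, and the in-memory
`log` are hypothetical stand-ins for `OffsetSeqMetadata`, the fallback logic in
the diff, and the offset log. The assumption is that only the latest log entry
is consulted on restart.

```scala
import scala.collection.mutable.ArrayBuffer

object ShufflePartitionsSketch {
  // Hypothetical stand-in for the conf map carried in the offset metadata.
  final case class Metadata(conf: Map[String, String])

  val ShufflePartitionsKey = "spark.sql.shuffle.partitions"

  // Decide which metadata to carry forward on restart.
  // `latest` models the metadata of the most recent offset log entry.
  // Returns the resolved metadata and whether a warning is warranted.
  def resolve(latest: Option[Metadata], sessionValue: Int): (Metadata, Boolean) =
    latest match {
      case None =>
        // No metadata at all: start from the session value, no warning.
        (Metadata(Map(ShufflePartitionsKey -> sessionValue.toString)), false)
      case Some(m) if m.conf.contains(ShufflePartitionsKey) =>
        // Value was recorded by an earlier run: reuse it, ignore the session.
        (m, false)
      case Some(m) =>
        // Old-format entry: fall back to the session value and warn once.
        (Metadata(m.conf + (ShufflePartitionsKey -> sessionValue.toString)), true)
    }

  def main(args: Array[String]): Unit = {
    // An old-format entry with no recorded shuffle partitions.
    val log = ArrayBuffer[Option[Metadata]](Some(Metadata(Map.empty)))

    // First restart after upgrade: falls back to the session value and warns.
    val (first, warnedFirst) = resolve(log.last, sessionValue = 200)
    log += Some(first) // new entries record the value; old entries stay untouched

    // Later restart with a different session value: recorded value wins.
    val (second, warnedSecond) = resolve(log.last, sessionValue = 50)

    println(s"first restart:  ${first.conf(ShufflePartitionsKey)}, warned=$warnedFirst")
    println(s"second restart: ${second.conf(ShufflePartitionsKey)}, warned=$warnedSecond")
  }
}
```

In this model the first entry in `log` is never modified, which mirrors the
point that old offset files are left as they are. Correctness depends only on
later restarts reading the newest entry.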