GitHub user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/17216#discussion_r106058765
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
@@ -380,7 +387,27 @@ class StreamExecution(
logInfo(s"Resuming streaming query, starting with batch $batchId")
currentBatchId = batchId
availableOffsets = nextOffsets.toStreamProgress(sources)
- offsetSeqMetadata = nextOffsets.metadata.getOrElse(OffsetSeqMetadata())
+
+ // initialize metadata
+ val shufflePartitionsSparkSession: Int = sparkSession.conf.get(SQLConf.SHUFFLE_PARTITIONS)
+ offsetSeqMetadata = {
+   if (nextOffsets.metadata.isEmpty) {
+     OffsetSeqMetadata(0, 0,
+       Map(SQLConf.SHUFFLE_PARTITIONS.key -> shufflePartitionsSparkSession.toString))
+   } else {
+     val metadata = nextOffsets.metadata.get
+     val shufflePartitionsToUse = metadata.conf.getOrElse(SQLConf.SHUFFLE_PARTITIONS.key, {
+       // For backward compatibility, if # partitions was not recorded in the offset log,
+       // then ensure it is not missing. The new value is picked up from the conf.
+       logDebug("Number of shuffle partitions from previous run not found in checkpoint. "
--- End diff --
Make this a `logWarning` so that we can debug it. It should be printed only
once: the first time an old checkpoint is resumed after the upgrade.
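A minimal, self-contained sketch of the suggested behaviour follows. It uses simplified, hypothetical stand-ins for `OffsetSeqMetadata` and `logWarning` (in Spark, `logWarning` comes from the `Logging` trait), and it assumes the resolved metadata is written to the offset log with the next batch. Under that assumption, the warning fires once when an old checkpoint is first resumed; later restarts find the recorded value and stay silent, even if the session conf has changed.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified stand-in for Spark's OffsetSeqMetadata.
case class OffsetSeqMetadata(
    batchWatermarkMs: Long = 0,
    batchTimestampMs: Long = 0,
    conf: Map[String, String] = Map.empty)

object MetadataResolution {
  // Assumed key; in Spark this is SQLConf.SHUFFLE_PARTITIONS.key.
  val ShufflePartitionsKey = "spark.sql.shuffle.partitions"

  // Collects warnings so the sketch can count them; stands in for logWarning.
  val warnings = ArrayBuffer.empty[String]
  def logWarning(msg: String): Unit = warnings += msg

  /** Resolve the metadata to use when resuming from a checkpoint. */
  def resolve(
      recorded: Option[OffsetSeqMetadata],
      sessionShufflePartitions: Int): OffsetSeqMetadata = recorded match {
    case None =>
      // No metadata recorded at all: use the session's current value.
      OffsetSeqMetadata(0, 0, Map(ShufflePartitionsKey -> sessionShufflePartitions.toString))
    case Some(metadata) =>
      val partitions = metadata.conf.getOrElse(ShufflePartitionsKey, {
        // Checkpoint from an older version: fall back to the conf value and
        // warn, because this is the one point where the stored value is set.
        logWarning("Number of shuffle partitions from previous run not found in checkpoint. " +
          s"Using the value from the conf, $sessionShufflePartitions partitions.")
        sessionShufflePartitions.toString
      })
      metadata.copy(conf = metadata.conf + (ShufflePartitionsKey -> partitions))
  }

  def main(args: Array[String]): Unit = {
    // First restart after the upgrade: the old metadata lacks the key, so warn.
    val upgraded = resolve(Some(OffsetSeqMetadata(5L, 10L)), sessionShufflePartitions = 200)
    // Next restart: the key is now recorded, so no warning and the old value wins.
    val restarted = resolve(Some(upgraded), sessionShufflePartitions = 10)
    println(restarted.conf(ShufflePartitionsKey)) // prints 200
    println(warnings.size)                        // prints 1
  }
}
```

Keeping the recorded value in preference to the session conf matters because state stored in the checkpoint is partitioned by it; changing it between restarts would break state recovery.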