HeartSaVioR opened a new pull request, #49905: URL: https://github.com/apache/spark/pull/49905
Credit to @HyukjinKwon. This PR is just to make sure I've signed off with another committer before proceeding.

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/48149 that proposes to rename `spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan` to `spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`.

### Why are the changes needed?

For consistent naming.

### Does this PR introduce _any_ user-facing change?

Yes: the config was released in Spark 3.5.4 and this PR changes it. The config itself is internal, so users are not expected to deal with it directly. The problem comes from the fact that the config is coupled with how we avoid breaking streaming queries that had run with prior Spark versions. There are two cases:

1. The streaming query started on Spark 3.5.4.
2. The streaming query started before Spark 3.5.4 and was later migrated to Spark 3.5.4.

For case 1: when a new query starts on Spark 3.5.4, there is no offset log to read the static config back from, so `spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan` takes its default value, `true`. This value is then written to the offset log to ensure it is kept for the lifetime of the streaming query. When the query is upgraded to the Spark version that renames the config, there is an offset log to read the static config back from, but it has no entry for `spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`; hence we enable backward-compatibility mode and set `spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan` to `false`. This could break the query if the rule impacts it, because the effectiveness of the fix is flipped.
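The case-1 flip described above can be sketched as a small simulation. This is illustrative Python only, not Spark's actual offset-log API; the function and variable names are hypothetical:

```python
# Illustrative sketch (NOT Spark's actual API) of how a version-pinned
# static conf is resolved from the streaming offset log across the rename.
OLD_KEY = "spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
NEW_KEY = "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
DEFAULT = True

def resolve_conf(offset_log_confs, key):
    """Resolve the effective value of a pinned conf on query (re)start.
    A missing entry in the offset log means the query ran on a Spark
    version that predates the conf, so backward-compatibility mode
    applies and the fix is disabled (False)."""
    if offset_log_confs is None:          # fresh query: no offset log yet
        return DEFAULT
    return offset_log_confs.get(key, False)

# Fresh query on Spark 3.5.4: no offset log, so the default (True) is used
# and then written back to the offset log under the old key.
assert resolve_conf(None, OLD_KEY) is True
log_from_354 = {OLD_KEY: True}

# Case 1: upgrade that query to the version that reads the renamed key.
# The log has no entry for NEW_KEY, so the fix flips from True to False.
print(resolve_conf(log_from_354, NEW_KEY))  # False: effectiveness flipped

# Case 2: query started before 3.5.4 (no entry for either key):
# the fix is disabled both before and after the rename, so no change.
log_pre_354 = {}
print(resolve_conf(log_pre_354, OLD_KEY), resolve_conf(log_pre_354, NEW_KEY))
```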
For case 2: when an existing streaming query is upgraded from an older version to Spark 3.5.4, there is an offset log to read the static config back from, but it has no entry for `spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`; hence we enable backward-compatibility mode and set that config to `false`, so the fix is disabled. When the query is further upgraded to the Spark version that renames the config, the same logic applies and the fix remains disabled. So there is no behavior change.

Ideally we would handle case 1, but the only way to guarantee it is to make the two configs aliases of each other, which would force us to keep the problematic config forever. (3.5.4 users don't only bump to the next 3.5.x release; they can jump straight to 4.0.0, so we would have to keep the alias in 4.x as well. That defeats the rationale for fixing this.) So the only remaining option is to tolerate case 1. Hopefully failures from that rule are infrequent, and a query has to start on exactly 3.5.4 and then upgrade to hit this, so there is less and less chance of hitting it.

### How was this patch tested?

CI should test it out.

### Was this patch authored or co-authored using generative AI tooling?

No.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
