HeartSaVioR opened a new pull request, #49905: URL: https://github.com/apache/spark/pull/49905
Credit to @HyukjinKwon. This PR is just to make sure I've signed off with another committer before proceeding.

### What changes were proposed in this pull request?

This PR is a followup of https://github.com/apache/spark/pull/48149 that proposes to rename `spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan` to `spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`.

### Why are the changes needed?

For consistent naming.

### Does this PR introduce _any_ user-facing change?

Yes: the config was released in Spark 3.5.4 and this PR changes it. The config itself is internal, so users are not expected to deal with it directly. The problem comes from the fact that the config is coupled with how we avoid breaking streaming queries that had run with prior Spark versions. There are two cases:

1. The streaming query started on Spark 3.5.4.
2. The streaming query started before Spark 3.5.4 and was later migrated to Spark 3.5.4.

For case 1: when a new query starts on Spark 3.5.4, there is no offset log to read the static config back from, so `spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan` takes its default value, `true`. This value is then written to the offset log to ensure it is kept for the lifetime of the streaming query. When the query is upgraded to the Spark version that renames the config, there is an offset log to read the static config back from, but it has no entry for `spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`; hence we enable backward-compatibility mode and set `spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan` to `false`. This could break the query if the rule impacts it, because the effectiveness of the fix is flipped.
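The case-1 flip described above can be sketched as a small simulation. This is illustrative Python only, not Spark's actual offset-log API; the function and variable names are hypothetical:

```python
# Illustrative sketch (NOT Spark's actual API) of how a version-pinned
# static conf is resolved from the streaming offset log across the rename.
OLD_KEY = "spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
NEW_KEY = "spark.sql.optimizer.pruneFiltersCanPruneStreamingSubplan"
DEFAULT = True

def resolve_conf(offset_log_confs, key):
    """Resolve the effective value of a pinned conf on query (re)start.
    A missing entry in the offset log means the query ran on a Spark
    version that predates the conf, so backward-compatibility mode
    applies and the fix is disabled (False)."""
    if offset_log_confs is None:          # fresh query: no offset log yet
        return DEFAULT
    return offset_log_confs.get(key, False)

# Fresh query on Spark 3.5.4: no offset log, so the default (True) is used
# and then written back to the offset log under the old key.
assert resolve_conf(None, OLD_KEY) is True
log_from_354 = {OLD_KEY: True}

# Case 1: upgrade that query to the version that reads the renamed key.
# The log has no entry for NEW_KEY, so the fix flips from True to False.
print(resolve_conf(log_from_354, NEW_KEY))  # False: effectiveness flipped

# Case 2: query started before 3.5.4 (no entry for either key):
# the fix is disabled both before and after the rename, so no change.
log_pre_354 = {}
print(resolve_conf(log_pre_354, OLD_KEY), resolve_conf(log_pre_354, NEW_KEY))
```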
For case 2: when an existing streaming query is upgraded from an older version to Spark 3.5.4, there is an offset log to read the static config back from, but it has no entry for `spark.databricks.sql.optimizer.pruneFiltersCanPruneStreamingSubplan`; hence we enable backward-compatibility mode and set that config to `false`, so the fix is disabled. When the query is further upgraded to the Spark version that renames the config, the same logic applies and the fix remains disabled. So there is no behavior change.

Ideally we would handle case 1, but the only way to guarantee it is to make the two configs aliases of each other, which would force us to keep the problematic config forever. (3.5.4 users don't only bump to the next 3.5.x release; they can jump straight to 4.0.0, so we would have to keep the alias in 4.x as well. That defeats the rationale for fixing this.) So the only remaining option is to tolerate case 1. Hopefully failures from that rule are infrequent, and a query has to start on exactly 3.5.4 and then upgrade to hit this, so there is less and less chance of hitting it.

### How was this patch tested?

CI should test it out.

### Was this patch authored or co-authored using generative AI tooling?

No.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
