yadavay-amzn commented on PR #56291: URL: https://github.com/apache/spark/pull/56291#issuecomment-4626361610
cc @cloud-fan @yaooqinn: this is a direct follow-up to SPARK-56546, and I would appreciate a look when convenient.

The change extends the existing `SegmentTreeWindowFunctionFrame` to also handle shrinking frames (`... BETWEEN <lower> AND UNBOUNDED FOLLOWING`) by parameterizing it with `ubound: Option[BoundOrdering]` and a `fallbackFactory`. The eligibility gate, memory accounting, and metrics are unchanged.

The benchmark numbers in the description show the algorithmic gap: the speedup is 8.5× at N=5K and grows to 314× at N=50K, and the legacy O(N²) path becomes infeasible at N≥100K.

Note: the fork-side CI's `pyspark-pandas` job has an MLflow doctest failure that is unrelated to this change (a filesystem-backend deprecation in the installed mlflow). The relevant SQL / scalastyle / build matrices ran clean locally: 172 tests pass across the new shrinking suite plus all pre-existing segtree and high-level window suites.
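
For readers who want the intuition behind the gap, here is a minimal Python sketch of the idea. It is not the PR's Scala code, and every name in it (`SegmentTree`, `frame_start`, `shrinking_naive`, `shrinking_segtree`, `lower_offset`) is hypothetical. It assumes a row-based frame, an associative and commutative aggregate such as `min` or `sum`, and no null inputs (`None` is used only to mark an empty frame). The naive path re-aggregates the whole remaining partition for every row, which is O(N²); the tree is built once and answers each frame in O(log N).

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


class SegmentTree:
    """Iterative segment tree over a fixed sequence (illustrative, not Spark code)."""

    def __init__(self, values: Sequence[T], combine: Callable[[T, T], T]) -> None:
        self.n = len(values)
        self.combine = combine
        # Leaves live at indices n..2n-1; internal node i covers nodes 2i and 2i+1.
        self.tree: List[Optional[T]] = [None] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = combine(self.tree[2 * i], self.tree[2 * i + 1])

    def _merge(self, a: Optional[T], b: Optional[T]) -> Optional[T]:
        # None stands for "nothing aggregated yet".
        if a is None:
            return b
        if b is None:
            return a
        return self.combine(a, b)

    def query(self, lo: int, hi: int) -> Optional[T]:
        """Aggregate values[lo:hi] in O(log n); returns None for an empty range."""
        left: Optional[T] = None
        right: Optional[T] = None
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:
                left = self._merge(left, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                right = self._merge(self.tree[hi], right)
            lo >>= 1
            hi >>= 1
        return self._merge(left, right)


def frame_start(i: int, lower_offset: int, n: int) -> int:
    """First row of the frame for row i, clamped to the partition [0, n]."""
    return min(max(i + lower_offset, 0), n)


def shrinking_naive(values: Sequence[T], combine: Callable[[T, T], T],
                    lower_offset: int = 0) -> List[Optional[T]]:
    """O(N^2): re-aggregate rows [start, N) from scratch for every row."""
    n = len(values)
    out: List[Optional[T]] = []
    for i in range(n):
        acc: Optional[T] = None
        for v in values[frame_start(i, lower_offset, n):]:
            acc = v if acc is None else combine(acc, v)
        out.append(acc)
    return out


def shrinking_segtree(values: Sequence[T], combine: Callable[[T, T], T],
                      lower_offset: int = 0) -> List[Optional[T]]:
    """O(N log N): build the tree once, then one range query per row."""
    n = len(values)
    tree = SegmentTree(values, combine)
    return [tree.query(frame_start(i, lower_offset, n), n) for i in range(n)]
```

For example, `shrinking_segtree([3, 1, 4, 1, 5, 9, 2, 6], min)` corresponds to `MIN(...) OVER (ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)` and returns `[1, 1, 1, 1, 2, 2, 2, 6]`, the same result as `shrinking_naive`. A negative `lower_offset` models `k PRECEDING` and a positive one models `k FOLLOWING`.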
