sarutak commented on PR #55912:
URL: https://github.com/apache/spark/pull/55912#issuecomment-4552548972
@cloud-fan
The window rewrite cannot express some important cases:
- Tolerance: RANGE BETWEEN <expr> PRECEDING requires a constant, so a
  row-dependent bound such as left.t - tolerance cannot be expressed as a
  window frame boundary.
- Residual pair-correlated predicates: conditions referencing both left and
  right columns cannot be evaluated inside a window frame.
Tolerance in particular is common in practice (financial tick matching
within N seconds, IoT sensor correlation within a time window, etc.).
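To make the two limitations concrete, here is a minimal Python sketch of backward AS-OF join semantics with a tolerance and a residual predicate, written as a plain nested loop. The names (Row, asof_join_reference) and the trade/quote data are hypothetical and only illustrate the predicate shape: the lower bound l.t - tolerance moves with each left row, and the residual predicate has to see the left and right rows together before the latest match is chosen.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Row:
    key: str      # equi-join key, e.g. a ticker symbol
    t: float      # event time in seconds
    value: float  # payload, e.g. a price


def asof_join_reference(
    left: list[Row],
    right: list[Row],
    tolerance: float,
    residual: Callable[[Row, Row], bool],
) -> list[tuple[Row, Optional[Row]]]:
    """Backward AS-OF left join, written as a nested loop to spell out semantics.

    For each left row, return the right row with the same key and the greatest t
    such that left.t - tolerance <= right.t <= left.t and residual(left, right)
    holds, or None if there is no such row.
    """
    out = []
    for l in left:
        best = None
        for r in right:
            if r.key != l.key:
                continue
            # Row-dependent bound: the lower limit moves with each left row's t.
            if not (l.t - tolerance <= r.t <= l.t):
                continue
            # Pair-correlated residual predicate: needs l and r at the same time.
            if not residual(l, r):
                continue
            if best is None or r.t > best.t:
                best = r
        out.append((l, best))
    return out


# Hypothetical data: match each trade to the latest quote at most 5 seconds old
# whose price is within 5.0 of the trade price.
trades = [Row("A", 10.0, 100.0), Row("A", 20.0, 101.0), Row("B", 15.0, 50.0)]
quotes = [Row("A", 8.0, 99.5), Row("A", 9.0, 120.0), Row("A", 12.0, 100.5), Row("B", 1.0, 49.0)]

matched = asof_join_reference(trades, quotes, 5.0, lambda l, r: abs(l.value - r.value) <= 5.0)
for trade, quote in matched:
    print(trade, "->", quote)
```

In this example the trade at t=10 matches the quote at t=8, not the later quote at t=9, because the residual predicate rejects the t=9 quote. Picking the latest row first (as a last-value window would) and filtering afterwards would instead produce no match for that trade.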
Also, other popular data processing systems have a dedicated operator or code
path for AS-OF joins:
* ClickHouse
  * https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/RowRefs.h
  * https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/RowRefs.cpp
* DuckDB
  * https://github.com/duckdb/duckdb/blob/main/src/execution/operator/join/physical_asof_join.cpp
* QuestDB
  * https://github.com/questdb/questdb/blob/master/core/src/main/java/io/questdb/griffin/engine/join/AsOfJoinLightRecordCursorFactory.java
* Polars
  * https://github.com/pola-rs/polars/tree/main/crates/polars-ops/src/frame/join/asof
* Pandas
  * https://github.com/pandas-dev/pandas/blob/main/pandas/core/reshape/merge.py
* Snowflake
  * https://www.greybeam.ai/blog/snowflake-asof-join
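For comparison, a dedicated operator can sort the right side once per key and binary-search it for each left row, applying the row-dependent tolerance bound and the residual predicate while probing. The Python below is a minimal sketch of that general idea under assumed names (asof_join_sorted and a hypothetical (key, t, value) row layout); it is not taken from, and does not claim to match, any of the implementations linked above.

```python
import bisect
from collections import defaultdict
from typing import Callable, Optional

# (key, t, value) tuples; a hypothetical row layout for this sketch.
Row = tuple[str, float, float]


def asof_join_sorted(
    left: list[Row],
    right: list[Row],
    tolerance: float,
    residual: Callable[[Row, Row], bool],
) -> list[tuple[Row, Optional[Row]]]:
    """Backward AS-OF left join using a per-key sorted index on the right side."""
    # Build phase: group right rows by key and sort each group by time.
    groups = defaultdict(list)
    for r in right:
        groups[r[0]].append(r)
    times = {}
    for key, rows in groups.items():
        rows.sort(key=lambda r: r[1])
        times[key] = [r[1] for r in rows]

    # Probe phase: binary search per left row instead of scanning all of right.
    out = []
    for l in left:
        key, t = l[0], l[1]
        match = None
        rows = groups.get(key)  # .get() does not insert missing keys
        if rows:
            ts = times[key]
            # Position of the last right row with r.t <= l.t.
            i = bisect.bisect_right(ts, t) - 1
            # Walk backward while inside [l.t - tolerance, l.t]; the first
            # candidate that passes the residual predicate is the AS-OF match.
            while i >= 0 and ts[i] >= t - tolerance:
                if residual(l, rows[i]):
                    match = rows[i]
                    break
                i -= 1
        out.append((l, match))
    return out


# Hypothetical trades and (unsorted) quotes.
trades = [("A", 10.0, 100.0), ("A", 20.0, 101.0), ("B", 15.0, 50.0)]
quotes = [("A", 12.0, 100.5), ("A", 9.0, 120.0), ("A", 8.0, 99.5), ("B", 1.0, 49.0)]
matched = asof_join_sorted(trades, quotes, 5.0, lambda l, r: abs(l[2] - r[2]) <= 5.0)
```

The backward walk stops at the tolerance bound, so when the residual predicate rejects many candidates the per-row probe cost grows with the number of right rows inside the tolerance window. A production operator would also have to deal with spilling, skewed keys, and NULL keys, which this sketch ignores.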
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.