Re: [PR] [SPARK-56395][SQL] Add NEAREST BY top-K ranking join (catalyst-side) [spark]

via GitHub Mon, 04 May 2026 13:35:26 -0700


sigmod commented on code in PR #55629:
URL: https://github.com/apache/spark/pull/55629#discussion_r3184311240



##########
sql/core/src/test/resources/sql-tests/inputs/join-nearest-by.sql:
##########
@@ -0,0 +1,155 @@
+-- Test cases for NEAREST BY top-K ranking join.
+
+CREATE VIEW users(user_id, score) AS VALUES (1, 10.0), (2, 20.0), (3, 30.0);
+CREATE VIEW products(product, pscore) AS VALUES ('A', 11.0), ('B', 22.0), ('C', 5.0);
+
+-- Basic APPROX NEAREST BY SIMILARITY with k = 1

Review Comment:
   Could we add a `SELECT *` test query? I want to make sure that the output schema 
doesn't include `qid` or `struct`, other unwanted columns (e.g., the 
`max_by`/`min_by` output column), or a wrong column name (e.g., one carrying the `Generator` alias).
   
   > SELECT *
   > FROM users u JOIN products p
   >   APPROX NEAREST 1 BY SIMILARITY -abs(u.score - p.pscore);
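
   For illustration only, here is a rough reference model of the result that query might be expected to produce. It is a sketch under assumptions, not the PR's implementation: it assumes `NEAREST 1 BY SIMILARITY` keeps, for each left row, the single right row with the highest similarity value, and that `APPROX` happens to return the exact answer on this tiny input. The function name `nearest_by_similarity` and the `EXPECTED_COLUMNS` list are hypothetical; the actual golden-file output should come from running the test suite.

```python
# Hypothetical reference model of the proposed SELECT * query.
# Table contents are copied from the CREATE VIEW statements in the test file.
users = [(1, 10.0), (2, 20.0), (3, 30.0)]
products = [("A", 11.0), ("B", 22.0), ("C", 5.0)]

# Assumption: SELECT * should expose only the columns of both join inputs,
# with no helper columns such as `qid`, a wrapping struct, or an aggregate output.
EXPECTED_COLUMNS = ["user_id", "score", "product", "pscore"]


def nearest_by_similarity(left, right, k):
    """For each left row, keep the k right rows with the highest similarity."""
    rows = []
    for user_id, score in left:
        # Similarity expression from the query: -abs(u.score - p.pscore).
        ranked = sorted(right, key=lambda p: -abs(score - p[1]), reverse=True)
        for product, pscore in ranked[:k]:
            rows.append((user_id, score, product, pscore))
    return rows


result = nearest_by_similarity(users, products, k=1)
for row in result:
    print(dict(zip(EXPECTED_COLUMNS, row)))
```

   Under those assumptions each user is paired with exactly one product, and every row has exactly the four input columns, which is the property the `SELECT *` test is meant to pin down.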
   


