Re: [PR] [SPARK-56395][SQL] Add NEAREST BY top-K ranking join (catalyst-side) [spark]

via GitHub Mon, 04 May 2026 11:42:34 -0700


dtenedor commented on code in PR #55629:
URL: https://github.com/apache/spark/pull/55629#discussion_r3183430584



##########
docs/sql-ref-syntax-qry-select-join.md:
##########
@@ -53,6 +53,30 @@ relation { [ join_type ] JOIN [ LATERAL ] relation [ join_criteria ] | NATURAL j
 
     Specifies an expression with a return type of boolean.
 
+* **nearest_by_clause**
+
+    Specifies a nearest-by top-K ranking join. For each row on the left (query side), returns up to `num_results` rows from the right (base side), ranked by `ranking_expression`. Only `INNER` (the default) and `LEFT OUTER` join types are supported with this clause.
+
+    **Syntax:** `{ APPROX | EXACT } NEAREST [ num_results ] BY { DISTANCE | SIMILARITY } ranking_expression`
+
+    `APPROX | EXACT`
+
+    Controls the search algorithm contract. `APPROX` allows the optimizer to use faster approximate strategies (such as indexed nearest-neighbor search when available). `EXACT` forces brute-force evaluation and requires `ranking_expression` to be deterministic.
+
+    `num_results`
+
+    A positive integer literal between 1 and 100000 that limits the number of matches per left row. Defaults to 1 when omitted.

Review Comment:
   Why this limit? Is it controlled by a config?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


