hvanhovell commented on code in PR #41585:
URL: https://github.com/apache/spark/pull/41585#discussion_r1232295162
##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala:
##########
@@ -830,6 +830,66 @@ class Dataset[T] private[sql] (
builder.setJoinType(proto.Join.JoinType.JOIN_TYPE_CROSS)
}
+ /**
+ * Joins this Dataset returning a `Tuple2` for each pair where `condition` evaluates to
+ * true.
+ *
+ * This is similar to the relation `join` function with one important difference in the
+ * result schema. Since `joinWith` preserves objects present on either side of the join, the
+ * result schema is similarly nested into a tuple under the column names `_1` and `_2`.
+ *
+ * This type of join can be useful both for preserving type-safety with the original object
+ * types as well as working with relational data where either side of the join has column
+ * names in common.
+ *
+ * @param other Right side of the join.
+ * @param condition Join expression.
+ * @param joinType Type of join to perform. Default `inner`. Must be one of:
+ * `inner`, `cross`, `outer`, `full`, `fullouter`,`full_outer`, `left`,
+ * `leftouter`, `left_outer`, `right`, `rightouter`, `right_outer`.
+ *
+ * @group typedrel
+ * @since 3.5.0
+ */
+ def joinWith[U](other: Dataset[U], condition: Column, joinType: String): Dataset[(T, U)] = {
Review Comment:
That is the thing: it would mean I also have to add this to the protocol.
That is okay, I guess. However, we have already tried that in a different PR
(not the new logical plan), and my main gripe is that we would be adding a
very Scala-specific thing to both the protocol and the analyzer, for something
that can easily be done on the client with a number of primitives (access by
ordinal and explicit references). I could probably live without the latter if
we make ordinal access a bit more flexible (I need access to the first and last
column of the df).
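
To make the ordinal-access idea concrete, here is a minimal toy sketch. It is not Spark code and not the implementation in this PR: plain Scala collections stand in for Datasets, and `JoinWithSketch`, `Row`, `flatJoin`, and `nestByOrdinal` are hypothetical names introduced only for illustration. It assumes the client knows how many columns the left side contributes, so a flat join result can be split back into the nested `_1`/`_2` shape by position alone, even when both sides share column names.

```scala
// Toy model only: plain Scala collections stand in for Datasets.
// Nothing here is Spark API; all names are hypothetical.
object JoinWithSketch {
  type Row = Vector[Any]

  // A flat relational join: for each matching pair, concatenate the
  // left columns followed by the right columns.
  def flatJoin(left: Seq[Row], right: Seq[Row])(cond: (Row, Row) => Boolean): Seq[Row] =
    for (l <- left; r <- right if cond(l, r)) yield l ++ r

  // Rebuild the nested (_1, _2) shape using ordinals alone: the first
  // `leftWidth` columns belong to the left side, the rest to the right.
  // No column names are consulted, so duplicate names are not a problem.
  def nestByOrdinal(flat: Seq[Row], leftWidth: Int): Seq[(Row, Row)] =
    flat.map(row => row.splitAt(leftWidth))

  def main(args: Array[String]): Unit = {
    // Both sides carry an id in column 0 (think: a shared column name).
    val people: Seq[Row] = Seq(Vector(1, "ann"), Vector(2, "bob"))
    val orders: Seq[Row] = Seq(Vector(1, "book"), Vector(1, "pen"), Vector(3, "ink"))

    val flat = flatJoin(people, orders)((l, r) => l(0) == r(0))
    nestByOrdinal(flat, leftWidth = 2).foreach(println)
  }
}
```

The point of the sketch is only that the nesting step needs nothing Scala-specific from the server: positional access to the joined columns is enough to reassemble the tuple on the client.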
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]