viirya commented on a change in pull request #31296:
URL: https://github.com/apache/spark/pull/31296#discussion_r564744404
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
##########
@@ -2007,6 +2007,54 @@ class DatasetSuite extends QueryTest
checkAnswer(withUDF, Row(Row(1), null, null) :: Row(Row(1), null, null) ::
Nil)
}
+
+ test("SPARK-34205: Pipe Dataset") {
+ assume(TestUtils.testCommandAvailable("cat"))
+
+ val nums = spark.range(4)
+ val piped = nums.pipe("cat", (l, printFunc) => printFunc(l.toString)).toDF
Review comment:
Thanks @HeartSaVioR. At least I am glad the discussion can move
forward, whichever option you prefer to add.
Honestly, I think transform is awkward; it exists only to expose the pipe
feature through SQL syntax. I don't like the transform syntax, which is
verbose and inconvenient to use, and it is not as flexible as pipe's custom
print-out function. BTW, because transform is an untyped operation, a typed
Dataset is bound to transform's serialized row format. In the earlier
discussion there were some comments against that, although it was clarified
later that pipe does not suffer from this issue.
If we still cannot reach a consensus, maybe I should raise a discussion on the
dev mailing list to decide whether a pipe or transform top-level API should be
added.
@xuanyuanking @AngersZhuuuu The SQL syntax of transform, "SELECT
TRANSFORM(...)", is pretty confusing. It looks like an expression but is
actually an operator, and IMHO it cannot be turned into an expression. If you
force it to be an expression, you will create inconsistencies and weird corner
cases. transform is like pipe in that the input/output row relation is not 1:1
or N:1 but arbitrary.
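To illustrate the point about the arbitrary input/output relation, here is a
minimal sketch using the existing `RDD.pipe` API that ships with Spark today
(the `Dataset.pipe(cmd, printFunc)` form in the test above is what this PR
proposes, not a released API). The commands `cat` and `wc -l` are my example
choices and assume a POSIX environment:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("pipe-demo")
  .getOrCreate()

// 1:1 relation: `cat` echoes each input line back, one output row per input row.
val echoed = spark.range(4).rdd.map(_.toString).pipe("cat").collect()

// N:1 relation: `wc -l` emits a single count line per partition. Because the
// external command may emit any number of rows for any number of inputs, the
// operation has to be a relational operator, not a per-row expression.
val counts = spark.range(4).rdd.map(_.toString).pipe(Seq("wc", "-l")).collect()

spark.stop()
```

This is exactly why forcing TRANSFORM into expression position would be
inconsistent: an expression is expected to produce one value per input row,
which neither pipe nor transform guarantees.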
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]