AngersZhuuuu commented on a change in pull request #31296:
URL: https://github.com/apache/spark/pull/31296#discussion_r565052466
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
##########
@@ -2007,6 +2007,54 @@ class DatasetSuite extends QueryTest
checkAnswer(withUDF, Row(Row(1), null, null) :: Row(Row(1), null, null) ::
Nil)
}
+
+ test("SPARK-34205: Pipe Dataset") {
+ assume(TestUtils.testCommandAvailable("cat"))
+
+ val nums = spark.range(4)
+ val piped = nums.pipe("cat", (l, printFunc) => printFunc(l.toString)).toDF
Review comment:
Hmm, just to clarify: we mean we can add an expression (or function?)
like `TRANSFORM`, not convert `TRANSFORM` into it. We can also extract some
common logic from `ScriptTransformationExec`. The usage would be something like
```
script_transform(input, script, output)
```
The input can be a list of input columns such as `a, b, c`, and the
output can be a schema definition such as `col1 string, col2 int`.
The return type would be `Array<Struct<col1: String, col2: Int>>` (this
data type can cover all cases and lets the user handle the result).
Then at execution time we can make it run with the default format, i.e. `ROW
FORMAT DELIMITED`.
This is a simple and general way to implement it, and then we can add it as a DSL.
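For concreteness, a minimal sketch of what the SQL surface might look like,
using the existing `TRANSFORM` syntax as the reference point. The
`script_transform` name, argument order, and return type below are assumptions
taken from this discussion, not an agreed API:
```
-- Existing SQL surface that the proposal mirrors:
SELECT TRANSFORM(a, b) USING 'cat' AS (col1 STRING, col2 INT) FROM t;

-- Hypothetical proposed expression (name and signature are assumptions):
SELECT script_transform(array(a, b), 'cat', 'col1 string, col2 int') FROM t;
-- would return Array<Struct<col1: String, col2: Int>>, which the caller
-- can then explode/inline into rows as needed.
```
Returning an array of structs would keep the expression usable in any clause,
at the cost of an explicit explode when row-shaped output is wanted, which is
the "let the user handle it" trade-off described above.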
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]