AngersZhuuuu commented on a change in pull request #31296:
URL: https://github.com/apache/spark/pull/31296#discussion_r564348823
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
##########
@@ -2007,6 +2007,54 @@ class DatasetSuite extends QueryTest
checkAnswer(withUDF, Row(Row(1), null, null) :: Row(Row(1), null, null) ::
Nil)
}
+
+ test("SPARK-34205: Pipe Dataset") {
+ assume(TestUtils.testCommandAvailable("cat"))
+
+ val nums = spark.range(4)
+ val piped = nums.pipe("cat", (l, printFunc) => printFunc(l.toString)).toDF
Review comment:
> Unlike Window function, it seems to me that we cannot have a query
> like "SELECT a, TRANSFORM(...), c FROM ..." or in DSL format like:
>
> ```scala
> df.select($"a", $"b", transform(...) ...)
> ```
>
> But for Window function we can do:
>
> ```scala
> df.select($"a", $"b", lead("key", 1).over(window) ...)
> ```
>
> That being said, in the end it is also `Dataset.transform`, instead of an
> expression DSL.
I have thought about this problem too. At first I wanted to add transform as
a DSL function; to do that, we would first need an equivalent
ScriptTransformation expression. We can think of it as just a new expression,
or a new function.
Adding a `Dataset.scriptTransform` could also be fine, since it would support
more flexible use cases and a larger scope. Just as important, it has a
standard to follow and it's consistent with SQL.
We don't necessarily need to write it as
```
df.select($"a", $"b", transform(xx))
```
we can write it as
```
df.scriptTransform(input, script, output...)
```
If there is a decision, I can start working on this in the next few days.
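
For reference, here is a minimal sketch of the existing SQL form that a `Dataset.scriptTransform` would be consistent with. It assumes a Spark build where script transformation works without Hive support (otherwise Hive support would need to be enabled on the session) and that `cat` is on the PATH. The object name `ScriptTransformSketch`, the helper `pipeThroughCat`, and the view name `nums` are made up for illustration; the proposed `Dataset.scriptTransform` itself does not exist yet, so it is not called here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical holder object, used only for this illustration.
object ScriptTransformSketch {

  // Pipes each row of a four-row range through `cat` using the SQL
  // TRANSFORM ... USING clause. The view name `nums` is an assumption.
  def pipeThroughCat(spark: SparkSession): DataFrame = {
    spark.range(4).createOrReplaceTempView("nums")
    spark.sql("SELECT TRANSFORM(id) USING 'cat' AS (id STRING) FROM nums")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("script-transform-sketch")
      .getOrCreate()
    try pipeThroughCat(spark).show()
    finally spark.stop()
  }
}
```

Note that here TRANSFORM is the whole select list rather than one column among several, which matches the point in the quoted comment that it behaves like a Dataset-level operation and not like an expression.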