xuanyuanking commented on a change in pull request #31296:
URL: https://github.com/apache/spark/pull/31296#discussion_r564507427
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
##########
@@ -2007,6 +2007,54 @@ class DatasetSuite extends QueryTest
checkAnswer(withUDF, Row(Row(1), null, null) :: Row(Row(1), null, null) ::
Nil)
}
+
+ test("SPARK-34205: Pipe Dataset") {
+ assume(TestUtils.testCommandAvailable("cat"))
+
+ val nums = spark.range(4)
+ val piped = nums.pipe("cat", (l, printFunc) => printFunc(l.toString)).toDF
Review comment:
```
What's the top-level API, you mean Plan node like CollectSet or other thing?
```
@AngersZhuuuu The top-level API here means the new API added in Dataset.
```
Can you share how to make transformation as an expression? I don't think it
is an expression at all.
```
@viirya Sure. I followed the comment "`I have thought this problem too,
first I want to add transform as a DSL function, in this way, we need to make
an equivalent ScriptTransformation expression first. We can think that this is
just a new expression, or a new function`" from @AngersZhuuuu. The idea is to
add a new expression `ScriptTransformationExpression` for
`ScriptTransformation`, which would then be planned into
`ScriptTransformationExec`.
Two limitations here might need more discussion:
- A script transformation can produce more than one output row for a single
input row, so it cannot be used together with other expressions.
- Hive's script transformation is partition-based, but making it an
expression turns it row-based.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]