AngersZhuuuu commented on a change in pull request #31296:
URL: https://github.com/apache/spark/pull/31296#discussion_r565052466
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
##########
@@ -2007,6 +2007,54 @@ class DatasetSuite extends QueryTest
checkAnswer(withUDF, Row(Row(1), null, null) :: Row(Row(1), null, null) ::
Nil)
}
+
+ test("SPARK-34205: Pipe Dataset") {
+ assume(TestUtils.testCommandAvailable("cat"))
+
+ val nums = spark.range(4)
+ val piped = nums.pipe("cat", (l, printFunc) => printFunc(l.toString)).toDF
Review comment:
Hmm, just to clarify: we mean we can add an expression (or function?)
like `TRANSFORM`, not convert `TRANSFORM` into it. We can also extract some
common logic from `ScriptTransformationExec`. The usage would be something like
```
script_transform(input, script, output)
```
The input can be a list of input columns such as `a, b, c`, and the
output can be a schema definition such as `col1 string, col2 int`.
The return type would be `Array<Struct<col1: String, col2: Int>>` (this
data type can cover all cases and lets the user handle the result).
Then at execution time we can make it run with the default format, i.e. `ROW
FORMAT DELIMITED`.
This is a simple and general way to implement it, and then we can add it as a DSL.
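For concreteness, a minimal sketch of what the SQL surface might look like,
using the existing `TRANSFORM` syntax as the reference point. The
`script_transform` name, argument order, and return type below are assumptions
taken from this discussion, not an agreed API:
```
-- Existing SQL surface that the proposal mirrors:
SELECT TRANSFORM(a, b) USING 'cat' AS (col1 STRING, col2 INT) FROM t;

-- Hypothetical proposed expression (name and signature are assumptions):
SELECT script_transform(array(a, b), 'cat', 'col1 string, col2 int') FROM t;
-- would return Array<Struct<col1: String, col2: Int>>, which the caller
-- can then explode/inline into rows as needed.
```
Returning an array of structs would keep the expression usable in any clause,
at the cost of an explicit explode when row-shaped output is wanted, which is
the "let the user handle it" trade-off described above.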
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]