viirya commented on a change in pull request #31296:
URL: https://github.com/apache/spark/pull/31296#discussion_r564744404



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala
##########
@@ -2007,6 +2007,54 @@ class DatasetSuite extends QueryTest
 
     checkAnswer(withUDF, Row(Row(1), null, null) :: Row(Row(1), null, null) :: Nil)
   }
+
+  test("SPARK-34205: Pipe Dataset") {
+    assume(TestUtils.testCommandAvailable("cat"))
+
+    val nums = spark.range(4)
+    val piped = nums.pipe("cat", (l, printFunc) => printFunc(l.toString)).toDF

Review comment:
       Thanks @HeartSaVioR. At least I am glad that the discussion can move 
forward, whichever one you prefer to add.
   
   Honestly, I think transform is an odd construct that exists only to provide 
the pipe feature under Hive SQL syntax. I don't like the transform syntax, 
which is inconvenient to use and verbose. It is not as flexible as pipe's 
custom print-out function. BTW, because transform is untyped, on a typed 
Dataset it is bound to the Dataset's serialization row format. Some comments 
in the earlier discussion objected to that, although it was clarified later 
that pipe doesn't suffer from this issue.
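
   To make the print-out function point concrete, here is a rough sketch. It is 
not the PR's implementation: `pipeThrough` and `User` are hypothetical names, 
and it uses plain `scala.sys.process` on local collections instead of Spark, 
assuming `tr` is available on the machine. The point is only that the caller 
decides what each element looks like on the child process's stdin.

```scala
import java.io.ByteArrayInputStream
import scala.sys.process._

object PipePrintSketch {
  // Hypothetical stand-in for the proposed Dataset.pipe; this is NOT Spark API.
  // printElement receives each element plus a callback that writes one line
  // to the external command's stdin.
  def pipeThrough[T](items: Seq[T], command: Seq[String])(
      printElement: (T, String => Unit) => Unit): Seq[String] = {
    val stdin = new StringBuilder
    items.foreach(item => printElement(item, line => stdin.append(line).append('\n')))
    val in = new ByteArrayInputStream(stdin.toString.getBytes("UTF-8"))
    // Run the command with our buffer as stdin and collect its stdout lines.
    val out = (Process(command) #< in).!!
    out.split("\n").filter(_.nonEmpty).toSeq
  }

  case class User(id: Long, name: String)

  def main(args: Array[String]): Unit = {
    val users = Seq(User(1L, "ann"), User(2L, "bob"))
    // The caller picks the wire format: only `name` is sent, not the whole row.
    val piped = pipeThrough(users, Seq("tr", "a-z", "A-Z")) { (u, printFunc) =>
      printFunc(u.name)
    }
    piped.foreach(println)
  }
}
```

   With a fixed row serialization, the external command would have to parse 
whatever format the engine emits; here the element type stays typed until the 
print function runs.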
   
   If we still cannot reach a consensus, maybe I should raise a discussion on 
the dev mailing list to decide whether pipe or transform should be added as a 
top-level API.
   
   @xuanyuanking @AngersZhuuuu The SQL syntax of transform, "SELECT 
TRANSFORM(...)", is pretty confusing. It looks like an expression but it is 
actually an operator, and IMHO you cannot turn it into an expression. If you 
force it to be an expression, you will create inconsistencies and weird cases. 
transform is like pipe, and their input/output relation is not 1:1 or N:1 but 
arbitrary.
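
   A small sketch of what "arbitrary" means here, again with plain 
`scala.sys.process` and not Spark (the `run` helper is hypothetical, and it 
assumes `cat`, `wc` and `tr` are available): the same three input lines come 
back as three, one, or six output lines depending only on the external command.

```scala
import java.io.ByteArrayInputStream
import scala.sys.process._

object PipeCardinalitySketch {
  // Hypothetical helper: feeds `lines` to the command's stdin and returns
  // the non-empty lines of its stdout.
  def run(command: Seq[String], lines: Seq[String]): Seq[String] = {
    val in = new ByteArrayInputStream(lines.map(_ + "\n").mkString.getBytes("UTF-8"))
    (Process(command) #< in).!!.split("\n").filter(_.nonEmpty).toSeq
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq("a b", "c d e", "f")
    println(run(Seq("cat"), rows).size)            // 3 in, 3 out (1:1)
    println(run(Seq("wc", "-l"), rows).size)       // 3 in, 1 out (N:1)
    println(run(Seq("tr", " ", "\\n"), rows).size) // 3 in, 6 out (1:N)
  }
}
```

   An expression has to produce one value per input row (or one per group for 
an aggregate), which none of these commands is obliged to do.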
   
   




