[GitHub] spark pull request #22960: [SPARK-25955][TEST] Porting JSON tests for CSV fu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22960
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22960#discussion_r231413853
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
@@ -86,4 +86,82 @@ class CsvFunctionsSuite extends QueryTest with SharedSQLContext {
     checkAnswer(df.select(to_csv($"a", options)), Row("26/08/2015 18:00") :: Nil)
   }
+
+  test("from_csv uses DDL strings for defining a schema - java") {
+    val df = Seq("""1,"haa"""").toDS()
+    checkAnswer(
+      df.select(
+        from_csv($"value", lit("a INT, b STRING"), new java.util.HashMap[String, String]())),
+      Row(Row(1, "haa")) :: Nil)
+  }
+
+  test("roundtrip to_csv -> from_csv") {
+    val df = Seq(Tuple1(Tuple1(1)), Tuple1(null)).toDF("struct")
+    val schema = df.schema(0).dataType.asInstanceOf[StructType]
+    val options = Map.empty[String, String]
+    val readback = df.select(to_csv($"struct").as("csv"))
+      .select(from_csv($"csv", schema, options).as("struct"))
+
+    checkAnswer(df, readback)
+  }
+
+  test("roundtrip from_csv -> to_csv") {
+    val df = Seq(Some("1"), None).toDF("csv")
+    val schema = new StructType().add("a", IntegerType)
+    val options = Map.empty[String, String]
+    val readback = df.select(from_csv($"csv", schema, options).as("struct"))
+      .select(to_csv($"struct").as("csv"))
+
+    checkAnswer(df, readback)
+  }
+
+  test("infers schemas of a CSV string and pass to to from_csv") {
+    val in = Seq("""0.123456789,987654321,"San Francisco"""").toDS()
+    val options = Map.empty[String, String].asJava
+    val out = in.select(from_csv('value, schema_of_csv("0.1,1,a"), options) as "parsed")
+    val expected = StructType(Seq(StructField(
+      "parsed",
+      StructType(Seq(
+        StructField("_c0", DoubleType, true),
+        StructField("_c1", IntegerType, true),
+        StructField("_c2", StringType, true))))))
+
+    assert(out.schema == expected)
+  }
+
+  test("Support to_csv in SQL") {
--- End diff --
I think we can just get rid of it. I can't imagine both functions being specifically broken only in `selectExpr`.
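The `schema_of_csv("0.1,1,a")` call in the quoted test infers a struct schema (`_c0 DOUBLE, _c1 INT, _c2 STRING`) from a single sample row. As a rough illustration of that idea, here is a toy sketch in plain Python, with no Spark dependency; the function names and the per-token try/except heuristic are my own, not Spark's actual inference logic:

```python
def infer_field_type(token: str) -> str:
    """Guess a field type from one CSV token, loosely mirroring the
    narrowest-type guess schema_of_csv makes for a sample row."""
    try:
        int(token)
        return "int"
    except ValueError:
        pass
    try:
        float(token)
        return "double"
    except ValueError:
        return "string"

def schema_of_csv_sample(sample: str) -> list:
    # Columns get positional names _c0, _c1, ... when none are supplied,
    # matching the field names asserted in the test above.
    return [("_c%d" % i, infer_field_type(tok))
            for i, tok in enumerate(sample.split(","))]

print(schema_of_csv_sample("0.1,1,a"))
# → [('_c0', 'double'), ('_c1', 'int'), ('_c2', 'string')]
```

The test then feeds the inferred schema straight into `from_csv`, so the parsed column types are driven entirely by the sample string, not by the data being parsed.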
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/22960#discussion_r231399775
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
+  test("Support to_csv in SQL") {
--- End diff --
This is only to double-check that the functions are available (and work) from expressions in Scala. Probably we can make the test smaller.
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22960#discussion_r231380992
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
+  test("from_csv uses DDL strings for defining a schema - java") {
+    val df = Seq("""1,"haa"""").toDS()
+    checkAnswer(
+      df.select(
+        from_csv($"value", lit("a INT, b STRING"), new java.util.HashMap[String, String]())),
--- End diff --
The only difference is `from_csv` and `from_json`.
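The test being discussed passes the schema as a DDL string, `"a INT, b STRING"`, rather than as a `StructType`. To make the shape of that input concrete, here is a toy parser for such a flat column list in plain Python; this is purely illustrative and far simpler than Spark's real DDL parser, which also handles nested and parameterized types:

```python
def parse_ddl_schema(ddl: str) -> list:
    """Split a flat DDL column list like "a INT, b STRING" into
    (name, type) pairs. Toy sketch only: no nested structs, no
    DECIMAL(p, s)-style parameterized types."""
    fields = []
    for col in ddl.split(","):
        # Each column is "<name> <type>"; split on the first whitespace run.
        name, typ = col.strip().split(None, 1)
        fields.append((name, typ.upper()))
    return fields

print(parse_ddl_schema("a INT, b STRING"))
# → [('a', 'INT'), ('b', 'STRING')]
```

Accepting a DDL string keeps the Java-friendly overload of `from_csv` usable without constructing `StructType` objects by hand, which is why the ported Java-API test exercises exactly this form.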
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22960#discussion_r231344120
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
+  test("Support to_csv in SQL") {
--- End diff --
@MaxGekk, wouldn't the tests in `csv-functions.sql` be enough for the SQL support test?
GitHub user MaxGekk opened a pull request: https://github.com/apache/spark/pull/22960
[SPARK-25955][TEST] Porting JSON tests for CSV functions

## What changes were proposed in this pull request?

In the PR, I propose to port the existing JSON tests from `JsonFunctionsSuite` that are applicable to CSV, and put them into `CsvFunctionsSuite`. In particular:
- roundtrip `from_csv` to `to_csv`, and `to_csv` to `from_csv`
- using `schema_of_csv` in `from_csv`
- Java API `from_csv`
- using `from_csv` and `to_csv` in exprs

You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/MaxGekk/spark-1 csv-additional-tests
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/22960.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22960

commit 345f2a6b0d480a6e24a398380a49792366fa8c6e
Author: Maxim Gekk
Date: 2018-11-06T19:36:34Z
    Tests - roundtrip from_csv <-> to_csv

commit 606be67e8a422a0e650c2dc29bd724be9b80e411
Author: Maxim Gekk
Date: 2018-11-06T20:07:44Z
    SQL + java tests
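The core invariant the ported roundtrip tests assert is that serializing a row with `to_csv` and parsing it back with `from_csv` (or the reverse) reproduces the original value. The same property can be sketched in plain Python with the standard `csv` module, with no Spark at all; the helper names here are invented for the illustration:

```python
import csv
import io

def to_csv_row(values):
    """Serialize one row of values to a single CSV line
    (stand-in for Spark's to_csv on a struct column)."""
    buf = io.StringIO()
    csv.writer(buf).writerow(values)
    return buf.getvalue().rstrip("\r\n")

def from_csv_row(line):
    """Parse a single CSV line back into a list of string fields
    (stand-in for Spark's from_csv with an all-string schema)."""
    return next(csv.reader(io.StringIO(line)))

# The roundtrip property the ported tests check, in miniature:
row = ["1", "haa"]
assert from_csv_row(to_csv_row(row)) == row
```

Note that Spark's tests additionally cover null rows and typed schemas, where the roundtrip only holds up to the schema's type coercions; this sketch keeps everything as strings to isolate the serialize-then-parse invariant itself.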