[GitHub] spark pull request #22960: [SPARK-25955][TEST] Porting JSON tests for CSV fu...

2018-11-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22960


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22960: [SPARK-25955][TEST] Porting JSON tests for CSV fu...

2018-11-07 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22960#discussion_r231413853
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
@@ -86,4 +86,82 @@ class CsvFunctionsSuite extends QueryTest with SharedSQLContext {
 
     checkAnswer(df.select(to_csv($"a", options)), Row("26/08/2015 18:00") :: Nil)
   }
+
+  test("from_csv uses DDL strings for defining a schema - java") {
+    val df = Seq("""1,"haa"""").toDS()
+    checkAnswer(
+      df.select(
+        from_csv($"value", lit("a INT, b STRING"), new java.util.HashMap[String, String]())),
+      Row(Row(1, "haa")) :: Nil)
+  }
+
+  test("roundtrip to_csv -> from_csv") {
+    val df = Seq(Tuple1(Tuple1(1)), Tuple1(null)).toDF("struct")
+    val schema = df.schema(0).dataType.asInstanceOf[StructType]
+    val options = Map.empty[String, String]
+    val readback = df.select(to_csv($"struct").as("csv"))
+      .select(from_csv($"csv", schema, options).as("struct"))
+
+    checkAnswer(df, readback)
+  }
+
+  test("roundtrip from_csv -> to_csv") {
+    val df = Seq(Some("1"), None).toDF("csv")
+    val schema = new StructType().add("a", IntegerType)
+    val options = Map.empty[String, String]
+    val readback = df.select(from_csv($"csv", schema, options).as("struct"))
+      .select(to_csv($"struct").as("csv"))
+
+    checkAnswer(df, readback)
+  }
+
+  test("infers schemas of a CSV string and pass to from_csv") {
+    val in = Seq("""0.123456789,987654321,"San Francisco"""").toDS()
+    val options = Map.empty[String, String].asJava
+    val out = in.select(from_csv('value, schema_of_csv("0.1,1,a"), options) as "parsed")
+    val expected = StructType(Seq(StructField(
+      "parsed",
+      StructType(Seq(
+        StructField("_c0", DoubleType, true),
+        StructField("_c1", IntegerType, true),
+        StructField("_c2", StringType, true))))))
+
+    assert(out.schema == expected)
+  }
+
+  test("Support to_csv in SQL") {
--- End diff --

I think we can just get rid of it. I can't imagine both functions are 
specifically broken alone in `selectExpr`.


---




[GitHub] spark pull request #22960: [SPARK-25955][TEST] Porting JSON tests for CSV fu...

2018-11-06 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22960#discussion_r231399775
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
@@ -86,4 +86,82 @@ class CsvFunctionsSuite extends QueryTest with SharedSQLContext {
 
     checkAnswer(df.select(to_csv($"a", options)), Row("26/08/2015 18:00") :: Nil)
   }
+
+  test("from_csv uses DDL strings for defining a schema - java") {
+    val df = Seq("""1,"haa"""").toDS()
+    checkAnswer(
+      df.select(
+        from_csv($"value", lit("a INT, b STRING"), new java.util.HashMap[String, String]())),
+      Row(Row(1, "haa")) :: Nil)
+  }
+
+  test("roundtrip to_csv -> from_csv") {
+    val df = Seq(Tuple1(Tuple1(1)), Tuple1(null)).toDF("struct")
+    val schema = df.schema(0).dataType.asInstanceOf[StructType]
+    val options = Map.empty[String, String]
+    val readback = df.select(to_csv($"struct").as("csv"))
+      .select(from_csv($"csv", schema, options).as("struct"))
+
+    checkAnswer(df, readback)
+  }
+
+  test("roundtrip from_csv -> to_csv") {
+    val df = Seq(Some("1"), None).toDF("csv")
+    val schema = new StructType().add("a", IntegerType)
+    val options = Map.empty[String, String]
+    val readback = df.select(from_csv($"csv", schema, options).as("struct"))
+      .select(to_csv($"struct").as("csv"))
+
+    checkAnswer(df, readback)
+  }
+
+  test("infers schemas of a CSV string and pass to from_csv") {
+    val in = Seq("""0.123456789,987654321,"San Francisco"""").toDS()
+    val options = Map.empty[String, String].asJava
+    val out = in.select(from_csv('value, schema_of_csv("0.1,1,a"), options) as "parsed")
+    val expected = StructType(Seq(StructField(
+      "parsed",
+      StructType(Seq(
+        StructField("_c0", DoubleType, true),
+        StructField("_c1", IntegerType, true),
+        StructField("_c2", StringType, true))))))
+
+    assert(out.schema == expected)
+  }
+
+  test("Support to_csv in SQL") {
--- End diff --

This is only to double-check that the functions are available (and work) 
from expressions in Scala. We can probably make the test smaller.
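
The `schema_of_csv("0.1,1,a")` call in the quoted diff infers one type per column. The inference idea can be sketched in plain Scala; the names and the try-int-then-double order below are hypothetical stand-ins for illustration, not Spark's implementation:

```scala
import scala.util.Try

// Hypothetical per-token type inference mirroring what schema_of_csv
// reports for "0.1,1,a": double, int, string. Illustration only.
def inferFieldType(token: String): String =
  if (Try(token.toInt).isSuccess) "IntegerType"
  else if (Try(token.toDouble).isSuccess) "DoubleType"
  else "StringType"

// Infer a type name for each comma-separated column of one CSV line.
def inferRowTypes(line: String): Seq[String] =
  line.split(",").toSeq.map(inferFieldType)

assert(inferRowTypes("0.1,1,a") == Seq("DoubleType", "IntegerType", "StringType"))
```

This matches the expected schema asserted in the test: `_c0` as `DoubleType`, `_c1` as `IntegerType`, `_c2` as `StringType`.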


---




[GitHub] spark pull request #22960: [SPARK-25955][TEST] Porting JSON tests for CSV fu...

2018-11-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22960#discussion_r231380992
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
@@ -86,4 +86,82 @@ class CsvFunctionsSuite extends QueryTest with SharedSQLContext {
 
     checkAnswer(df.select(to_csv($"a", options)), Row("26/08/2015 18:00") :: Nil)
   }
+
+  test("from_csv uses DDL strings for defining a schema - java") {
+    val df = Seq("""1,"haa"""").toDS()
+    checkAnswer(
+      df.select(
+        from_csv($"value", lit("a INT, b STRING"), new java.util.HashMap[String, String]())),
--- End diff --

The only difference is `from_csv` and `from_json`.


---




[GitHub] spark pull request #22960: [SPARK-25955][TEST] Porting JSON tests for CSV fu...

2018-11-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22960#discussion_r231344120
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CsvFunctionsSuite.scala ---
@@ -86,4 +86,82 @@ class CsvFunctionsSuite extends QueryTest with SharedSQLContext {
 
     checkAnswer(df.select(to_csv($"a", options)), Row("26/08/2015 18:00") :: Nil)
   }
+
+  test("from_csv uses DDL strings for defining a schema - java") {
+    val df = Seq("""1,"haa"""").toDS()
+    checkAnswer(
+      df.select(
+        from_csv($"value", lit("a INT, b STRING"), new java.util.HashMap[String, String]())),
+      Row(Row(1, "haa")) :: Nil)
+  }
+
+  test("roundtrip to_csv -> from_csv") {
+    val df = Seq(Tuple1(Tuple1(1)), Tuple1(null)).toDF("struct")
+    val schema = df.schema(0).dataType.asInstanceOf[StructType]
+    val options = Map.empty[String, String]
+    val readback = df.select(to_csv($"struct").as("csv"))
+      .select(from_csv($"csv", schema, options).as("struct"))
+
+    checkAnswer(df, readback)
+  }
+
+  test("roundtrip from_csv -> to_csv") {
+    val df = Seq(Some("1"), None).toDF("csv")
+    val schema = new StructType().add("a", IntegerType)
+    val options = Map.empty[String, String]
+    val readback = df.select(from_csv($"csv", schema, options).as("struct"))
+      .select(to_csv($"struct").as("csv"))
+
+    checkAnswer(df, readback)
+  }
+
+  test("infers schemas of a CSV string and pass to from_csv") {
+    val in = Seq("""0.123456789,987654321,"San Francisco"""").toDS()
+    val options = Map.empty[String, String].asJava
+    val out = in.select(from_csv('value, schema_of_csv("0.1,1,a"), options) as "parsed")
+    val expected = StructType(Seq(StructField(
+      "parsed",
+      StructType(Seq(
+        StructField("_c0", DoubleType, true),
+        StructField("_c1", IntegerType, true),
+        StructField("_c2", StringType, true))))))
+
+    assert(out.schema == expected)
+  }
+
+  test("Support to_csv in SQL") {
--- End diff --

@MaxGekk, wouldn't the tests in `csv-functions.sql` be enough for SQL 
support test?


---




[GitHub] spark pull request #22960: [SPARK-25955][TEST] Porting JSON tests for CSV fu...

2018-11-06 Thread MaxGekk
GitHub user MaxGekk opened a pull request:

https://github.com/apache/spark/pull/22960

[SPARK-25955][TEST] Porting JSON tests for CSV functions

## What changes were proposed in this pull request?

In the PR, I propose to port the existing JSON tests from `JsonFunctionsSuite` 
that are applicable to CSV, and put them into `CsvFunctionsSuite`. In particular:
- roundtrip `from_csv` to `to_csv`, and `to_csv` to `from_csv`
- using `schema_of_csv` in `from_csv`
- the Java API for `from_csv`
- using `from_csv` and `to_csv` in SQL expressions
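
The roundtrip invariant the ported tests assert (serializing to CSV and parsing back yields the original rows, and vice versa) can be illustrated outside Spark with a plain-Scala miniature. `Record`, `toCsvLine`, and `fromCsvLine` below are hypothetical helpers for illustration only, not Spark's `to_csv`/`from_csv`:

```scala
// Minimal stand-in for a one-struct-column row.
case class Record(a: Int, b: String)

// Serialize a record to one CSV line (analogue of to_csv).
def toCsvLine(r: Record): String = s"${r.a},${r.b}"

// Parse one CSV line back into a record (analogue of from_csv).
def fromCsvLine(line: String): Record = {
  val Array(a, b) = line.split(",", 2)
  Record(a.toInt, b)
}

// Roundtrip: serialize then parse should return the original record.
val original = Record(1, "haa")
assert(fromCsvLine(toCsvLine(original)) == original)
```

The suite checks the same invariant at the DataFrame level with `checkAnswer(df, readback)`.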


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MaxGekk/spark-1 csv-additional-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22960.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22960


commit 345f2a6b0d480a6e24a398380a49792366fa8c6e
Author: Maxim Gekk 
Date:   2018-11-06T19:36:34Z

Tests - roundtrip from_csv <-> to_csv

commit 606be67e8a422a0e650c2dc29bd724be9b80e411
Author: Maxim Gekk 
Date:   2018-11-06T20:07:44Z

SQL + java tests




---
