[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18875

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138501335

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---

@@ -26,20 +26,50 @@
 import org.apache.spark.sql.catalyst.expressions.SpecializedGetters
 import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData}
 import org.apache.spark.sql.types._

+/**
+ * `JacksonGenerator` can only be initialized with a `StructType` or a `MapType`.
+ * Once it is initialized with `StructType`, it can be used to write out a struct or an array of
+ * structs. Once it is initialized with `MapType`, it can be used to write out a map or an array
+ * of maps. An exception will be thrown if it is asked to write out a struct when it was
+ * initialized with a `MapType`, and vice versa.
+ */
 private[sql] class JacksonGenerator(
-    schema: StructType,
+    dataType: DataType,
     writer: Writer,
     options: JSONOptions) {

   // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate
   // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that
   // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`.
   private type ValueWriter = (SpecializedGetters, Int) => Unit

+  // `JacksonGenerator` can only be initialized with a `StructType` or a `MapType`.
+  require(dataType.isInstanceOf[StructType] | dataType.isInstanceOf[MapType],

--- End diff --

oh. Yes, you're right. This is my mistake. :(

---
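The constraint discussed in this thread, accepting only a struct or a map root type, can be sketched outside Spark as a constructor guard. The tiny case classes below only mirror Spark's type names for illustration; `Generator` is a hypothetical stand-in, not Spark's `JacksonGenerator`:

```scala
// Standalone sketch of the fail-fast constructor guard discussed above.
// These types merely mirror Spark's names; this is not Spark code.
sealed trait DataType
case class StructType(fieldNames: Seq[String]) extends DataType
case class MapType(keyType: String, valueType: String) extends DataType
case class ArrayType(elementType: DataType) extends DataType

class Generator(val dataType: DataType) {
  // Fail fast at construction time: only a struct or a map root is supported.
  // `require` throws IllegalArgumentException when the predicate is false.
  require(dataType.isInstanceOf[StructType] || dataType.isInstanceOf[MapType],
    s"JSON generator only supports StructType or MapType, got $dataType")
}
```

With this guard, `new Generator(MapType("k", "v"))` succeeds, while `new Generator(ArrayType(MapType("k", "v")))` throws `IllegalArgumentException` immediately instead of failing later during writing.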
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138500447

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---

+  // `JacksonGenerator` can only be initialized with a `StructType` or a `MapType`.
+  require(dataType.isInstanceOf[StructType] | dataType.isInstanceOf[MapType],

--- End diff --

Hm.. I think we should use `||` for short-circuiting next time.

---
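The point of this review comment is that Scala's `|` on Booleans always evaluates both operands, while `||` skips the right-hand side when the left is already `true`. A minimal standalone illustration:

```scala
// Track whether the right-hand side was ever evaluated.
var evaluated = false
def rhs(): Boolean = { evaluated = true; true }

// `||` short-circuits: rhs() is never called because the left side is true.
val a = true || rhs()
assert(a && !evaluated)

// `|` does not short-circuit: rhs() runs even though the left side is true.
val b = true | rhs()
assert(b && evaluated)
```

For a cheap `isInstanceOf` check either operator gives the same result, so this is a style nit here, but `||` is the idiomatic choice and matters when the right operand is expensive or has side effects.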
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138500520

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---

@@ -617,6 +618,14 @@ case class JsonToStructs(
       {"time":"26/08/2015"}
   > SELECT _FUNC_(array(named_struct('a', 1, 'b', 2)));
       [{"a":1,"b":2}]
+  > SELECT _FUNC_(map('a',named_struct('b',1)));

--- End diff --

little nit: `... ',n ...` -> `... ', n ...`

---
Github user goldmedal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138237482

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---

@@ -193,14 +223,35 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = {
+    writeObject(writeFields(
+      fieldWriters = rootFieldWriters,
+      row = row,
+      schema = dataType.asInstanceOf[StructType]))
+  }
+
+  /**
+   * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson
+   *
+   * @param array The array of rows or maps to convert
+   */
+  def write(array: ArrayData): Unit = writeArray(writeArrayData(
+    fieldWriter = arrElementWriter,
+    array = array
+  ))

   /**
-   * Transforms multiple `InternalRow`s to JSON array using Jackson
+   * Transforms a single `MapData` to JSON object using Jackson
    *
-   * @param array The array of rows to convert
+   * @param map a map to convert
    */
-  def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter))
+  def write(map: MapData): Unit = {
+    writeObject(writeMapData(
+      fieldWriter = mapElementWriter,

--- End diff --

Ok, thanks for the review :) I'll update it tonight.

---
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138233095

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---

@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.json
+
+import java.io.CharArrayWriter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData}
+import org.apache.spark.sql.types._
+
+class JacksonGeneratorSuite extends SparkFunSuite {
+
+  val gmtId = DateTimeUtils.TimeZoneGMT.getID
+  val option = new JSONOptions(Map.empty, gmtId)
+
+  test("initial with StructType and write out a row") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = InternalRow(1)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """{"a":1}""")
+  }
+
+  test("initial with StructType and write out rows") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = new GenericArrayData(InternalRow(1) :: InternalRow(2) :: Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """[{"a":1},{"a":2}]""")
+  }
+
+  test("initial with StructType and write out an array with single empty row") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = new GenericArrayData(InternalRow(null) :: Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """[{}]""")
+  }
+
+  test("initial with StructType and write out an empty array") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = new GenericArrayData(Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """[]""")
+  }
+
+  test("initial with Map and write out a map data") {
+    val dataType = MapType(StringType, IntegerType)
+    val input = ArrayBasedMapData(Map("a" -> 1))
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """{"a":1}""")
+  }
+
+  test("initial with Map and write out an array of maps") {
+    val dataType = MapType(StringType, IntegerType)
+    val input = new GenericArrayData(
+      ArrayBasedMapData(Map("a" -> 1)) :: ArrayBasedMapData(Map("b" -> 2)) :: Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """[{"a":1},{"b":2}]""")
+  }
+
+  test("error handling: initial with StructType but error calling write a map") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = ArrayBasedMapData(Map("a" -> 1))
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    intercept[Exception] {
+      gen.write(input)
+    }
+  }
+
+  test("error handling: initial with StructType but error calling write an array of maps") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = new GenericArrayData(
+      ArrayBasedMapData(Map("a" -> 1)) :: ArrayBasedMapData(Map("b" -> 2)) :: Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    intercept[Exception] {
+      gen.write(input)
+    }
+  }
+}

---
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138232491

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---

+  def write(map: MapData): Unit = {
+    writeObject(writeMapData(
+      fieldWriter = mapElementWriter,

--- End diff --

And also add a similar comment in `write(row: InternalRow)`.

---
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138228117

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---

+  def write(array: ArrayData): Unit = writeArray(writeArrayData(
+    fieldWriter = arrElementWriter,
+    array = array
+  ))

--- End diff --

Let's change this one back to:

```scala
def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter))
```

---
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138228185

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---

+  def write(map: MapData): Unit = {
+    writeObject(writeMapData(
+      fieldWriter = mapElementWriter,

--- End diff --

Let's add a small comment saying this actually triggers the proper validation.

---
Github user goldmedal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138110523

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---

---
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138107641

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---

---
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138104414

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---

@@ -180,10 +180,30 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {

   test("to_json - array") {
     val df = Seq(Tuple1(Tuple1(1) :: Nil)).toDF("a")
+    val df2 = Seq(Tuple1(Map("a" -> 1) :: Nil)).toDF("a")

     checkAnswer(
       df.select(to_json($"a")),
       Row("""[{"_1":1}]""") :: Nil)
+    checkAnswer(
+      df2.select(to_json($"a")),
+      Row("""[{"a":1}]""") :: Nil)
+  }
+
+  test("to_json - map") {
+    val df1 = Seq(Map("a" -> Tuple1(1))).toDF("a")
+    val df2 = Seq(Map(Tuple1(1) -> Tuple1(1))).toDF("a")
+    val df3 = Seq(Map("a" -> 1)).toDF("a")
+
+    checkAnswer(
+      df1.select(to_json($"a")),
+      Row("""{"a":{"_1":1}}""") :: Nil)
+    checkAnswer(
+      df2.select(to_json($"a")),
+      Row("""{"[0,1]":{"_1":1}}""") :: Nil)

--- End diff --

Yeah, I agree to remove it for now.

---
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138103265

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---

+    checkAnswer(
+      df2.select(to_json($"a")),
+      Row("""{"[0,1]":{"_1":1}}""") :: Nil)

--- End diff --

I get your point and I know that also makes sense. How about taking this case out? I think we would have to discuss more about whether this is a bug or not, so I'd rather remove what we are not sure of for now (as it is strictly not related to this PR, if I understood correctly).

---
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138102302

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---

+    checkAnswer(
+      df2.select(to_json($"a")),
+      Row("""{"[0,1]":{"_1":1}}""") :: Nil)

--- End diff --

Yes, I was thinking that showing debugging (internally formatted) strings is rather a bug that we might have to fix if possible, since we have a pretty `toString` format for the public `Row` class. I remember we eventually decided to fix `TimestampType` and `DateType` in JSON, which were printed as longs and integers, to human-readable formats, in SPARK-16216. If I understood correctly, there was a discussion at that time about whether they should be written as-is (long and int) vs. in a human-readable format.

---
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138095618

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---

+    checkAnswer(
+      df2.select(to_json($"a")),
+      Row("""{"[0,1]":{"_1":1}}""") :: Nil)

--- End diff --

I think we don't have a `toString` method for `UnsafeRow` that shows a well-formatted string representation. The current `toString` is just for debugging: it simply gets the hex string for each 8 bytes. Considering the nature of `UnsafeRow`, this `toString` might make sense to me. We can remove this test case if the result could be confusing; otherwise we should add a comment to explain why the JSON looks like this.

---
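The underlying issue in this sub-thread is that JSON object keys must be strings, so a map keyed by a struct must pass its keys through some string conversion; when that conversion is a debug format (like the hex-based `UnsafeRow.toString` producing keys such as `"[0,1]"`), the output is not human-readable. A standalone sketch of the choice, where `renderKey` is a hypothetical readable stringifier and not Spark's actual behavior:

```scala
// A struct-like key: JSON cannot use it as an object key directly.
case class StructKey(values: Seq[Int])

// Hypothetical readable stringifier for illustration; Spark's to_json used
// the key's internal debug toString instead, which is the bug under debate.
def renderKey(k: StructKey): String = k.values.mkString("[", ",", "]")

// Render the map as a JSON object, stringifying each key explicitly.
def mapToJson(m: Map[StructKey, Int]): String =
  m.map { case (k, v) => "\"" + renderKey(k) + "\":" + v }.mkString("{", ",", "}")
```

For example, `mapToJson(Map(StructKey(Seq(0, 1)) -> 1))` yields `{"[0,1]":1}`; the design question in the thread is whether such a key format should be a stable, documented representation or is merely debug output that should not leak into `to_json` results.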
Github user goldmedal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138094337

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---

---
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138087980 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138087162 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138080916 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138078894 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala --- @@ -612,6 +612,54 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { ) } + test("SPARK-21513: to_json support map[string, struct] to json") { +val schema = MapType(StringType, StructType(StructField("a", IntegerType) :: Nil)) +val input = Literal.create(ArrayBasedMapData(Map("test" -> InternalRow(1))), schema) +checkEvaluation( + StructsToJson(Map.empty, input), + """{"test":{"a":1}}""" +) + } + + test("SPARK-21513: to_json support map[struct, struct] to json") { +val schema = MapType(StructType(StructField("a", IntegerType) :: Nil), + StructType(StructField("b", IntegerType) :: Nil)) +val input = Literal.create(ArrayBasedMapData(Map(InternalRow(1) -> InternalRow(2))), schema) +checkEvaluation( + StructsToJson(Map.empty, input), + """{"[1]":{"b":2}}""" +) + } + + test("SPARK-21513: to_json support map[string, integer] to json") { +val schema = MapType(StringType, IntegerType) +val input = Literal.create(ArrayBasedMapData(Map("a" -> 1)), schema) +checkEvaluation( + StructsToJson(Map.empty, input), + """{"a":1}""" +) + } + + test("to_json - array with maps") { +val inputSchema = ArrayType(MapType(StringType, IntegerType)) +val input = new GenericArrayData(ArrayBasedMapData( + Map("a" -> 1)) :: ArrayBasedMapData(Map("b" -> 2)) :: Nil) +val output = """[{"a":1},{"b":2}]""" +checkEvaluation( + StructsToJson(Map.empty, Literal.create(input, inputSchema), gmtId), + output) + } + + test("to_json - array with single map") { +val inputSchema = ArrayType(MapType(StringType, IntegerType)) +val input = new GenericArrayData(ArrayBasedMapData( + Map("a" -> 1)) :: Nil) --- End diff -- Let's make this inlined. 
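The "make this inlined" suggestion above refers to the awkwardly wrapped two-line `val input` in the quoted test. A rough sketch of what the inlined form would look like (names taken from the quoted diff; this is the reviewer's suggestion, not the committed code):

```scala
// Before (wrapped mid-argument, as in the quoted diff):
val input = new GenericArrayData(ArrayBasedMapData(
  Map("a" -> 1)) :: Nil)

// After (the whole construction inlined on one line):
val input2 = new GenericArrayData(ArrayBasedMapData(Map("a" -> 1)) :: Nil)
```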
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138079126 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +226,26 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = { +writeObject(writeFields(row, dataType.asInstanceOf[StructType], rootFieldWriters)) + } + /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + /** + * Transforms a `MapData` to JSON object using Jackson --- End diff -- Not a big deal but `` a `MapData` `` -> `` a single `MapData` `` just for consistency with ` write(row: InternalRow)`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
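For context, after this PR `JacksonGenerator` carries three `write` overloads. A sketch of their shape with the wording fix applied ("a single `MapData`", matching `write(row: InternalRow)`); the body of the `MapData` overload is not visible in the quoted hunk, so it is an assumption modeled on the struct path:

```scala
/** Transforms a single `InternalRow` to a JSON object using Jackson. */
def write(row: InternalRow): Unit =
  writeObject(writeFields(row, dataType.asInstanceOf[StructType], rootFieldWriters))

/** Transforms multiple `InternalRow`s or `MapData`s to a JSON array using Jackson. */
def write(array: ArrayData): Unit =
  writeArray(writeArrayData(array, arrElementWriter))

/** Transforms a single `MapData` to a JSON object using Jackson. */
def write(map: MapData): Unit =  // body assumed by analogy with the struct overload
  writeObject(writeMapData(map, dataType.asInstanceOf[MapType], mapElementWriter))
```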
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138077808 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala --- @@ -180,10 +180,30 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext { test("to_json - array") { val df = Seq(Tuple1(Tuple1(1) :: Nil)).toDF("a") +val df2 = Seq(Tuple1(Map("a" -> 1) :: Nil)).toDF("a") checkAnswer( df.select(to_json($"a")), Row("""[{"_1":1}]""") :: Nil) +checkAnswer( + df2.select(to_json($"a")), + Row("""[{"a":1}]""") :: Nil) + } + + test("to_json - map") { +val df1 = Seq(Map("a" -> Tuple1(1))).toDF("a") +val df2 = Seq(Map(Tuple1(1) -> Tuple1(1))).toDF("a") +val df3 = Seq(Map("a" -> 1)).toDF("a") + +checkAnswer( + df1.select(to_json($"a")), + Row("""{"a":{"_1":1}}""") :: Nil) +checkAnswer( + df2.select(to_json($"a")), + Row("""{"[0,1]":{"_1":1}}""") :: Nil) --- End diff -- This case looks rather a string representation from `UnsafeRow` and a bug. Let's remove this test case for now and fix it later together in another PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138057979 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,25 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case _: MapType | ArrayType(_: MapType, _) => + // TODO: let `JacksonUtils.verifySchema` verify a `MapType` + try { +val st = StructType(StructField("a", rowSchema.asInstanceOf[MapType]) :: Nil) +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } case _ => TypeCheckResult.TypeCheckFailure( - s"Input type ${child.dataType.simpleString} must be a struct or array of structs.") + s"Input type ${child.dataType.simpleString} must be a struct, array of structs or " + + s"a map or array of map.") --- End diff -- Looks `s` could be removed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
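The `s` being flagged is the interpolator on the second string literal, which interpolates nothing. A minimal self-contained illustration of the nit (the `simpleString` value here is made up for the example):

```scala
object InterpolatorNit extends App {
  val simpleString = "map<string,int>" // hypothetical stand-in for child.dataType.simpleString
  // `s` is only needed on a literal that actually interpolates something;
  // the second literal is plain, so its `s` prefix is redundant and can be dropped.
  val msg = s"Input type $simpleString must be a struct, array of structs or " +
    "a map or array of map."
  println(msg)
}
```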
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138073822 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138058970 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,53 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +/** + * `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + * Once it is initialized with `StructType`, it can be used to write out a struct or an array of + * struct. Once it is initialized with ``MapType``, it can be used to write out a map or an array + * of map. An exception will be thrown if trying to write out a struct if it is initialized with + * a `MapType`, and vice verse. + */ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + dataType match { +case _: StructType | _: MapType => +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` only supports to be initialized with a `StructType` " + --- End diff -- This `s` could be removed too. Let's avoid backquoting classes in the error messages. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138058695 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,53 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +/** + * `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + * Once it is initialized with `StructType`, it can be used to write out a struct or an array of + * struct. Once it is initialized with ``MapType``, it can be used to write out a map or an array + * of map. An exception will be thrown if trying to write out a struct if it is initialized with + * a `MapType`, and vice verse. + */ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + dataType match { --- End diff -- I'd just do a `require(... .isInstanceOf... || ..., "...")` instead. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
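Folding the review suggestions together, the `dataType match` guard collapses to a single `require`. A sketch under the names from the quoted diff (note the short-circuiting `||` — elsewhere in this thread a single `|` slipped into this very check and was acknowledged as a mistake), with the class names unquoted in the message per the other comment:

```scala
private[sql] class JacksonGenerator(
    dataType: DataType,
    writer: Writer,
    options: JSONOptions) {

  // JacksonGenerator can only be initialized with a StructType or a MapType.
  require(dataType.isInstanceOf[StructType] || dataType.isInstanceOf[MapType],
    "JacksonGenerator only supports to be initialized with a StructType " +
      s"or MapType but got ${dataType.simpleString}")
  // ...
}
```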
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138058166 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,53 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +/** + * `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + * Once it is initialized with `StructType`, it can be used to write out a struct or an array of + * struct. Once it is initialized with ``MapType``, it can be used to write out a map or an array --- End diff -- nit: ``` ``MapType`` ``` -> `` `MapType` `` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137997910 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,55 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +/** + * `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + * Once it is initialized with `StructType`, it can be used to write out a struct or an array of + * struct. Once it is initialized with ``MapType``, it can be used to write out a map or an array + * of map. An exception will be thrown if trying to write out a struct if it is initialized with + * a `MapType`, and vice verse. + */ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` can only be initialized with a `StructType` or a `MapType`. 
+ dataType match { +case _: StructType | _: MapType => +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` only supports to be initialized with a `StructType` " + + s"or `MapType` but got ${dataType.simpleString}") + } + // `ValueWriter`s for all fields of the schema - private val rootFieldWriters: Array[ValueWriter] = schema.map(_.dataType).map(makeWriter).toArray + private lazy val rootFieldWriters: Array[ValueWriter] = dataType match { +case st: StructType => st.map(_.dataType).map(makeWriter).toArray +case _ => throw new UnsupportedOperationException( + s"Initial type ${dataType.simpleString} must be a struct") + } + // `ValueWriter` for array data storing rows of the schema. - private val arrElementWriter: ValueWriter = (arr: SpecializedGetters, i: Int) => { -writeObject(writeFields(arr.getStruct(i, schema.length), schema, rootFieldWriters)) + private lazy val arrElementWriter: ValueWriter = dataType match { +case st: StructType => + (arr: SpecializedGetters, i: Int) => { +writeObject(writeFields(arr.getStruct(i, st.length), st, rootFieldWriters)) + } +case mt: MapType => + (arr: SpecializedGetters, i: Int) => { +writeObject(writeMapData(arr.getMap(i), mt, mapElementWriter)) + } +case _ => throw new UnsupportedOperationException( --- End diff -- We don't need this. `dataType` can only be `StructType` or `MapType`, as we checked it in constructor. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
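With the constructor check in place, the fallback branch viirya flags is unreachable. A sketch of `arrElementWriter` with the dead case dropped (types and helper names as in the quoted diff; fragment only, not a full class):

```scala
// `ValueWriter` for array data storing rows or maps of the schema.
private lazy val arrElementWriter: ValueWriter = dataType match {
  case st: StructType =>
    (arr: SpecializedGetters, i: Int) =>
      writeObject(writeFields(arr.getStruct(i, st.length), st, rootFieldWriters))
  case mt: MapType =>
    (arr: SpecializedGetters, i: Int) =>
      writeObject(writeMapData(arr.getMap(i), mt, mapElementWriter))
  // No default case: the require in the constructor already guarantees
  // dataType is a StructType or a MapType, so the extra
  // UnsupportedOperationException branch was dead code.
}
```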
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137988061 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + + test("initial with StructType and write out a row") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = InternalRow(1) +val writer = new CharArrayWriter() +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +gen.flush() +assert(writer.toString === """{"a":1}""") + } + + test("initial with StructType and write out rows") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = new GenericArrayData(InternalRow(1) :: InternalRow(2) :: Nil) +val writer = new CharArrayWriter() +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +gen.flush() +assert(writer.toString === """[{"a":1},{"a":2}]""") + } + + test("initial with StructType and write out an array with single empty row") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = new GenericArrayData(InternalRow(null) :: Nil) +val writer = new CharArrayWriter() +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +gen.flush() +assert(writer.toString === """[{}]""") + } + + test("initial with StructType and write out an empty array") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = new GenericArrayData(Nil) +val writer = new CharArrayWriter() +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +gen.flush() +assert(writer.toString === """[]""") + } + + test("initial with Map and write out a map data") { +val dataType = MapType(StringType, IntegerType) +val input = 
ArrayBasedMapData(Map("a" -> 1)) +val writer = new CharArrayWriter() +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +gen.flush() +assert(writer.toString === """{"a":1}""") + } + + test("initial with Map and write out an array of maps") { +val dataType = MapType(StringType, IntegerType) +val input = new GenericArrayData( + ArrayBasedMapData(Map("a" -> 1)) :: ArrayBasedMapData(Map("b" -> 2)) :: Nil) +val writer = new CharArrayWriter() +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +gen.flush() +assert(writer.toString === """[{"a":1},{"b":2}]""") + } --- End diff -- We need negative test cases, i.e, `JacksonGenerator` initialized with `StructType` but used to write map, array of map. Also `JacksonGenerator` initialized with `MapType` but used to write struct, array of struct. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
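The negative cases requested here might take the following shape, written in the suite's own style. This is a sketch, assuming (per the diffs above) that the mismatch surfaces as an `UnsupportedOperationException`, and that `intercept` is available from ScalaTest via `SparkFunSuite`:

```scala
test("initialized with StructType but trying to write out a map") {
  val dataType = StructType(StructField("a", IntegerType) :: Nil)
  val input = ArrayBasedMapData(Map("a" -> 1))
  val writer = new CharArrayWriter()
  val gen = new JacksonGenerator(dataType, writer, option)
  // The mismatch should fail with a descriptive UnsupportedOperationException,
  // not a ClassCastException from deep inside the writer machinery.
  intercept[UnsupportedOperationException] {
    gen.write(input)
  }
}

test("initialized with MapType but trying to write out a row") {
  val dataType = MapType(StringType, IntegerType)
  val input = InternalRow(1)
  val writer = new CharArrayWriter()
  val gen = new JacksonGenerator(dataType, writer, option)
  intercept[UnsupportedOperationException] {
    gen.write(input)
  }
}
```

The array-of-map and array-of-struct mismatch cases would follow the same pattern with `GenericArrayData` inputs.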
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137987843 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- It will throw `ClassCastException`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137987691 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- hmm, what type of exception is thrown now if we call `write(row)` when the generator is initialized with `MapType`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137987296 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- Oh, I got you wrong. I thought you meant the matching in `rootFieldWriters`. So we need to keep both of them? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137986494 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- I saw you remove the pattern matching? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137986264 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,25 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case _: MapType | ArrayType(_: MapType, _) => + // TODO: let `JacksonUtils.verifySchema` verify a `MapType` + try { +val st = StructType(StructField("a", rowSchema.asInstanceOf[MapType]) :: Nil) +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } case _ => TypeCheckResult.TypeCheckFailure( - s"Input type ${child.dataType.simpleString} must be a struct or array of structs.") + s"Input type ${child.dataType.simpleString} must be a struct, array of structs or " + + s"an arbitrary map.") --- End diff -- `an arbitrary map` -> `a map or array of map.` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137986124 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,42 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(mt: MapType, _) => + try { +val st = StructType(StructField("a", mt) :: Nil) +JacksonUtils.verifySchema(st) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case MapType(_: DataType, st: StructType, _: Boolean) => --- End diff -- Yeah, as you support arbitrary maps. Not all map types are valid, actually; it's left to `JacksonUtils` to verify the schema. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137984767 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,42 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(mt: MapType, _) => + try { +val st = StructType(StructField("a", mt) :: Nil) +JacksonUtils.verifySchema(st) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case MapType(_: DataType, st: StructType, _: Boolean) => --- End diff -- @viirya I think if we have `case mt: MapType`, we don't need this pattern to verify schema, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137980084 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- ok. I just leave a TODO comment for it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137979202 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- Basically I don't like to increase the size of this PR anymore. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137979158 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- Yeah, it is. Maybe we should revise `JacksonUtils` to be able to verify `MapType` directly. But I don't see `JacksonUtils.verifySchema` used in other place. We can leave it as TODO if you want to. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137979014 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- Oh yes. It's a tricky way, but it makes sense. =D --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137978728 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- ok got it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137978375 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- Maybe we can just do something like: case mt: MapType => val st = StructType(StructField("a", mt) :: Nil) JacksonUtils.verifySchema(st) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
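The suggestion above works because a map nested anywhere inside a struct is already handled by the struct verifier, so wrapping the root map in a throwaway single-field struct reuses that path. A sketch of how the `checkInputDataTypes` arm might look (the field name `"a"` is arbitrary and never surfaces to the user):

```scala
case mt: MapType =>
  try {
    // Wrap the root map in a dummy one-field struct so the existing
    // struct-only JacksonUtils.verifySchema can validate the key/value types.
    val st = StructType(StructField("a", mt) :: Nil)
    JacksonUtils.verifySchema(st)
    TypeCheckResult.TypeCheckSuccess
  } catch {
    case e: UnsupportedOperationException =>
      TypeCheckResult.TypeCheckFailure(e.getMessage)
  }
```

This keeps the PR small, at the cost of a synthetic wrapper; teaching `JacksonUtils.verifySchema` to accept a `MapType` directly is left as the TODO mentioned later in the thread.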
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137978096 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- @viirya But `JacksonUtils.verifySchema` only verifies a `StructType`, so we need to add a new one? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977643 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + val writer = new CharArrayWriter() + def getAndReset(gen: JacksonGenerator): UTF8String = { +gen.flush() +val json = writer.toString +writer.reset() +UTF8String.fromString(json) + } + + test("initial with StructType and write out a row") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = InternalRow(1) +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +assert(getAndReset(gen) === UTF8String.fromString("""{"a":1}""")) --- End diff -- @viirya I think we also need to do `gen.flush()`, right? So maybe we can keep `getAndReset` and modify it as below. Will it be better? ``` def getAndReset(gen: JacksonGenerator, writer: Writer): String = { gen.flush() writer.toString } ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
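A sketch of the helper as revised in this exchange — keeping the `gen.flush()` call but returning a plain `String` instead of a `UTF8String`, and taking the shared writer explicitly (the name `getAndReset` is the one proposed here, not settled):

```scala
import java.io.CharArrayWriter

// Hypothetical test helper: flush the generator's buffered output into the
// shared writer, capture the JSON, and reset the writer for the next test.
def getAndReset(gen: JacksonGenerator, writer: CharArrayWriter): String = {
  gen.flush()
  val json = writer.toString
  writer.reset()
  json
}
```

With this shape the assertions simplify to plain string comparisons, e.g. `assert(getAndReset(gen, writer) === """{"a":1}""")`.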
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977638 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- `ClassCastException` may not be as informative as `UnsupportedOperationException` here. I'd prefer to see `UnsupportedOperationException`. Let's keep this pattern matching for now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977826 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + val writer = new CharArrayWriter() + def getAndReset(gen: JacksonGenerator): UTF8String = { +gen.flush() +val json = writer.toString +writer.reset() +UTF8String.fromString(json) + } + + test("initial with StructType and write out a row") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = InternalRow(1) +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +assert(getAndReset(gen) === UTF8String.fromString("""{"a":1}""")) --- End diff -- Yeah, but it's just two lines. Depends on you to have a function or not. At least it shouldn't be called `getAndReset` now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977281 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- Do you mean `dataType` now is a map type but `writeFields`'s 2nd parameter is a `StructType`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977153 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + val writer = new CharArrayWriter() + def getAndReset(gen: JacksonGenerator): UTF8String = { +gen.flush() +val json = writer.toString +writer.reset() +UTF8String.fromString(json) + } + + test("initial with StructType and write out a row") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = InternalRow(1) +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +assert(getAndReset(gen) === UTF8String.fromString("""{"a":1}""")) --- End diff -- Yes, maybe we can just `writer.toString` and assert it with the normal string. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977056 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + val writer = new CharArrayWriter() + def getAndReset(gen: JacksonGenerator): UTF8String = { +gen.flush() +val json = writer.toString +writer.reset() --- End diff -- ok. I got it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137976953 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- @viirya I found that if we don't check it here, it will throw `ClassCastException` from `writeObject()` first. So maybe we don't need to check the type in `rootFieldWriters`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137976262 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- We need to revise `JacksonUtils` to check map types too.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137976065 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- Not all map types are valid for JSON. We still need to verify the type.
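The review above asks for `JacksonUtils` to verify map types as well as struct types. A hedged sketch of what such an overload might look like follows; the helper name `verifyType` and its signature are assumptions for illustration, not the actual Spark source:

```scala
// Hypothetical sketch of extending JacksonUtils.verifySchema with a MapType
// overload, reusing a per-field verifier. Since JSON object keys are always
// strings, only the map's value type needs checking.
import org.apache.spark.sql.types._

object JacksonUtilsSketch {

  def verifySchema(schema: StructType): Unit =
    schema.foreach(field => verifyType(field.name, field.dataType))

  // Assumed addition for this PR: verify the value type of a top-level map.
  def verifySchema(schema: MapType): Unit =
    verifyType("value", schema.valueType)

  private def verifyType(name: String, dataType: DataType): Unit = dataType match {
    // Atomic types Jackson can write out directly.
    case NullType | BooleanType | ByteType | ShortType | IntegerType | LongType |
         FloatType | DoubleType | StringType | TimestampType | DateType |
         BinaryType | _: DecimalType => ()
    case st: StructType => st.foreach(f => verifyType(f.name, f.dataType))
    case at: ArrayType => verifyType(name, at.elementType)
    case mt: MapType => verifyType(name, mt.valueType)
    case _ =>
      throw new UnsupportedOperationException(
        s"Unable to convert column $name of type ${dataType.simpleString} to JSON.")
  }
}
```

With this shape, `checkInputDataTypes` could call `verifySchema` uniformly for the `StructType` and `MapType` branches instead of special-casing only struct values.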
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975716 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- ok. I'll remove it.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975624 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + val writer = new CharArrayWriter() + def getAndReset(gen: JacksonGenerator): UTF8String = { +gen.flush() +val json = writer.toString +writer.reset() --- End diff -- I'd rather tests not depend on each other. I think we can just have a writer for each test instead of reusing one.
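The suggestion above, combined with the later comment about comparing plain strings, could reshape each test roughly as follows. This is a sketch against the suite under review, not the final committed code:

```scala
// Sketch: each test builds its own CharArrayWriter and generator, so no test
// depends on a shared writer having been reset by the previous test, and the
// assertion compares plain strings instead of UTF8String values.
test("initial with StructType and write out a row") {
  val dataType = StructType(StructField("a", IntegerType) :: Nil)
  val input = InternalRow(1)
  val writer = new CharArrayWriter()          // local to this test
  val gen = new JacksonGenerator(dataType, writer, option)
  gen.write(input)
  gen.flush()
  assert(writer.toString === """{"a":1}""")   // plain string comparison
}
```

Because the writer is a local value, tests can run in any order (or in parallel) without a shared `getAndReset` helper.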
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975414 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + val writer = new CharArrayWriter() + def getAndReset(gen: JacksonGenerator): UTF8String = { +gen.flush() +val json = writer.toString +writer.reset() +UTF8String.fromString(json) + } + + test("initial with StructType and write out a row") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = InternalRow(1) +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +assert(getAndReset(gen) === UTF8String.fromString("""{"a":1}""")) --- End diff -- Can we simply compare the strings? I.e., we don't need to do `UTF8String.fromString` in `getAndReset` or here.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975214 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,30 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + /** + * Transforms a `MapData` to JSON object using Jackson + * + * @param map a map to convert + */ + def write(map: MapData): Unit = dataType match { +case mt: MapType => writeObject(writeMapData(map, mt, mapElementWriter)) +case _ => throw new UnsupportedOperationException( --- End diff -- Similar to https://github.com/apache/spark/pull/18875/files#r137975134, we can avoid this. Doing pattern matching on every write seems too burdensome.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975134 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- Rethinking about this, I think we can avoid matching the data type and simply do `writeObject(writeFields(row, schema, rootFieldWriters))` like before. An exception will then be thrown when `rootFieldWriters` is accessed.
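One way to read this suggestion: make the per-type writers lazy vals whose initializers do the type check, so the check runs once on first use rather than on every `write` call. A rough sketch under that assumption (the names follow the diff, but the exact lazy-val bodies and messages are illustrative, not the merged code):

```scala
// Sketch: write() stays free of pattern matching; the lazy vals throw
// UnsupportedOperationException if the generator was initialized with the
// wrong top-level type, and that check happens only on first access.
private lazy val rootFieldWriters: Array[ValueWriter] = dataType match {
  case st: StructType => st.map(_.dataType).map(makeWriter).toArray
  case _ => throw new UnsupportedOperationException(
    s"Initial type ${dataType.catalogString} must be a struct")
}

private lazy val mapElementWriter: ValueWriter = dataType match {
  case mt: MapType => makeWriter(mt.valueType)
  case _ => throw new UnsupportedOperationException(
    s"Initial type ${dataType.catalogString} must be a map")
}

// write() can then stay close to the original single-line form.
def write(row: InternalRow): Unit =
  writeObject(writeFields(row, dataType.asInstanceOf[StructType], rootFieldWriters))
```

The trade-off discussed in the thread: per-call matching gives clearer error messages, while the lazy-val approach keeps the hot write path branch-free.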
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137974036 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = { +writeArray(writeArrayData(array, arrElementWriter)) + } --- End diff -- We don't need to change this line.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137973919 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,55 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +/** + * `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + * Once it is initialized with `StructType`, it can be used to write out a struct or an array of + * struct. Once it is initialized with `MapType`, it can be used to write out a map. An exception --- End diff -- `Once it is initialized with ``MapType``, it can be used to write out a map or an array of map.`
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137959771 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,58 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = dataType match { +case st: StructType => + try { --- End diff -- I think you're right. If the caller calls this incorrectly, a `ClassCastException` will be thrown anyway to alert the caller, so it's unnecessary to check it here. I'll remove it. Thanks.
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137958902 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,58 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = dataType match { +case st: StructType => + try { +if (array.numElements() > 0) { + array.getStruct(0, st.length) +} + } catch { +case cce: ClassCastException => + throw new UnsupportedOperationException( +s"`JacksonGenerator` can only be used to write out an array of struct " + + s"when initialized with `StructType`") + } + writeArray(writeArrayData(array, arrElementWriter)) +case _: MapType => + try { +if (array.numElements() > 0) { + array.getMap(0) +} + } catch { +case cce: ClassCastException => + throw new UnsupportedOperationException( +s"`JacksonGenerator` can only be used to write out an array of map when initialized" + +s"with `MapType`") + } + writeArray(writeArrayData(array, arrElementWriter)) +case _ => throw new UnsupportedOperationException( --- End diff -- Yes, you're right. I'll remove this pattern case. 
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137950179 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,58 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = dataType match { +case st: StructType => + try { +if (array.numElements() > 0) { + array.getStruct(0, st.length) +} + } catch { +case cce: ClassCastException => + throw new UnsupportedOperationException( +s"`JacksonGenerator` can only be used to write out an array of struct " + + s"when initialized with `StructType`") + } + writeArray(writeArrayData(array, arrElementWriter)) +case _: MapType => + try { +if (array.numElements() > 0) { + array.getMap(0) +} + } catch { +case cce: ClassCastException => + throw new UnsupportedOperationException( +s"`JacksonGenerator` can only be used to write out an array of map when initialized" + +s"with `MapType`") + } + writeArray(writeArrayData(array, arrElementWriter)) +case _ => throw new UnsupportedOperationException( --- End diff -- I think we already check the `dataType` is valid when constructing this. We don't need this pattern case. 
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137950158 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,58 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = dataType match { +case st: StructType => + try { +if (array.numElements() > 0) { + array.getStruct(0, st.length) +} + } catch { +case cce: ClassCastException => + throw new UnsupportedOperationException( +s"`JacksonGenerator` can only be used to write out an array of struct " + + s"when initialized with `StructType`") + } + writeArray(writeArrayData(array, arrElementWriter)) +case _: MapType => + try { --- End diff -- ditto.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137950153 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,58 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = dataType match { +case st: StructType => + try { --- End diff -- If we assume that the caller knows what it's doing, we can get rid of this check.
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137710147 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,50 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +// `JackGenerator` can only be initialized with a `StructType` or a `MapType`. +// Once it is initialized with `StructType`, it can be used to write out a struct or an array of +// struct. Once it is initialized with `MapType`, it can be used to write out a map. An exception +// will be thrown if trying to write out a struct if it is initialized with a `MapType`, +// and vice verse. --- End diff -- ok. I'll modify it.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137706345 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,50 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +// `JackGenerator` can only be initialized with a `StructType` or a `MapType`. +// Once it is initialized with `StructType`, it can be used to write out a struct or an array of +// struct. Once it is initialized with `MapType`, it can be used to write out a map. An exception +// will be thrown if trying to write out a struct if it is initialized with a `MapType`, +// and vice verse. --- End diff -- For this kind of comment, we use the style like: /** * Code comments... * */
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137288735 --- Diff: sql/core/src/test/resources/sql-tests/results/cross-join.sql.out --- @@ -128,6 +128,7 @@ two 2 two 2 one 1 two 2 two2 two 2 three 3 two 2 two2 two 2 two 2 two 2 + --- End diff -- Hmm, that's weird. I'm not sure, but maybe this blank line was generated when I regenerated the golden file. I'll revert it. =P
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137287045 --- Diff: sql/core/src/test/resources/sql-tests/results/cross-join.sql.out --- @@ -128,6 +128,7 @@ two 2 two 2 one 1 two 2 two2 two 2 three 3 two 2 two2 two 2 two 2 two 2 + --- End diff -- I didn't see you commit a change to `cross-join.sql`, so why is there a change to `cross-join.sql.out`?
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137285989 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -669,14 +679,25 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => + TypeCheckResult.TypeCheckSuccess --- End diff -- ok. Thanks for the review =)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137284596 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,49 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult --- End diff -- Don't forget to remove this unused import.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137284389 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -669,14 +679,25 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => + TypeCheckResult.TypeCheckSuccess --- End diff -- This part of the change looks verbose, as `MapType` appears in two pattern cases. I will look into this part tomorrow.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137283687 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,49 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( --- End diff -- The change to `JacksonGenerator` needs tests. Looks like we don't have existing tests for `JacksonGenerator` either. I think we'd better add one.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137282534 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +218,34 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"this api is only used when `JacksonGenerator` is initialized with `StructType`") + } /** * Transforms multiple `InternalRow`s to JSON array using Jackson * * @param array The array of rows to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = dataType match { +case _: StructType => writeArray(writeArrayData(array, arrElementWriter)) +case _ => throw new UnsupportedOperationException( + s"this api is only used when `JacksonGenerator` is initialized with `StructType`") --- End diff -- `JacksonGenerator` can only be used to write out an array when initialized with `StructType`.
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137282128 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,49 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map --- End diff -- ok. I'll change it.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137282296 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +218,34 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"this api is only used when `JacksonGenerator` is initialized with `StructType`") --- End diff -- `JacksonGenerator` can only be used to write out a row when initialized with `StructType`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137281077 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,49 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( --- End diff -- We should add a comment for `JacksonGenerator`. E.g., `JackGenerator` can only be initialized with a `StructType` or a `MapType`. Once it is initialized with `StructType`, it can be used to write out a struct or an array of structs. Once it is initialized with `MapType`, it can be used to write out a map. An exception will be thrown when trying to write out a struct if it is initialized with a `MapType`, and vice versa.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137280095 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,49 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map --- End diff -- `JackGenerator` can only be initialized with a StructType or a MapType. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137273266 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,50 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map + dataType match { +case _: StructType | _: MapType => + TypeCheckResult.TypeCheckSuccess --- End diff -- ok --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137269586 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -127,7 +153,7 @@ private[sql] class JacksonGenerator( (row: SpecializedGetters, ordinal: Int) => val v = row.get(ordinal, dataType) sys.error(s"Failed to convert value $v (class of ${v.getClass}}) " + - s"with the type of $dataType to JSON.") +s"with the type of $dataType to JSON.") --- End diff -- Hmm, I think it may have been modified automatically by the IDE. I'll revert it.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137268446 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -127,7 +153,7 @@ private[sql] class JacksonGenerator( (row: SpecializedGetters, ordinal: Int) => val v = row.get(ordinal, dataType) sys.error(s"Failed to convert value $v (class of ${v.getClass}}) " + - s"with the type of $dataType to JSON.") +s"with the type of $dataType to JSON.") --- End diff -- Why change this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137268116 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,50 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map + dataType match { +case _: StructType | _: MapType => + TypeCheckResult.TypeCheckSuccess --- End diff -- We don't need `TypeCheckResult.TypeCheckSuccess`. Just no-op. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
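The point of this comment is that a bare `match` whose arms merely produce `TypeCheckResult` values is evaluated and discarded, so it rejects nothing at runtime. A hedged standalone sketch (hypothetical `DT`/`Gen` types, not Spark's) of a constructor check that actually fails fast:

```scala
// Hypothetical stand-in types; not Spark code.
sealed trait DT
case object StructT extends DT
case object MapT extends DT
case object IntT extends DT

class Gen(dt: DT) {
  // Supported types fall through as a no-op; anything else aborts construction
  // by throwing, which a discarded TypeCheckFailure value would not do.
  dt match {
    case StructT | MapT => // no-op
    case other => throw new UnsupportedOperationException(
      s"Input type $other must be a struct or a map")
  }
}
```

Constructing `Gen` with an unsupported type now throws immediately, rather than silently building a value that produces garbage output later.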
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137227950 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +213,27 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = { +writeObject(writeFields(row, rowSchema.asInstanceOf[StructType], rootFieldWriters)) + } /** * Transforms multiple `InternalRow`s to JSON array using Jackson * * @param array The array of rows to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = { +writeArray(writeArrayData(array, arrElementWriter)) + } + + /** + * Transforms a `MapData` to JSON object using Jackson + * + * @param map a map to convert + */ + def write(map: MapData): Unit = { +writeObject(writeMapData(map, rowSchema.asInstanceOf[MapType], mapElementWriter)) --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137227920 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +213,27 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = { +writeObject(writeFields(row, rowSchema.asInstanceOf[StructType], rootFieldWriters)) + } /** * Transforms multiple `InternalRow`s to JSON array using Jackson * * @param array The array of rows to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = { +writeArray(writeArrayData(array, arrElementWriter)) --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137227883 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +213,27 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = { +writeObject(writeFields(row, rowSchema.asInstanceOf[StructType], rootFieldWriters)) --- End diff -- We should match `dataType` here too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137227592 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,44 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +rowSchema: DataType, writer: Writer, options: JSONOptions) { + // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map + rowSchema match { +case _: StructType | _: MapType => + TypeCheckResult.TypeCheckSuccess +case _ => TypeCheckResult.TypeCheckFailure( + s"Input type ${rowSchema.simpleString} must be a struct or a map") + } + // `ValueWriter`s for all fields of the schema - private val rootFieldWriters: Array[ValueWriter] = schema.map(_.dataType).map(makeWriter).toArray + private lazy val rootFieldWriters: Array[ValueWriter] = { + rowSchema.asInstanceOf[StructType].map(_.dataType).map(makeWriter).toArray + } + // `ValueWriter` for array data storing rows of the schema. - private val arrElementWriter: ValueWriter = (arr: SpecializedGetters, i: Int) => { -writeObject(writeFields(arr.getStruct(i, schema.length), schema, rootFieldWriters)) + private lazy val arrElementWriter: ValueWriter = { --- End diff -- ditto. 
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137227713 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,44 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +rowSchema: DataType, writer: Writer, options: JSONOptions) { + // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map + rowSchema match { +case _: StructType | _: MapType => + TypeCheckResult.TypeCheckSuccess +case _ => TypeCheckResult.TypeCheckFailure( + s"Input type ${rowSchema.simpleString} must be a struct or a map") + } + // `ValueWriter`s for all fields of the schema - private val rootFieldWriters: Array[ValueWriter] = schema.map(_.dataType).map(makeWriter).toArray + private lazy val rootFieldWriters: Array[ValueWriter] = { + rowSchema.asInstanceOf[StructType].map(_.dataType).map(makeWriter).toArray + } + // `ValueWriter` for array data storing rows of the schema. 
- private val arrElementWriter: ValueWriter = (arr: SpecializedGetters, i: Int) => { -writeObject(writeFields(arr.getStruct(i, schema.length), schema, rootFieldWriters)) + private lazy val arrElementWriter: ValueWriter = { +(arr: SpecializedGetters, i: Int) => { + val schema: StructType = rowSchema.asInstanceOf[StructType] + writeObject(writeFields(arr.getStruct(i, schema.length), schema, rootFieldWriters)) +} + } + + private lazy val mapElementWriter: ValueWriter = { +makeWriter(rowSchema.asInstanceOf[MapType].valueType) --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137227541 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,44 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +rowSchema: DataType, writer: Writer, options: JSONOptions) { + // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map + rowSchema match { +case _: StructType | _: MapType => + TypeCheckResult.TypeCheckSuccess +case _ => TypeCheckResult.TypeCheckFailure( + s"Input type ${rowSchema.simpleString} must be a struct or a map") + } + // `ValueWriter`s for all fields of the schema - private val rootFieldWriters: Array[ValueWriter] = schema.map(_.dataType).map(makeWriter).toArray + private lazy val rootFieldWriters: Array[ValueWriter] = { + rowSchema.asInstanceOf[StructType].map(_.dataType).map(makeWriter).toArray --- End diff -- We should take care when accessing `rootFieldWriters` if the given data type is a map type. 
E.g.:

private lazy val rootFieldWriters: Array[ValueWriter] = dataType match {
  case st: StructType => st.map(_.dataType).map(makeWriter).toArray
  case mt: MapType => throw new UnsupportedOperationException("...")
}
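The suggestion above hinges on `lazy val`: the writers for the branch that doesn't apply are never forced, so constructing the generator with either type succeeds and only an actual misuse throws. A hedged standalone sketch with simplified stand-in types (`DT`, `StructT`, `Gen` are hypothetical, not Spark's):

```scala
// Hypothetical stand-in types; not Spark code.
sealed trait DT
case class StructT(fieldNames: List[String]) extends DT
case object MapT extends DT

class Gen(dt: DT) {
  // Forced only on first use: a map-initialized generator never evaluates
  // this unless writeRow is actually called, so construction stays legal.
  private lazy val rootFieldWriters: List[String => String] = dt match {
    case StructT(names) => names.map(n => (v: String) => s""""$n":"$v"""")
    case _ => throw new UnsupportedOperationException(
      "rootFieldWriters is only available when initialized with a struct type")
  }

  def writeRow(values: List[String]): String =
    rootFieldWriters.zip(values).map { case (w, v) => w(v) }.mkString("{", ",", "}")
}
```

This is why the review asks for `lazy val` rather than a plain `val`: an eager `val` would force the `asInstanceOf`/throw at construction time even for the valid map case.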
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137226892 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,44 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +rowSchema: DataType, writer: Writer, options: JSONOptions) { + // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map + rowSchema match { +case _: StructType | _: MapType => + TypeCheckResult.TypeCheckSuccess +case _ => TypeCheckResult.TypeCheckFailure( + s"Input type ${rowSchema.simpleString} must be a struct or a map") --- End diff -- `TypeCheckResult` actually doesn't prevent you to construct this `JacksonGenerator`. We can throw an `UnsupportedOperationException` for unsupported types. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137226390 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,44 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +rowSchema: DataType, --- End diff -- Because it now can be a map type and not a schema, may be just name it as `dataType: DataType`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137226222 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,44 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +rowSchema: DataType, writer: Writer, options: JSONOptions) { + --- End diff -- Nit: We don't need to add this extra space. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137225157 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -640,6 +644,7 @@ case class StructsToJson( lazy val rowSchema = child.dataType match { case st: StructType => st case ArrayType(st: StructType, _) => st +case MapType(_: DataType, _: DataType, _) => child.dataType --- End diff -- We can add a todo: `//TODO: support ArrayType(MapType)`, if we don't plan to add the support in this PR.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137224911 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -640,6 +644,7 @@ case class StructsToJson( lazy val rowSchema = child.dataType match { case st: StructType => st case ArrayType(st: StructType, _) => st +case MapType(_: DataType, _: DataType, _) => child.dataType --- End diff -- case mt: MapType => mt
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137224595 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -661,6 +666,10 @@ case class StructsToJson( (arr: Any) => gen.write(arr.asInstanceOf[ArrayData]) getAndReset() + case MapType(_: DataType, _: DataType, _: Boolean) => --- End diff -- For arbitrary map, we can simply write `case _: MapType => `
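The two review nits above are about pattern style: `case MapType(_: DataType, _: DataType, _: Boolean)` destructures fields it never uses, while a type pattern with a binder keeps the whole value. A hedged sketch with a hypothetical `MapT` case class mirroring the three-field shape of Spark's `MapType` (this is not Spark code):

```scala
// Hypothetical stand-in for MapType(keyType, valueType, valueContainsNull).
case class MapT(keyType: String, valueType: String, valueContainsNull: Boolean)

def describe(dt: Any): String = dt match {
  // Binder form: `mt` keeps the whole value; no field-by-field wildcards needed.
  case mt: MapT => s"map<${mt.keyType},${mt.valueType}>"
  case _ => "unsupported"
}
```

When no fields matter at all, `case _: MapT =>` is shorter still, which is exactly the reviewer's `case _: MapType =>` suggestion.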
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137048876 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -27,21 +27,45 @@ import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +childType: DataType, +rowSchema: StructType, --- End diff -- thanks for the review =). I will follow your suggestion and change it. But I think `JacksonGenerator` only supports writing out an arbitrary map; it doesn't support writing out an array of maps yet. Should I fix that? I think that may be an issue for supporting arbitrary arrays. Do I need to add a check for calls to the `write(row: InternalRow)` API?
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137044817 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -27,21 +27,45 @@ import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +childType: DataType, --- End diff -- ok. I'll revert it and use `rowSchema` to take what `MapType` needs.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r136997720 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -27,21 +27,45 @@ import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +childType: DataType, +rowSchema: StructType, --- End diff -- That is to say, for now `JacksonGenerator` supports writing out a struct or an array of structs. After this revision, it can also write out a map or an array of maps. Once a `JacksonGenerator` is initialized with a `MapType`, it is not valid to call the `write(row: InternalRow)` API.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r136996807 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -27,21 +27,45 @@ import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +childType: DataType, +rowSchema: StructType, --- End diff -- We can loosen the type of `rowSchema` from `StructType` to a general `DataType`. Since only `StructType` and `MapType` are valid options, we can check this in `JacksonGenerator`'s constructor.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r136995614 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -27,21 +27,45 @@ import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +childType: DataType, --- End diff -- This parameter looks weird to me. We don't want to tightly link `JacksonGenerator` to what this PR did.
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r134469528 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -202,5 +203,9 @@ private[sql] class JacksonGenerator( */ def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(map: MapData, mapType: MapType): Unit = { --- End diff -- sorry for the late reply. I thought the other `write` functions don't take any type because `JacksonGenerator` has a `schema` member. Because `writeMapData` needs `mapType`, I made the `write` function for `MapType` take it. I tried to fix this problem by moving the `childType` matching into `JacksonGenerator` in new commit a254f89, but I'm not sure this way is better. @HyukjinKwon @viirya What do you think about this?
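goldmedal's question above is essentially about where the map's type should live. The resolution the thread converges on is to pass the type to the constructor once, so `write(map)` needs no per-call type argument. A hedged standalone sketch with simplified hypothetical types (`MapT`, `Gen` are stand-ins, not Spark's `MapType`/`MapData`):

```scala
// Hypothetical stand-in for a map schema; not Spark code.
case class MapT(valueType: String)

class Gen(mapType: MapT) {
  // This sketch only models int-valued maps, matching the declared value type.
  require(mapType.valueType == "int", "sketch only handles int values")

  // mapType is captured at construction, mirroring how `schema` already was a
  // member, so callers of write() pass only the data, not the type.
  def write(map: Map[String, Int]): String =
    map.toSeq.sortBy(_._1).map { case (k, v) => s""""$k":$v""" }.mkString("{", ",", "}")
}
```

Keeping the type as constructor state is the same move as the original `schema: StructType` parameter: one place to validate it, and the per-call write APIs stay uniform.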