[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18875

---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138501335

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---

@@ -26,20 +26,50 @@
 import org.apache.spark.sql.catalyst.expressions.SpecializedGetters
 import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData}
 import org.apache.spark.sql.types._

+/**
+ * `JacksonGenerator` can only be initialized with a `StructType` or a `MapType`.
+ * Once it is initialized with `StructType`, it can be used to write out a struct or an array of
+ * structs. Once it is initialized with `MapType`, it can be used to write out a map or an array
+ * of maps. An exception will be thrown if it is asked to write out a struct when it was
+ * initialized with a `MapType`, and vice versa.
+ */
 private[sql] class JacksonGenerator(
-    schema: StructType,
+    dataType: DataType,
     writer: Writer,
     options: JSONOptions) {

   // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate
   // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that
   // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`.
   private type ValueWriter = (SpecializedGetters, Int) => Unit

+  // `JacksonGenerator` can only be initialized with a `StructType` or a `MapType`.
+  require(dataType.isInstanceOf[StructType] | dataType.isInstanceOf[MapType],

--- End diff --

oh. Yes, you're right. This is my mistake. :(

---
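The constraint discussed in this thread, accepting only a struct or a map root type, can be sketched outside Spark as a constructor guard. The tiny case classes below only mirror Spark's type names for illustration; `Generator` is a hypothetical stand-in, not Spark's `JacksonGenerator`:

```scala
// Standalone sketch of the fail-fast constructor guard discussed above.
// These types merely mirror Spark's names; this is not Spark code.
sealed trait DataType
case class StructType(fieldNames: Seq[String]) extends DataType
case class MapType(keyType: String, valueType: String) extends DataType
case class ArrayType(elementType: DataType) extends DataType

class Generator(val dataType: DataType) {
  // Fail fast at construction time: only a struct or a map root is supported.
  // `require` throws IllegalArgumentException when the predicate is false.
  require(dataType.isInstanceOf[StructType] || dataType.isInstanceOf[MapType],
    s"JSON generator only supports StructType or MapType, got $dataType")
}
```

With this guard, `new Generator(MapType("k", "v"))` succeeds, while `new Generator(ArrayType(MapType("k", "v")))` throws `IllegalArgumentException` immediately instead of failing later during writing.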
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138500447

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---

+  // `JacksonGenerator` can only be initialized with a `StructType` or a `MapType`.
+  require(dataType.isInstanceOf[StructType] | dataType.isInstanceOf[MapType],

--- End diff --

Hm.. I think we should use `||` for short-circuiting next time.

---
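The point of this review comment is that Scala's `|` on Booleans always evaluates both operands, while `||` skips the right-hand side when the left is already `true`. A minimal standalone illustration:

```scala
// Track whether the right-hand side was ever evaluated.
var evaluated = false
def rhs(): Boolean = { evaluated = true; true }

// `||` short-circuits: rhs() is never called because the left side is true.
val a = true || rhs()
assert(a && !evaluated)

// `|` does not short-circuit: rhs() runs even though the left side is true.
val b = true | rhs()
assert(b && evaluated)
```

For a cheap `isInstanceOf` check either operator gives the same result, so this is a style nit here, but `||` is the idiomatic choice and matters when the right operand is expensive or has side effects.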
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138500520

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala ---

@@ -617,6 +618,14 @@ case class JsonToStructs(
       {"time":"26/08/2015"}
   > SELECT _FUNC_(array(named_struct('a', 1, 'b', 2)));
       [{"a":1,"b":2}]
+  > SELECT _FUNC_(map('a',named_struct('b',1)));

--- End diff --

little nit: `... ',n ...` -> `... ', n ...`

---
Github user goldmedal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138237482

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---

@@ -193,14 +223,35 @@ private[sql] class JacksonGenerator(
    *
    * @param row The row to convert
    */
-  def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters))
+  def write(row: InternalRow): Unit = {
+    writeObject(writeFields(
+      fieldWriters = rootFieldWriters,
+      row = row,
+      schema = dataType.asInstanceOf[StructType]))
+  }
+
+  /**
+   * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson
+   *
+   * @param array The array of rows or maps to convert
+   */
+  def write(array: ArrayData): Unit = writeArray(writeArrayData(
+    fieldWriter = arrElementWriter,
+    array = array
+  ))

   /**
-   * Transforms multiple `InternalRow`s to JSON array using Jackson
+   * Transforms a single `MapData` to JSON object using Jackson
    *
-   * @param array The array of rows to convert
+   * @param map a map to convert
    */
-  def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter))
+  def write(map: MapData): Unit = {
+    writeObject(writeMapData(
+      fieldWriter = mapElementWriter,

--- End diff --

Ok, thanks for the review :) I'll update it tonight.

---
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138233095

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---

@@ -0,0 +1,133 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.json
+
+import java.io.CharArrayWriter
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData}
+import org.apache.spark.sql.types._
+
+class JacksonGeneratorSuite extends SparkFunSuite {
+
+  val gmtId = DateTimeUtils.TimeZoneGMT.getID
+  val option = new JSONOptions(Map.empty, gmtId)
+
+  test("initial with StructType and write out a row") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = InternalRow(1)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """{"a":1}""")
+  }
+
+  test("initial with StructType and write out rows") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = new GenericArrayData(InternalRow(1) :: InternalRow(2) :: Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """[{"a":1},{"a":2}]""")
+  }
+
+  test("initial with StructType and write out an array with single empty row") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = new GenericArrayData(InternalRow(null) :: Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """[{}]""")
+  }
+
+  test("initial with StructType and write out an empty array") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = new GenericArrayData(Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """[]""")
+  }
+
+  test("initial with Map and write out a map data") {
+    val dataType = MapType(StringType, IntegerType)
+    val input = ArrayBasedMapData(Map("a" -> 1))
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """{"a":1}""")
+  }
+
+  test("initial with Map and write out an array of maps") {
+    val dataType = MapType(StringType, IntegerType)
+    val input = new GenericArrayData(
+      ArrayBasedMapData(Map("a" -> 1)) :: ArrayBasedMapData(Map("b" -> 2)) :: Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    gen.write(input)
+    gen.flush()
+    assert(writer.toString === """[{"a":1},{"b":2}]""")
+  }
+
+  test("error handling: initial with StructType but error calling write a map") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = ArrayBasedMapData(Map("a" -> 1))
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    intercept[Exception] {
+      gen.write(input)
+    }
+  }
+
+  test("error handling: initial with StructType but error calling write an array of maps") {
+    val dataType = StructType(StructField("a", IntegerType) :: Nil)
+    val input = new GenericArrayData(
+      ArrayBasedMapData(Map("a" -> 1)) :: ArrayBasedMapData(Map("b" -> 2)) :: Nil)
+    val writer = new CharArrayWriter()
+    val gen = new JacksonGenerator(dataType, writer, option)
+    intercept[Exception] {
+      gen.write(input)
+    }
+  }
+}

---
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138232491

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---

+  def write(map: MapData): Unit = {
+    writeObject(writeMapData(
+      fieldWriter = mapElementWriter,

--- End diff --

And also add a similar comment in `write(row: InternalRow)`.

---
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138228117

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---

+  def write(array: ArrayData): Unit = writeArray(writeArrayData(
+    fieldWriter = arrElementWriter,
+    array = array
+  ))

--- End diff --

Let's change this one back to:

```scala
def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter))
```

---
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138228185

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala ---

+  def write(map: MapData): Unit = {
+    writeObject(writeMapData(
+      fieldWriter = mapElementWriter,

--- End diff --

Let's add a small comment saying this actually triggers the proper validation.

---
Github user goldmedal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138110523

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---

---
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138107641

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---

---
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138104414

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---

@@ -180,10 +180,30 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext {

   test("to_json - array") {
     val df = Seq(Tuple1(Tuple1(1) :: Nil)).toDF("a")
+    val df2 = Seq(Tuple1(Map("a" -> 1) :: Nil)).toDF("a")

     checkAnswer(
       df.select(to_json($"a")),
       Row("""[{"_1":1}]""") :: Nil)
+    checkAnswer(
+      df2.select(to_json($"a")),
+      Row("""[{"a":1}]""") :: Nil)
+  }
+
+  test("to_json - map") {
+    val df1 = Seq(Map("a" -> Tuple1(1))).toDF("a")
+    val df2 = Seq(Map(Tuple1(1) -> Tuple1(1))).toDF("a")
+    val df3 = Seq(Map("a" -> 1)).toDF("a")
+
+    checkAnswer(
+      df1.select(to_json($"a")),
+      Row("""{"a":{"_1":1}}""") :: Nil)
+    checkAnswer(
+      df2.select(to_json($"a")),
+      Row("""{"[0,1]":{"_1":1}}""") :: Nil)

--- End diff --

Yeah, I agree to remove it for now.

---
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138103265

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---

+    checkAnswer(
+      df2.select(to_json($"a")),
+      Row("""{"[0,1]":{"_1":1}}""") :: Nil)

--- End diff --

I get your point and I know that also makes sense. How about taking this case out? I think we would have to discuss more about whether this is a bug or not, so I'd rather remove what we are not sure of for now (as it is strictly not related to this PR, if I understood correctly).

---
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138102302

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---

+    checkAnswer(
+      df2.select(to_json($"a")),
+      Row("""{"[0,1]":{"_1":1}}""") :: Nil)

--- End diff --

Yes, I was thinking that showing debugging (internally formatted) strings is rather a bug that we might have to fix if possible, since we have a pretty `toString` format for the public `Row` class. I remember we eventually decided to fix `TimestampType` and `DateType` in JSON, which were printed as longs and integers, to human-readable formats, in SPARK-16216. If I understood correctly, there was a discussion at that time about whether they should be written as-is (long and int) vs. in a human-readable format.

---
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138095618

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala ---

+    checkAnswer(
+      df2.select(to_json($"a")),
+      Row("""{"[0,1]":{"_1":1}}""") :: Nil)

--- End diff --

I think we don't have a `toString` method for `UnsafeRow` that shows a well-formatted string representation. The current `toString` is just for debugging: it simply gets the hex string for each 8 bytes. Considering the nature of `UnsafeRow`, this `toString` might make sense to me. We can remove this test case if the result could be confusing; otherwise we should add a comment to explain why the JSON looks like this.

---
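The underlying issue in this sub-thread is that JSON object keys must be strings, so a map keyed by a struct must pass its keys through some string conversion; when that conversion is a debug format (like the hex-based `UnsafeRow.toString` producing keys such as `"[0,1]"`), the output is not human-readable. A standalone sketch of the choice, where `renderKey` is a hypothetical readable stringifier and not Spark's actual behavior:

```scala
// A struct-like key: JSON cannot use it as an object key directly.
case class StructKey(values: Seq[Int])

// Hypothetical readable stringifier for illustration; Spark's to_json used
// the key's internal debug toString instead, which is the bug under debate.
def renderKey(k: StructKey): String = k.values.mkString("[", ",", "]")

// Render the map as a JSON object, stringifying each key explicitly.
def mapToJson(m: Map[StructKey, Int]): String =
  m.map { case (k, v) => "\"" + renderKey(k) + "\":" + v }.mkString("{", ",", "}")
```

For example, `mapToJson(Map(StructKey(Seq(0, 1)) -> 1))` yields `{"[0,1]":1}`; the design question in the thread is whether such a key format should be a stable, documented representation or is merely debug output that should not leak into `to_json` results.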
Github user goldmedal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18875#discussion_r138094337

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---

---
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138087980 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138087162 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138080916 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138078894 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/JsonExpressionsSuite.scala --- @@ -612,6 +612,54 @@ class JsonExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { ) } + test("SPARK-21513: to_json support map[string, struct] to json") { +val schema = MapType(StringType, StructType(StructField("a", IntegerType) :: Nil)) +val input = Literal.create(ArrayBasedMapData(Map("test" -> InternalRow(1))), schema) +checkEvaluation( + StructsToJson(Map.empty, input), + """{"test":{"a":1}}""" +) + } + + test("SPARK-21513: to_json support map[struct, struct] to json") { +val schema = MapType(StructType(StructField("a", IntegerType) :: Nil), + StructType(StructField("b", IntegerType) :: Nil)) +val input = Literal.create(ArrayBasedMapData(Map(InternalRow(1) -> InternalRow(2))), schema) +checkEvaluation( + StructsToJson(Map.empty, input), + """{"[1]":{"b":2}}""" +) + } + + test("SPARK-21513: to_json support map[string, integer] to json") { +val schema = MapType(StringType, IntegerType) +val input = Literal.create(ArrayBasedMapData(Map("a" -> 1)), schema) +checkEvaluation( + StructsToJson(Map.empty, input), + """{"a":1}""" +) + } + + test("to_json - array with maps") { +val inputSchema = ArrayType(MapType(StringType, IntegerType)) +val input = new GenericArrayData(ArrayBasedMapData( + Map("a" -> 1)) :: ArrayBasedMapData(Map("b" -> 2)) :: Nil) +val output = """[{"a":1},{"b":2}]""" +checkEvaluation( + StructsToJson(Map.empty, Literal.create(input, inputSchema), gmtId), + output) + } + + test("to_json - array with single map") { +val inputSchema = ArrayType(MapType(StringType, IntegerType)) +val input = new GenericArrayData(ArrayBasedMapData( + Map("a" -> 1)) :: Nil) --- End diff -- Let's make this inlined. 
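The "make this inlined" suggestion above refers to the awkwardly wrapped two-line `val input` in the quoted test. A rough sketch of what the inlined form would look like (names taken from the quoted diff; this is the reviewer's suggestion, not the committed code):

```scala
// Before (wrapped mid-argument, as in the quoted diff):
val input = new GenericArrayData(ArrayBasedMapData(
  Map("a" -> 1)) :: Nil)

// After (the whole construction inlined on one line):
val input2 = new GenericArrayData(ArrayBasedMapData(Map("a" -> 1)) :: Nil)
```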
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138079126 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +226,26 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = { +writeObject(writeFields(row, dataType.asInstanceOf[StructType], rootFieldWriters)) + } + /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + /** + * Transforms a `MapData` to JSON object using Jackson --- End diff -- Not a big deal but `` a `MapData` `` -> `` a single `MapData` `` just for consistency with ` write(row: InternalRow)`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
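For context, after this PR `JacksonGenerator` carries three `write` overloads. A sketch of their shape with the wording fix applied ("a single `MapData`", matching `write(row: InternalRow)`); the body of the `MapData` overload is not visible in the quoted hunk, so it is an assumption modeled on the struct path:

```scala
/** Transforms a single `InternalRow` to a JSON object using Jackson. */
def write(row: InternalRow): Unit =
  writeObject(writeFields(row, dataType.asInstanceOf[StructType], rootFieldWriters))

/** Transforms multiple `InternalRow`s or `MapData`s to a JSON array using Jackson. */
def write(array: ArrayData): Unit =
  writeArray(writeArrayData(array, arrElementWriter))

/** Transforms a single `MapData` to a JSON object using Jackson. */
def write(map: MapData): Unit =  // body assumed by analogy with the struct overload
  writeObject(writeMapData(map, dataType.asInstanceOf[MapType], mapElementWriter))
```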
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138077808 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala --- @@ -180,10 +180,30 @@ class JsonFunctionsSuite extends QueryTest with SharedSQLContext { test("to_json - array") { val df = Seq(Tuple1(Tuple1(1) :: Nil)).toDF("a") +val df2 = Seq(Tuple1(Map("a" -> 1) :: Nil)).toDF("a") checkAnswer( df.select(to_json($"a")), Row("""[{"_1":1}]""") :: Nil) +checkAnswer( + df2.select(to_json($"a")), + Row("""[{"a":1}]""") :: Nil) + } + + test("to_json - map") { +val df1 = Seq(Map("a" -> Tuple1(1))).toDF("a") +val df2 = Seq(Map(Tuple1(1) -> Tuple1(1))).toDF("a") +val df3 = Seq(Map("a" -> 1)).toDF("a") + +checkAnswer( + df1.select(to_json($"a")), + Row("""{"a":{"_1":1}}""") :: Nil) +checkAnswer( + df2.select(to_json($"a")), + Row("""{"[0,1]":{"_1":1}}""") :: Nil) --- End diff -- This case looks rather a string representation from `UnsafeRow` and a bug. Let's remove this test case for now and fix it later together in another PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138057979 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,25 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case _: MapType | ArrayType(_: MapType, _) => + // TODO: let `JacksonUtils.verifySchema` verify a `MapType` + try { +val st = StructType(StructField("a", rowSchema.asInstanceOf[MapType]) :: Nil) +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } case _ => TypeCheckResult.TypeCheckFailure( - s"Input type ${child.dataType.simpleString} must be a struct or array of structs.") + s"Input type ${child.dataType.simpleString} must be a struct, array of structs or " + + s"a map or array of map.") --- End diff -- Looks `s` could be removed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
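The `s` being flagged is the interpolator on the second string literal, which interpolates nothing. A minimal self-contained illustration of the nit (the `simpleString` value here is made up for the example):

```scala
object InterpolatorNit extends App {
  val simpleString = "map<string,int>" // hypothetical stand-in for child.dataType.simpleString
  // `s` is only needed on a literal that actually interpolates something;
  // the second literal is plain, so its `s` prefix is redundant and can be dropped.
  val msg = s"Input type $simpleString must be a struct, array of structs or " +
    "a map or array of map."
  println(msg)
}
```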
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138073822 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala ---
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138058970 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,53 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +/** + * `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + * Once it is initialized with `StructType`, it can be used to write out a struct or an array of + * struct. Once it is initialized with ``MapType``, it can be used to write out a map or an array + * of map. An exception will be thrown if trying to write out a struct if it is initialized with + * a `MapType`, and vice verse. + */ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + dataType match { +case _: StructType | _: MapType => +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` only supports to be initialized with a `StructType` " + --- End diff -- This `s` could be removed too. Let's avoid backquoting classes in the error messages. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138058695 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,53 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +/** + * `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + * Once it is initialized with `StructType`, it can be used to write out a struct or an array of + * struct. Once it is initialized with ``MapType``, it can be used to write out a map or an array + * of map. An exception will be thrown if trying to write out a struct if it is initialized with + * a `MapType`, and vice verse. + */ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + dataType match { --- End diff -- I'd just do a `require(... .isInstanceOf... || ..., "...")` instead. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
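Folding the review suggestions together, the `dataType match` guard collapses to a single `require`. A sketch under the names from the quoted diff (note the short-circuiting `||` — elsewhere in this thread a single `|` slipped into this very check and was acknowledged as a mistake), with the class names unquoted in the message per the other comment:

```scala
private[sql] class JacksonGenerator(
    dataType: DataType,
    writer: Writer,
    options: JSONOptions) {

  // JacksonGenerator can only be initialized with a StructType or a MapType.
  require(dataType.isInstanceOf[StructType] || dataType.isInstanceOf[MapType],
    "JacksonGenerator only supports to be initialized with a StructType " +
      s"or MapType but got ${dataType.simpleString}")
  // ...
}
```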
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r138058166 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,53 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +/** + * `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + * Once it is initialized with `StructType`, it can be used to write out a struct or an array of + * struct. Once it is initialized with ``MapType``, it can be used to write out a map or an array --- End diff -- nit: ``` ``MapType`` ``` -> `` `MapType` `` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137997910 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,55 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +/** + * `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + * Once it is initialized with `StructType`, it can be used to write out a struct or an array of + * struct. Once it is initialized with ``MapType``, it can be used to write out a map or an array + * of map. An exception will be thrown if trying to write out a struct if it is initialized with + * a `MapType`, and vice verse. + */ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` can only be initialized with a `StructType` or a `MapType`. 
+ dataType match { +case _: StructType | _: MapType => +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` only supports to be initialized with a `StructType` " + + s"or `MapType` but got ${dataType.simpleString}") + } + // `ValueWriter`s for all fields of the schema - private val rootFieldWriters: Array[ValueWriter] = schema.map(_.dataType).map(makeWriter).toArray + private lazy val rootFieldWriters: Array[ValueWriter] = dataType match { +case st: StructType => st.map(_.dataType).map(makeWriter).toArray +case _ => throw new UnsupportedOperationException( + s"Initial type ${dataType.simpleString} must be a struct") + } + // `ValueWriter` for array data storing rows of the schema. - private val arrElementWriter: ValueWriter = (arr: SpecializedGetters, i: Int) => { -writeObject(writeFields(arr.getStruct(i, schema.length), schema, rootFieldWriters)) + private lazy val arrElementWriter: ValueWriter = dataType match { +case st: StructType => + (arr: SpecializedGetters, i: Int) => { +writeObject(writeFields(arr.getStruct(i, st.length), st, rootFieldWriters)) + } +case mt: MapType => + (arr: SpecializedGetters, i: Int) => { +writeObject(writeMapData(arr.getMap(i), mt, mapElementWriter)) + } +case _ => throw new UnsupportedOperationException( --- End diff -- We don't need this. `dataType` can only be `StructType` or `MapType`, as we checked it in constructor. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
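With the constructor check in place, the fallback branch viirya flags is unreachable. A sketch of `arrElementWriter` with the dead case dropped (types and helper names as in the quoted diff; fragment only, not a full class):

```scala
// `ValueWriter` for array data storing rows or maps of the schema.
private lazy val arrElementWriter: ValueWriter = dataType match {
  case st: StructType =>
    (arr: SpecializedGetters, i: Int) =>
      writeObject(writeFields(arr.getStruct(i, st.length), st, rootFieldWriters))
  case mt: MapType =>
    (arr: SpecializedGetters, i: Int) =>
      writeObject(writeMapData(arr.getMap(i), mt, mapElementWriter))
  // No default case: the require in the constructor already guarantees
  // dataType is a StructType or a MapType, so the extra
  // UnsupportedOperationException branch was dead code.
}
```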
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137988061 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,92 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + + test("initial with StructType and write out a row") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = InternalRow(1) +val writer = new CharArrayWriter() +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +gen.flush() +assert(writer.toString === """{"a":1}""") + } + + test("initial with StructType and write out rows") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = new GenericArrayData(InternalRow(1) :: InternalRow(2) :: Nil) +val writer = new CharArrayWriter() +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +gen.flush() +assert(writer.toString === """[{"a":1},{"a":2}]""") + } + + test("initial with StructType and write out an array with single empty row") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = new GenericArrayData(InternalRow(null) :: Nil) +val writer = new CharArrayWriter() +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +gen.flush() +assert(writer.toString === """[{}]""") + } + + test("initial with StructType and write out an empty array") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = new GenericArrayData(Nil) +val writer = new CharArrayWriter() +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +gen.flush() +assert(writer.toString === """[]""") + } + + test("initial with Map and write out a map data") { +val dataType = MapType(StringType, IntegerType) +val input = 
ArrayBasedMapData(Map("a" -> 1)) +val writer = new CharArrayWriter() +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +gen.flush() +assert(writer.toString === """{"a":1}""") + } + + test("initial with Map and write out an array of maps") { +val dataType = MapType(StringType, IntegerType) +val input = new GenericArrayData( + ArrayBasedMapData(Map("a" -> 1)) :: ArrayBasedMapData(Map("b" -> 2)) :: Nil) +val writer = new CharArrayWriter() +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +gen.flush() +assert(writer.toString === """[{"a":1},{"b":2}]""") + } --- End diff -- We need negative test cases, i.e, `JacksonGenerator` initialized with `StructType` but used to write map, array of map. Also `JacksonGenerator` initialized with `MapType` but used to write struct, array of struct. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
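The negative cases requested here might take the following shape, written in the suite's own style. This is a sketch, assuming (per the diffs above) that the mismatch surfaces as an `UnsupportedOperationException`, and that `intercept` is available from ScalaTest via `SparkFunSuite`:

```scala
test("initialized with StructType but trying to write out a map") {
  val dataType = StructType(StructField("a", IntegerType) :: Nil)
  val input = ArrayBasedMapData(Map("a" -> 1))
  val writer = new CharArrayWriter()
  val gen = new JacksonGenerator(dataType, writer, option)
  // The mismatch should fail with a descriptive UnsupportedOperationException,
  // not a ClassCastException from deep inside the writer machinery.
  intercept[UnsupportedOperationException] {
    gen.write(input)
  }
}

test("initialized with MapType but trying to write out a row") {
  val dataType = MapType(StringType, IntegerType)
  val input = InternalRow(1)
  val writer = new CharArrayWriter()
  val gen = new JacksonGenerator(dataType, writer, option)
  intercept[UnsupportedOperationException] {
    gen.write(input)
  }
}
```

The array-of-map and array-of-struct mismatch cases would follow the same pattern with `GenericArrayData` inputs.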
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137987843 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- It will throw `ClassCastException`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137987691 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- hmm, what type of exception is thrown now if we call `write(row)` when the generator is initialized with `MapType`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137987296 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- Oh, I got you wrong. I thought you meant the matching in `rootFieldWriters`. So we need to keep both of them? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137986494 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- I saw you remove the pattern matching? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137986264 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,25 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case _: MapType | ArrayType(_: MapType, _) => + // TODO: let `JacksonUtils.verifySchema` verify a `MapType` + try { +val st = StructType(StructField("a", rowSchema.asInstanceOf[MapType]) :: Nil) +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } case _ => TypeCheckResult.TypeCheckFailure( - s"Input type ${child.dataType.simpleString} must be a struct or array of structs.") + s"Input type ${child.dataType.simpleString} must be a struct, array of structs or " + + s"an arbitrary map.") --- End diff -- `an arbitrary map` -> `a map or array of map.` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137986124 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,42 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(mt: MapType, _) => + try { +val st = StructType(StructField("a", mt) :: Nil) +JacksonUtils.verifySchema(st) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case MapType(_: DataType, st: StructType, _: Boolean) => --- End diff -- Yeah, as you support arbitrary maps. Not all map types are valid, actually; it's left to `JacksonUtils` to verify the schema. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137984767 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,42 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(mt: MapType, _) => + try { +val st = StructType(StructField("a", mt) :: Nil) +JacksonUtils.verifySchema(st) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case MapType(_: DataType, st: StructType, _: Boolean) => --- End diff -- @viirya I think if we have `case mt: MapType`, we don't need this pattern to verify schema, right? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137980084 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- ok. I just leave a TODO comment for it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137979202 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- Basically I don't like to increase the size of this PR anymore. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137979158 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- Yeah, it is. Maybe we should revise `JacksonUtils` to be able to verify `MapType` directly. But I don't see `JacksonUtils.verifySchema` used in other place. We can leave it as TODO if you want to. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137979014 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- Oh yes. It's a tricky way, but it makes sense. =D --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137978728 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- ok got it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137978375 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- Maybe we can just do something like: case mt: MapType => val st = StructType(StructField("a", mt) :: Nil) JacksonUtils.verifySchema(st) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
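The suggestion above works because a map nested anywhere inside a struct is already handled by the struct verifier, so wrapping the root map in a throwaway single-field struct reuses that path. A sketch of how the `checkInputDataTypes` arm might look (the field name `"a"` is arbitrary and never surfaces to the user):

```scala
case mt: MapType =>
  try {
    // Wrap the root map in a dummy one-field struct so the existing
    // struct-only JacksonUtils.verifySchema can validate the key/value types.
    val st = StructType(StructField("a", mt) :: Nil)
    JacksonUtils.verifySchema(st)
    TypeCheckResult.TypeCheckSuccess
  } catch {
    case e: UnsupportedOperationException =>
      TypeCheckResult.TypeCheckFailure(e.getMessage)
  }
```

This keeps the PR small, at the cost of a synthetic wrapper; teaching `JacksonUtils.verifySchema` to accept a `MapType` directly is left as the TODO mentioned later in the thread.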
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137978096 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- @viirya But `JacksonUtils.verifySchema` only verifies a `StructType`, so we need to add a new one? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977643 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + val writer = new CharArrayWriter() + def getAndReset(gen: JacksonGenerator): UTF8String = { +gen.flush() +val json = writer.toString +writer.reset() +UTF8String.fromString(json) + } + + test("initial with StructType and write out a row") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = InternalRow(1) +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +assert(getAndReset(gen) === UTF8String.fromString("""{"a":1}""")) --- End diff -- @viirya I think we also need to do `gen.flush()`, right? So maybe we can keep `getAndReset` and modify it as below. Will it be better? ``` def getAndReset(gen: JacksonGenerator, writer: Writer): String = { gen.flush() writer.toString } ``` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
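A sketch of the helper as revised in this exchange — keeping the `gen.flush()` call but returning a plain `String` instead of a `UTF8String`, and taking the shared writer explicitly (the name `getAndReset` is the one proposed here, not settled):

```scala
import java.io.CharArrayWriter

// Hypothetical test helper: flush the generator's buffered output into the
// shared writer, capture the JSON, and reset the writer for the next test.
def getAndReset(gen: JacksonGenerator, writer: CharArrayWriter): String = {
  gen.flush()
  val json = writer.toString
  writer.reset()
  json
}
```

With this shape the assertions simplify to plain string comparisons, e.g. `assert(getAndReset(gen, writer) === """{"a":1}""")`.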
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977638 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- `ClassCastException` may not be as informative as `UnsupportedOperationException` here. I'd prefer to see `UnsupportedOperationException`. Let's keep this pattern matching for now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977826 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + val writer = new CharArrayWriter() + def getAndReset(gen: JacksonGenerator): UTF8String = { +gen.flush() +val json = writer.toString +writer.reset() +UTF8String.fromString(json) + } + + test("initial with StructType and write out a row") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = InternalRow(1) +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +assert(getAndReset(gen) === UTF8String.fromString("""{"a":1}""")) --- End diff -- Yeah, but it's just two lines. Depends on you to have a function or not. At least it shouldn't be called `getAndReset` now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977281 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- Do you mean `dataType` now is a map type but `writeFields`'s 2nd parameter is a `StructType`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977153 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + val writer = new CharArrayWriter() + def getAndReset(gen: JacksonGenerator): UTF8String = { +gen.flush() +val json = writer.toString +writer.reset() +UTF8String.fromString(json) + } + + test("initial with StructType and write out a row") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = InternalRow(1) +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +assert(getAndReset(gen) === UTF8String.fromString("""{"a":1}""")) --- End diff -- Yes, maybe we can just `writer.toString` and assert it with the normal string. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137977056 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + val writer = new CharArrayWriter() + def getAndReset(gen: JacksonGenerator): UTF8String = { +gen.flush() +val json = writer.toString +writer.reset() --- End diff -- ok. I got it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137976953 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- @viirya I found that if we don't check it here, it will throw `ClassCastException` from `writeObject()` first. So maybe we don't need to check the type in `rootFieldWriters`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137976262 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- We need to revise `JacksonUtils` to check map types too.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137976065 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -677,14 +696,27 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case ArrayType(_: MapType, _) => + TypeCheckResult.TypeCheckSuccess +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => --- End diff -- Not all map types are valid for JSON. We still need to verify the type.
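The review above asks for `JacksonUtils` to verify map types as well as struct types. A hedged sketch of what such an overload might look like follows; the helper name `verifyType` and its signature are assumptions for illustration, not the actual Spark source:

```scala
// Hypothetical sketch of extending JacksonUtils.verifySchema with a MapType
// overload, reusing a per-field verifier. Since JSON object keys are always
// strings, only the map's value type needs checking.
import org.apache.spark.sql.types._

object JacksonUtilsSketch {

  def verifySchema(schema: StructType): Unit =
    schema.foreach(field => verifyType(field.name, field.dataType))

  // Assumed addition for this PR: verify the value type of a top-level map.
  def verifySchema(schema: MapType): Unit =
    verifyType("value", schema.valueType)

  private def verifyType(name: String, dataType: DataType): Unit = dataType match {
    // Atomic types Jackson can write out directly.
    case NullType | BooleanType | ByteType | ShortType | IntegerType | LongType |
         FloatType | DoubleType | StringType | TimestampType | DateType |
         BinaryType | _: DecimalType => ()
    case st: StructType => st.foreach(f => verifyType(f.name, f.dataType))
    case at: ArrayType => verifyType(name, at.elementType)
    case mt: MapType => verifyType(name, mt.valueType)
    case _ =>
      throw new UnsupportedOperationException(
        s"Unable to convert column $name of type ${dataType.simpleString} to JSON.")
  }
}
```

With this shape, `checkInputDataTypes` could call `verifySchema` uniformly for the `StructType` and `MapType` branches instead of special-casing only struct values.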
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975716 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- ok. I'll remove it.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975624 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + val writer = new CharArrayWriter() + def getAndReset(gen: JacksonGenerator): UTF8String = { +gen.flush() +val json = writer.toString +writer.reset() --- End diff -- I'd rather tests not depend on each other. I think we can just have a writer for each test instead of reusing one.
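The suggestion above, combined with the later comment about comparing plain strings, could reshape each test roughly as follows. This is a sketch against the suite under review, not the final committed code:

```scala
// Sketch: each test builds its own CharArrayWriter and generator, so no test
// depends on a shared writer having been reset by the previous test, and the
// assertion compares plain strings instead of UTF8String values.
test("initial with StructType and write out a row") {
  val dataType = StructType(StructField("a", IntegerType) :: Nil)
  val input = InternalRow(1)
  val writer = new CharArrayWriter()          // local to this test
  val gen = new JacksonGenerator(dataType, writer, option)
  gen.write(input)
  gen.flush()
  assert(writer.toString === """{"a":1}""")   // plain string comparison
}
```

Because the writer is a local value, tests can run in any order (or in parallel) without a shared `getAndReset` helper.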
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975414 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/json/JacksonGeneratorSuite.scala --- @@ -0,0 +1,106 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.json + +import java.io.CharArrayWriter + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, DateTimeUtils, GenericArrayData} +import org.apache.spark.sql.types._ +import org.apache.spark.unsafe.types.UTF8String + +class JacksonGeneratorSuite extends SparkFunSuite { + + val gmtId = DateTimeUtils.TimeZoneGMT.getID + val option = new JSONOptions(Map.empty, gmtId) + val writer = new CharArrayWriter() + def getAndReset(gen: JacksonGenerator): UTF8String = { +gen.flush() +val json = writer.toString +writer.reset() +UTF8String.fromString(json) + } + + test("initial with StructType and write out a row") { +val dataType = StructType(StructField("a", IntegerType) :: Nil) +val input = InternalRow(1) +val gen = new JacksonGenerator(dataType, writer, option) +gen.write(input) +assert(getAndReset(gen) === UTF8String.fromString("""{"a":1}""")) --- End diff -- Can we simply compare the strings? I.e., we don't need to do `UTF8String.fromString` in `getAndReset` or here.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975214 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,30 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + /** + * Transforms a `MapData` to JSON object using Jackson + * + * @param map a map to convert + */ + def write(map: MapData): Unit = dataType match { +case mt: MapType => writeObject(writeMapData(map, mt, mapElementWriter)) +case _ => throw new UnsupportedOperationException( --- End diff -- Similar to https://github.com/apache/spark/pull/18875/files#r137975134, we can avoid this. Doing pattern matching on every write seems too burdensome.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137975134 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( --- End diff -- Rethinking about this, I think we can avoid matching the data type and simply do `writeObject(writeFields(row, schema, rootFieldWriters))` like before. An exception will then be thrown when `rootFieldWriters` is accessed.
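One way to read this suggestion: make the per-type writers lazy vals whose initializers do the type check, so the check runs once on first use rather than on every `write` call. A rough sketch under that assumption (the names follow the diff, but the exact lazy-val bodies and messages are illustrative, not the merged code):

```scala
// Sketch: write() stays free of pattern matching; the lazy vals throw
// UnsupportedOperationException if the generator was initialized with the
// wrong top-level type, and that check happens only on first access.
private lazy val rootFieldWriters: Array[ValueWriter] = dataType match {
  case st: StructType => st.map(_.dataType).map(makeWriter).toArray
  case _ => throw new UnsupportedOperationException(
    s"Initial type ${dataType.catalogString} must be a struct")
}

private lazy val mapElementWriter: ValueWriter = dataType match {
  case mt: MapType => makeWriter(mt.valueType)
  case _ => throw new UnsupportedOperationException(
    s"Initial type ${dataType.catalogString} must be a map")
}

// write() can then stay close to the original single-line form.
def write(row: InternalRow): Unit =
  writeObject(writeFields(row, dataType.asInstanceOf[StructType], rootFieldWriters))
```

The trade-off discussed in the thread: per-call matching gives clearer error messages, while the lazy-val approach keeps the hot write path branch-free.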
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137974036 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,32 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = { +writeArray(writeArrayData(array, arrElementWriter)) + } --- End diff -- We don't need to change this line.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137973919 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,55 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +/** + * `JackGenerator` can only be initialized with a `StructType` or a `MapType`. + * Once it is initialized with `StructType`, it can be used to write out a struct or an array of + * struct. Once it is initialized with `MapType`, it can be used to write out a map. An exception --- End diff -- `Once it is initialized with ``MapType``, it can be used to write out a map or an array of map.`
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137959771 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,58 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = dataType match { +case st: StructType => + try { --- End diff -- I think you're right. If the caller calls this incorrectly, a `ClassCastException` will be thrown anyway to alert the caller, so it's unnecessary to check it here. I'll remove it. Thanks.
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137958902 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,58 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = dataType match { +case st: StructType => + try { +if (array.numElements() > 0) { + array.getStruct(0, st.length) +} + } catch { +case cce: ClassCastException => + throw new UnsupportedOperationException( +s"`JacksonGenerator` can only be used to write out an array of struct " + + s"when initialized with `StructType`") + } + writeArray(writeArrayData(array, arrElementWriter)) +case _: MapType => + try { +if (array.numElements() > 0) { + array.getMap(0) +} + } catch { +case cce: ClassCastException => + throw new UnsupportedOperationException( +s"`JacksonGenerator` can only be used to write out an array of map when initialized" + +s"with `MapType`") + } + writeArray(writeArrayData(array, arrElementWriter)) +case _ => throw new UnsupportedOperationException( --- End diff -- Yes, you're right. I'll remove this pattern case. 
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137950179 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,58 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = dataType match { +case st: StructType => + try { +if (array.numElements() > 0) { + array.getStruct(0, st.length) +} + } catch { +case cce: ClassCastException => + throw new UnsupportedOperationException( +s"`JacksonGenerator` can only be used to write out an array of struct " + + s"when initialized with `StructType`") + } + writeArray(writeArrayData(array, arrElementWriter)) +case _: MapType => + try { +if (array.numElements() > 0) { + array.getMap(0) +} + } catch { +case cce: ClassCastException => + throw new UnsupportedOperationException( +s"`JacksonGenerator` can only be used to write out an array of map when initialized" + +s"with `MapType`") + } + writeArray(writeArrayData(array, arrElementWriter)) +case _ => throw new UnsupportedOperationException( --- End diff -- I think we already check the `dataType` is valid when constructing this. We don't need this pattern case. 
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137950158 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,58 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = dataType match { +case st: StructType => + try { +if (array.numElements() > 0) { + array.getStruct(0, st.length) +} + } catch { +case cce: ClassCastException => + throw new UnsupportedOperationException( +s"`JacksonGenerator` can only be used to write out an array of struct " + + s"when initialized with `StructType`") + } + writeArray(writeArrayData(array, arrElementWriter)) +case _: MapType => + try { --- End diff -- ditto.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137950153 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +228,58 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"`JacksonGenerator` can only be used to write out a row when initialized with `StructType`.") + } /** - * Transforms multiple `InternalRow`s to JSON array using Jackson + * Transforms multiple `InternalRow`s or `MapData`s to JSON array using Jackson * - * @param array The array of rows to convert + * @param array The array of rows or maps to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = dataType match { +case st: StructType => + try { --- End diff -- If we assume that the caller knows what it's doing, we can get rid of this check.
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137710147 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,50 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +// `JackGenerator` can only be initialized with a `StructType` or a `MapType`. +// Once it is initialized with `StructType`, it can be used to write out a struct or an array of +// struct. Once it is initialized with `MapType`, it can be used to write out a map. An exception +// will be thrown if trying to write out a struct if it is initialized with a `MapType`, +// and vice verse. --- End diff -- ok. I'll modify it.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137706345 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -26,20 +26,50 @@ import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ +// `JackGenerator` can only be initialized with a `StructType` or a `MapType`. +// Once it is initialized with `StructType`, it can be used to write out a struct or an array of +// struct. Once it is initialized with `MapType`, it can be used to write out a map. An exception +// will be thrown if trying to write out a struct if it is initialized with a `MapType`, +// and vice verse. --- End diff -- For this kind of comment, we use the style like: /** * Code comments... * */
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137288735 --- Diff: sql/core/src/test/resources/sql-tests/results/cross-join.sql.out --- @@ -128,6 +128,7 @@ two 2 two 2 one 1 two 2 two2 two 2 three 3 two 2 two2 two 2 two 2 two 2 + --- End diff -- Hmm, that's weird. I'm not sure, but maybe this blank line was generated when I regenerated the golden file. I'll revert it. =P
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137287045 --- Diff: sql/core/src/test/resources/sql-tests/results/cross-join.sql.out --- @@ -128,6 +128,7 @@ two 2 two 2 one 1 two 2 two2 two 2 three 3 two 2 two2 two 2 two 2 two 2 + --- End diff -- I didn't see you commit a change to `cross-join.sql`, so why is there a change to `cross-join.sql.out`?
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137285989 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -669,14 +679,25 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => + TypeCheckResult.TypeCheckSuccess --- End diff -- ok. Thanks for the review =)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137284596 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,49 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult --- End diff -- Don't forget to remove this unused import.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137284389 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -669,14 +679,25 @@ case class StructsToJson( override def checkInputDataTypes(): TypeCheckResult = child.dataType match { case _: StructType | ArrayType(_: StructType, _) => try { -JacksonUtils.verifySchema(rowSchema) +JacksonUtils.verifySchema(rowSchema.asInstanceOf[StructType]) +TypeCheckResult.TypeCheckSuccess + } catch { +case e: UnsupportedOperationException => + TypeCheckResult.TypeCheckFailure(e.getMessage) + } +case MapType(_: DataType, st: StructType, _: Boolean) => + try { +JacksonUtils.verifySchema(st) TypeCheckResult.TypeCheckSuccess } catch { case e: UnsupportedOperationException => TypeCheckResult.TypeCheckFailure(e.getMessage) } +case _: MapType => + TypeCheckResult.TypeCheckSuccess --- End diff -- This part of the change looks verbose, as `MapType` appears in two pattern cases. I will look into this part tomorrow.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137283687 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,49 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( --- End diff -- The change to `JacksonGenerator` needs tests. Looks like we don't have existing tests for `JacksonGenerator` either. I think we'd better add one.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137282534 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +218,34 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"this api is only used when `JacksonGenerator` is initialized with `StructType`") + } /** * Transforms multiple `InternalRow`s to JSON array using Jackson * * @param array The array of rows to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = dataType match { +case _: StructType => writeArray(writeArrayData(array, arrElementWriter)) +case _ => throw new UnsupportedOperationException( + s"this api is only used when `JacksonGenerator` is initialized with `StructType`") --- End diff -- `JacksonGenerator` can only be used to write out an array when initialized with `StructType`.
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137282128 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,49 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map --- End diff -- ok. I'll change it.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137282296 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +218,34 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = dataType match { +case st: StructType => + writeObject(writeFields(row, st, rootFieldWriters)) +case _ => throw new UnsupportedOperationException( + s"this api is only used when `JacksonGenerator` is initialized with `StructType`") --- End diff -- `JacksonGenerator` can only be used to write out a row when initialized with `StructType`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137281077 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,49 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( --- End diff -- We should add a comment for `JacksonGenerator`. E.g., `JackGenerator` can only be initialized with a `StructType` or a `MapType`. Once it is initialized with `StructType`, it can be used to write out a struct or an array of structs. Once it is initialized with `MapType`, it can be used to write out a map. An exception will be thrown when trying to write out a struct if it is initialized with a `MapType`, and vice versa.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137280095 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,49 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map --- End diff -- `JackGenerator` can only be initialized with a StructType or a MapType. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137273266 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,50 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map + dataType match { +case _: StructType | _: MapType => + TypeCheckResult.TypeCheckSuccess --- End diff -- ok --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137269586 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -127,7 +153,7 @@ private[sql] class JacksonGenerator( (row: SpecializedGetters, ordinal: Int) => val v = row.get(ordinal, dataType) sys.error(s"Failed to convert value $v (class of ${v.getClass}}) " + - s"with the type of $dataType to JSON.") +s"with the type of $dataType to JSON.") --- End diff -- Hmm, I think it may have been modified automatically by the IDE. I'll revert it.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137268446 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -127,7 +153,7 @@ private[sql] class JacksonGenerator( (row: SpecializedGetters, ordinal: Int) => val v = row.get(ordinal, dataType) sys.error(s"Failed to convert value $v (class of ${v.getClass}}) " + - s"with the type of $dataType to JSON.") +s"with the type of $dataType to JSON.") --- End diff -- Why change this? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137268116 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,50 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +dataType: DataType, writer: Writer, options: JSONOptions) { // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map + dataType match { +case _: StructType | _: MapType => + TypeCheckResult.TypeCheckSuccess --- End diff -- We don't need `TypeCheckResult.TypeCheckSuccess`. Just no-op. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
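The point of this comment is that a bare `match` whose arms merely produce `TypeCheckResult` values is evaluated and discarded, so it rejects nothing at runtime. A hedged standalone sketch (hypothetical `DT`/`Gen` types, not Spark's) of a constructor check that actually fails fast:

```scala
// Hypothetical stand-in types; not Spark code.
sealed trait DT
case object StructT extends DT
case object MapT extends DT
case object IntT extends DT

class Gen(dt: DT) {
  // Supported types fall through as a no-op; anything else aborts construction
  // by throwing, which a discarded TypeCheckFailure value would not do.
  dt match {
    case StructT | MapT => // no-op
    case other => throw new UnsupportedOperationException(
      s"Input type $other must be a struct or a map")
  }
}
```

Constructing `Gen` with an unsupported type now throws immediately, rather than silently building a value that produces garbage output later.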
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137227950 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +213,27 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = { +writeObject(writeFields(row, rowSchema.asInstanceOf[StructType], rootFieldWriters)) + } /** * Transforms multiple `InternalRow`s to JSON array using Jackson * * @param array The array of rows to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = { +writeArray(writeArrayData(array, arrElementWriter)) + } + + /** + * Transforms a `MapData` to JSON object using Jackson + * + * @param map a map to convert + */ + def write(map: MapData): Unit = { +writeObject(writeMapData(map, rowSchema.asInstanceOf[MapType], mapElementWriter)) --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137227920 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +213,27 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = { +writeObject(writeFields(row, rowSchema.asInstanceOf[StructType], rootFieldWriters)) + } /** * Transforms multiple `InternalRow`s to JSON array using Jackson * * @param array The array of rows to convert */ - def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(array: ArrayData): Unit = { +writeArray(writeArrayData(array, arrElementWriter)) --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137227883 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -193,14 +213,27 @@ private[sql] class JacksonGenerator( * * @param row The row to convert */ - def write(row: InternalRow): Unit = writeObject(writeFields(row, schema, rootFieldWriters)) + def write(row: InternalRow): Unit = { +writeObject(writeFields(row, rowSchema.asInstanceOf[StructType], rootFieldWriters)) --- End diff -- We should match `dataType` here too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137227592 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,44 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +rowSchema: DataType, writer: Writer, options: JSONOptions) { + // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map + rowSchema match { +case _: StructType | _: MapType => + TypeCheckResult.TypeCheckSuccess +case _ => TypeCheckResult.TypeCheckFailure( + s"Input type ${rowSchema.simpleString} must be a struct or a map") + } + // `ValueWriter`s for all fields of the schema - private val rootFieldWriters: Array[ValueWriter] = schema.map(_.dataType).map(makeWriter).toArray + private lazy val rootFieldWriters: Array[ValueWriter] = { + rowSchema.asInstanceOf[StructType].map(_.dataType).map(makeWriter).toArray + } + // `ValueWriter` for array data storing rows of the schema. - private val arrElementWriter: ValueWriter = (arr: SpecializedGetters, i: Int) => { -writeObject(writeFields(arr.getStruct(i, schema.length), schema, rootFieldWriters)) + private lazy val arrElementWriter: ValueWriter = { --- End diff -- ditto. 
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137227713 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,44 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +rowSchema: DataType, writer: Writer, options: JSONOptions) { + // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map + rowSchema match { +case _: StructType | _: MapType => + TypeCheckResult.TypeCheckSuccess +case _ => TypeCheckResult.TypeCheckFailure( + s"Input type ${rowSchema.simpleString} must be a struct or a map") + } + // `ValueWriter`s for all fields of the schema - private val rootFieldWriters: Array[ValueWriter] = schema.map(_.dataType).map(makeWriter).toArray + private lazy val rootFieldWriters: Array[ValueWriter] = { + rowSchema.asInstanceOf[StructType].map(_.dataType).map(makeWriter).toArray + } + // `ValueWriter` for array data storing rows of the schema. 
- private val arrElementWriter: ValueWriter = (arr: SpecializedGetters, i: Int) => { -writeObject(writeFields(arr.getStruct(i, schema.length), schema, rootFieldWriters)) + private lazy val arrElementWriter: ValueWriter = { +(arr: SpecializedGetters, i: Int) => { + val schema: StructType = rowSchema.asInstanceOf[StructType] + writeObject(writeFields(arr.getStruct(i, schema.length), schema, rootFieldWriters)) +} + } + + private lazy val mapElementWriter: ValueWriter = { +makeWriter(rowSchema.asInstanceOf[MapType].valueType) --- End diff -- ditto. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137227541 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,44 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +rowSchema: DataType, writer: Writer, options: JSONOptions) { + // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map + rowSchema match { +case _: StructType | _: MapType => + TypeCheckResult.TypeCheckSuccess +case _ => TypeCheckResult.TypeCheckFailure( + s"Input type ${rowSchema.simpleString} must be a struct or a map") + } + // `ValueWriter`s for all fields of the schema - private val rootFieldWriters: Array[ValueWriter] = schema.map(_.dataType).map(makeWriter).toArray + private lazy val rootFieldWriters: Array[ValueWriter] = { + rowSchema.asInstanceOf[StructType].map(_.dataType).map(makeWriter).toArray --- End diff -- We should take care when accessing `rootFieldWriters` if the given data type is a map type. 
E.g.:

private lazy val rootFieldWriters: Array[ValueWriter] = dataType match {
  case st: StructType => st.map(_.dataType).map(makeWriter).toArray
  case mt: MapType => throw new UnsupportedOperationException("...")
}
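The suggestion above hinges on `lazy val`: the writers for the branch that doesn't apply are never forced, so constructing the generator with either type succeeds and only an actual misuse throws. A hedged standalone sketch with simplified stand-in types (`DT`, `StructT`, `Gen` are hypothetical, not Spark's):

```scala
// Hypothetical stand-in types; not Spark code.
sealed trait DT
case class StructT(fieldNames: List[String]) extends DT
case object MapT extends DT

class Gen(dt: DT) {
  // Forced only on first use: a map-initialized generator never evaluates
  // this unless writeRow is actually called, so construction stays legal.
  private lazy val rootFieldWriters: List[String => String] = dt match {
    case StructT(names) => names.map(n => (v: String) => s""""$n":"$v"""")
    case _ => throw new UnsupportedOperationException(
      "rootFieldWriters is only available when initialized with a struct type")
  }

  def writeRow(values: List[String]): String =
    rootFieldWriters.zip(values).map { case (w, v) => w(v) }.mkString("{", ",", "}")
}
```

This is why the review asks for `lazy val` rather than a plain `val`: an eager `val` would force the `asInstanceOf`/throw at construction time even for the valid map case.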
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137226892 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,44 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +rowSchema: DataType, writer: Writer, options: JSONOptions) { + // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`. private type ValueWriter = (SpecializedGetters, Int) => Unit + // `JackGenerator` only supports to write out a struct, an array of struct or an arbitrary map + rowSchema match { +case _: StructType | _: MapType => + TypeCheckResult.TypeCheckSuccess +case _ => TypeCheckResult.TypeCheckFailure( + s"Input type ${rowSchema.simpleString} must be a struct or a map") --- End diff -- `TypeCheckResult` actually doesn't prevent you to construct this `JacksonGenerator`. We can throw an `UnsupportedOperationException` for unsupported types. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137226390 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,44 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +rowSchema: DataType, --- End diff -- Because it now can be a map type and not a schema, may be just name it as `dataType: DataType`. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137226222 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -22,24 +22,44 @@ import java.io.Writer import com.fasterxml.jackson.core._ import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult import org.apache.spark.sql.catalyst.expressions.SpecializedGetters import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +rowSchema: DataType, writer: Writer, options: JSONOptions) { + --- End diff -- Nit: We don't need to add this extra space. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137225157 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -640,6 +644,7 @@ case class StructsToJson( lazy val rowSchema = child.dataType match { case st: StructType => st case ArrayType(st: StructType, _) => st +case MapType(_: DataType, _: DataType, _) => child.dataType --- End diff -- We can add a todo: `//TODO: support ArrayType(MapType)`, if we don't plan to add the support in this PR.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137224911 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -640,6 +644,7 @@ case class StructsToJson( lazy val rowSchema = child.dataType match { case st: StructType => st case ArrayType(st: StructType, _) => st +case MapType(_: DataType, _: DataType, _) => child.dataType --- End diff -- case mt: MapType => mt
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137224595 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -661,6 +666,10 @@ case class StructsToJson( (arr: Any) => gen.write(arr.asInstanceOf[ArrayData]) getAndReset() + case MapType(_: DataType, _: DataType, _: Boolean) => --- End diff -- For arbitrary map, we can simply write `case _: MapType => `
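The two review nits above are about pattern style: `case MapType(_: DataType, _: DataType, _: Boolean)` destructures fields it never uses, while a type pattern with a binder keeps the whole value. A hedged sketch with a hypothetical `MapT` case class mirroring the three-field shape of Spark's `MapType` (this is not Spark code):

```scala
// Hypothetical stand-in for MapType(keyType, valueType, valueContainsNull).
case class MapT(keyType: String, valueType: String, valueContainsNull: Boolean)

def describe(dt: Any): String = dt match {
  // Binder form: `mt` keeps the whole value; no field-by-field wildcards needed.
  case mt: MapT => s"map<${mt.keyType},${mt.valueType}>"
  case _ => "unsupported"
}
```

When no fields matter at all, `case _: MapT =>` is shorter still, which is exactly the reviewer's `case _: MapType =>` suggestion.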
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137048876 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -27,21 +27,45 @@ import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +childType: DataType, +rowSchema: StructType, --- End diff -- thanks for the review =). I will follow your suggestion and change it. But I think `JacksonGenerator` only supports writing out an arbitrary map; it doesn't support writing out an array of maps yet. Should I fix that? I think that may be an issue for supporting arbitrary arrays. Do I need to add a check for calls to the `write(row: InternalRow)` API?
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r137044817 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -27,21 +27,45 @@ import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +childType: DataType, --- End diff -- ok. I'll revert it and use `rowSchema` to take what `MapType` needs.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r136997720 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -27,21 +27,45 @@ import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +childType: DataType, +rowSchema: StructType, --- End diff -- That is to say, for now `JacksonGenerator` supports writing out a struct or an array of structs. After this revision, it can also write out a map or an array of maps. Once a `JacksonGenerator` is initialized with a `MapType`, it is not valid to call the `write(row: InternalRow)` API.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r136996807 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -27,21 +27,45 @@ import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +childType: DataType, +rowSchema: StructType, --- End diff -- We can loosen the type of `rowSchema` from `StructType` to a general `DataType`. Since only `StructType` and `MapType` are valid options, we can check this in `JacksonGenerator`'s constructor.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r136995614 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -27,21 +27,45 @@ import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, MapData} import org.apache.spark.sql.types._ private[sql] class JacksonGenerator( -schema: StructType, +childType: DataType, --- End diff -- This parameter looks weird to me. We don't want to tightly link `JacksonGenerator` to what this PR did.
Github user goldmedal commented on a diff in the pull request: https://github.com/apache/spark/pull/18875#discussion_r134469528 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala --- @@ -202,5 +203,9 @@ private[sql] class JacksonGenerator( */ def write(array: ArrayData): Unit = writeArray(writeArrayData(array, arrElementWriter)) + def write(map: MapData, mapType: MapType): Unit = { --- End diff -- sorry for the late reply. I thought the other `write` functions don't take any type because `JacksonGenerator` has a `schema` member. Because `writeMapData` needs `mapType`, I made the `write` function for `MapType` take it. I tried to fix this problem by moving the `childType` matching into `JacksonGenerator` in new commit a254f89, but I'm not sure this way is better. @HyukjinKwon @viirya What do you think about this?
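goldmedal's question above is essentially about where the map's type should live. The resolution the thread converges on is to pass the type to the constructor once, so `write(map)` needs no per-call type argument. A hedged standalone sketch with simplified hypothetical types (`MapT`, `Gen` are stand-ins, not Spark's `MapType`/`MapData`):

```scala
// Hypothetical stand-in for a map schema; not Spark code.
case class MapT(valueType: String)

class Gen(mapType: MapT) {
  // This sketch only models int-valued maps, matching the declared value type.
  require(mapType.valueType == "int", "sketch only handles int values")

  // mapType is captured at construction, mirroring how `schema` already was a
  // member, so callers of write() pass only the data, not the type.
  def write(map: Map[String, Int]): String =
    map.toSeq.sortBy(_._1).map { case (k, v) => s""""$k":$v""" }.mkString("{", ",", "}")
}
```

Keeping the type as constructor state is the same move as the original `schema: StructType` parameter: one place to validate it, and the per-call write APIs stay uniform.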