[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22251 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22251#discussion_r213407599 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala --- @@ -1099,6 +1098,27 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils { } } + test("check namespace - toAvroType") { +val sparkSchema = StructType(Seq( + StructField("name", StringType, nullable = false), + StructField("address", StructType(Seq( +StructField("city", StringType, nullable = false), +StructField("state", StringType, nullable = false))), +nullable = false))) +val employeeType = SchemaConverters.toAvroType(sparkSchema, + recordName = "employee", + nameSpace = "foo.bar") --- End diff -- Added a test case for toAvroType with empty namespace --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/22251#discussion_r213407441 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala --- @@ -1099,6 +1098,27 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils { } } + test("check namespace - toAvroType") { --- End diff -- Its sort of covered in the below existing cases. Do you think we need more? [Validate namespace in avro file that has nested records with the same name](https://github.com/apache/spark/blob/master/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala#L1078) [conversion to avro and back with namespace](https://github.com/apache/spark/blob/master/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala#L510) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22251#discussion_r213233392 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala --- @@ -1099,6 +1098,27 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils { } } + test("check namespace - toAvroType") { +val sparkSchema = StructType(Seq( + StructField("name", StringType, nullable = false), + StructField("address", StructType(Seq( +StructField("city", StringType, nullable = false), +StructField("state", StringType, nullable = false))), +nullable = false))) +val employeeType = SchemaConverters.toAvroType(sparkSchema, + recordName = "employee", + nameSpace = "foo.bar") --- End diff -- nit: could you also add a case for `nameSpace` as `""` ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22251#discussion_r213232867 --- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala --- @@ -1099,6 +1098,27 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils { } } + test("check namespace - toAvroType") { --- End diff -- @arunmahadevan, can we add a simple end-to-end test as well? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/22251#discussion_r213231887 --- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala --- @@ -143,29 +143,25 @@ object SchemaConverters { val avroType = LogicalTypes.decimal(d.precision, d.scale) val fixedSize = minBytesForPrecision(d.precision) // Need to avoid naming conflict for the fixed fields -val name = prevNameSpace match { +val name = nameSpace match { case "" => s"$recordName.fixed" - case _ => s"$prevNameSpace.$recordName.fixed" + case _ => s"$nameSpace.$recordName.fixed" } avroType.addToSchema(SchemaBuilder.fixed(name).size(fixedSize)) case BinaryType => builder.bytesType() case ArrayType(et, containsNull) => builder.array() - .items(toAvroType(et, containsNull, recordName, prevNameSpace)) + .items(toAvroType(et, containsNull, recordName, nameSpace)) case MapType(StringType, vt, valueContainsNull) => builder.map() - .values(toAvroType(vt, valueContainsNull, recordName, prevNameSpace)) + .values(toAvroType(vt, valueContainsNull, recordName, nameSpace)) case st: StructType => -val nameSpace = prevNameSpace match { - case "" => recordName - case _ => s"$prevNameSpace.$recordName" -} - +val childNameSpace = if (nameSpace != "") s"$nameSpace.$recordName" else recordName val fieldsAssembler = builder.record(recordName).namespace(nameSpace).fields() --- End diff -- +1, this line is the only difference for the whole code change. The namespace here should not be the one with `recordName` at the end. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...
GitHub user arunmahadevan opened a pull request: https://github.com/apache/spark/pull/22251 [SPARK-25260][SQL] Fix namespace handling in SchemaConverters.toAvroType ## What changes were proposed in this pull request? `toAvroType` converts spark data type to avro schema. It always appends the record name to namespace so its impossible to have an Avro namespace independent of the record name. When invoked with a spark data type like, ```java val sparkSchema = StructType(Seq( StructField("name", StringType, nullable = false), StructField("address", StructType(Seq( StructField("city", StringType, nullable = false), StructField("state", StringType, nullable = false))), nullable = false))) // map it to an avro schema with record name "employee" and top level namespace "foo.bar", val avroSchema = SchemaConverters.toAvroType(sparkSchema, false, "employee", "foo.bar") // result is // avroSchema.getName = employee // avroSchema.getNamespace = foo.bar.employee // avroSchema.getFullname = foo.bar.employee.employee ``` The patch proposes to fix this so that the result is ``` avroSchema.getName = employee avroSchema.getNamespace = foo.bar avroSchema.getFullname = foo.bar.employee ``` ## How was this patch tested? New and existing unit tests. Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/arunmahadevan/spark avro-fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22251.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22251 commit f47483951e12d563b7696940a2cfc2fdc3b27ab2 Author: Arun Mahadevan Date: 2018-08-28T08:00:17Z [SPARK-25260][SQL] Fix namespace handling in SchemaConverters.toAvroType --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org