[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...

2018-08-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22251


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...

2018-08-28 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22251#discussion_r213407599
  
--- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala ---
@@ -1099,6 +1098,27 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 }
   }
 
+  test("check namespace - toAvroType") {
+    val sparkSchema = StructType(Seq(
+      StructField("name", StringType, nullable = false),
+      StructField("address", StructType(Seq(
+        StructField("city", StringType, nullable = false),
+        StructField("state", StringType, nullable = false))),
+        nullable = false)))
+    val employeeType = SchemaConverters.toAvroType(sparkSchema,
+      recordName = "employee",
+      nameSpace = "foo.bar")
--- End diff --

Added a test case for toAvroType with empty namespace


---




[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...

2018-08-28 Thread arunmahadevan
Github user arunmahadevan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22251#discussion_r213407441
  
--- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala ---
@@ -1099,6 +1098,27 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 }
   }
 
+  test("check namespace - toAvroType") {
--- End diff --

It's sort of covered by the existing cases below. Do you think we need more?

- [Validate namespace in avro file that has nested records with the same name](https://github.com/apache/spark/blob/master/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala#L1078)
- [conversion to avro and back with namespace](https://github.com/apache/spark/blob/master/external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala#L510)


---




[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...

2018-08-28 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22251#discussion_r213233392
  
--- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala ---
@@ -1099,6 +1098,27 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 }
   }
 
+  test("check namespace - toAvroType") {
+    val sparkSchema = StructType(Seq(
+      StructField("name", StringType, nullable = false),
+      StructField("address", StructType(Seq(
+        StructField("city", StringType, nullable = false),
+        StructField("state", StringType, nullable = false))),
+        nullable = false)))
+    val employeeType = SchemaConverters.toAvroType(sparkSchema,
+      recordName = "employee",
+      nameSpace = "foo.bar")
--- End diff --

nit: could you also add a case for `nameSpace` as `""`?


---




[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...

2018-08-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22251#discussion_r213232867
  
--- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala ---
@@ -1099,6 +1098,27 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 }
   }
 
+  test("check namespace - toAvroType") {
--- End diff --

@arunmahadevan, can we add a simple end-to-end test as well?


---




[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...

2018-08-28 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22251#discussion_r213231887
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala ---
@@ -143,29 +143,25 @@ object SchemaConverters {
         val avroType = LogicalTypes.decimal(d.precision, d.scale)
         val fixedSize = minBytesForPrecision(d.precision)
         // Need to avoid naming conflict for the fixed fields
-        val name = prevNameSpace match {
+        val name = nameSpace match {
           case "" => s"$recordName.fixed"
-          case _ => s"$prevNameSpace.$recordName.fixed"
+          case _ => s"$nameSpace.$recordName.fixed"
         }
         avroType.addToSchema(SchemaBuilder.fixed(name).size(fixedSize))
 
       case BinaryType => builder.bytesType()
       case ArrayType(et, containsNull) =>
         builder.array()
-          .items(toAvroType(et, containsNull, recordName, prevNameSpace))
+          .items(toAvroType(et, containsNull, recordName, nameSpace))
       case MapType(StringType, vt, valueContainsNull) =>
         builder.map()
-          .values(toAvroType(vt, valueContainsNull, recordName, prevNameSpace))
+          .values(toAvroType(vt, valueContainsNull, recordName, nameSpace))
       case st: StructType =>
-        val nameSpace = prevNameSpace match {
-          case "" => recordName
-          case _ => s"$prevNameSpace.$recordName"
-        }
-
+        val childNameSpace = if (nameSpace != "") s"$nameSpace.$recordName" else recordName
         val fieldsAssembler = builder.record(recordName).namespace(nameSpace).fields()
--- End diff --

+1, this line is the only functional difference in the whole change. The namespace set on the record here should not be the one with `recordName` appended at the end.
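To illustrate the point, here is a minimal string-only model of how the fixed `StructType` branch names nested records. It is a sketch, not Spark code: `NestedNamespaceModel`, `FieldType`, `Leaf`, `Struct` and `recordFullNames` are hypothetical names, only direct struct fields are followed (arrays and maps are ignored), and it assumes that each nested struct is converted with its field name as the record name and `childNameSpace` as the namespace. Full names follow the Avro rule `namespace.name`, or just `name` when the namespace is empty.

```scala
// Hypothetical, string-only model of how toAvroType names nested records
// after this change. It mirrors only the StructType branch; no Spark or
// Avro classes are used.
object NestedNamespaceModel {
  // Minimal stand-in for a Spark schema: a field is either a leaf
  // (any non-struct type) or another struct.
  sealed trait FieldType
  case object Leaf extends FieldType
  final case class Struct(fields: Seq[(String, FieldType)]) extends FieldType

  // Avro full name: "namespace.name", or just "name" for an empty namespace.
  def fullName(namespace: String, name: String): String =
    if (namespace.isEmpty) name else s"$namespace.$name"

  // Full names of every record produced for `st`, depth first.
  // The record itself uses `nameSpace`; its struct fields are converted
  // under childNameSpace, as in the fixed code (assumed: field name is
  // used as the nested record name).
  def recordFullNames(st: Struct, recordName: String, nameSpace: String): Seq[String] = {
    val childNameSpace =
      if (nameSpace != "") s"$nameSpace.$recordName" else recordName
    fullName(nameSpace, recordName) +: st.fields.flatMap {
      case (fieldName, nested: Struct) =>
        recordFullNames(nested, fieldName, childNameSpace)
      case _ => Seq.empty
    }
  }

  // The schema used in this PR: "name" plus a nested "address" struct.
  val employee: Struct = Struct(Seq(
    "name" -> Leaf,
    "address" -> Struct(Seq("city" -> Leaf, "state" -> Leaf))))
}
```

With that schema, `recordFullNames(employee, "employee", "foo.bar")` yields `foo.bar.employee` and `foo.bar.employee.address`; with `nameSpace = ""` it yields `employee` and `employee.address`, so the `""` branch avoids a leading dot.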


---




[GitHub] spark pull request #22251: [SPARK-25260][SQL] Fix namespace handling in Sche...

2018-08-28 Thread arunmahadevan
GitHub user arunmahadevan opened a pull request:

https://github.com/apache/spark/pull/22251

[SPARK-25260][SQL] Fix namespace handling in SchemaConverters.toAvroType

## What changes were proposed in this pull request?

`toAvroType` converts a Spark data type to an Avro schema. It always appends the record name to the namespace, so it's impossible to have an Avro namespace independent of the record name.

 
For example, when invoked with a Spark data type like:

```scala
import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.types._

val sparkSchema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("address", StructType(Seq(
    StructField("city", StringType, nullable = false),
    StructField("state", StringType, nullable = false))),
    nullable = false)))

// map it to an Avro schema with record name "employee" and top-level namespace "foo.bar"
val avroSchema = SchemaConverters.toAvroType(sparkSchema, false, "employee", "foo.bar")

// result is
// avroSchema.getName = employee
// avroSchema.getNamespace = foo.bar.employee
// avroSchema.getFullname = foo.bar.employee.employee
```
This patch fixes it so that the result is:

```
avroSchema.getName = employee
avroSchema.getNamespace = foo.bar
avroSchema.getFullname = foo.bar.employee
```
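The before/after values above can be reproduced with a small string-only sketch of the two versions of the `StructType` branch: the old code appended `recordName` to `prevNameSpace` and set the result as the record's own namespace, while the new code sets `nameSpace` unchanged. `NamespaceBeforeAfter` and its methods are hypothetical names used for illustration; no Spark or Avro classes are involved, and full names follow the Avro `namespace.name` rule.

```scala
// Hypothetical, string-only sketch of the namespace given to the top-level
// record before and after SPARK-25260. No Spark or Avro classes are used.
object NamespaceBeforeAfter {
  // Avro full name: "namespace.name", or just "name" for an empty namespace.
  def fullName(namespace: String, name: String): String =
    if (namespace.isEmpty) name else s"$namespace.$name"

  // Before: the record's own namespace already had recordName appended.
  def oldNamespace(recordName: String, prevNameSpace: String): String =
    prevNameSpace match {
      case "" => recordName
      case _  => s"$prevNameSpace.$recordName"
    }

  // After: the record keeps the namespace it was given. recordName is kept
  // only to mirror oldNamespace's signature; it is appended solely for the
  // namespace of nested fields.
  def newNamespace(recordName: String, nameSpace: String): String = nameSpace

  def main(args: Array[String]): Unit = {
    val recordName = "employee"
    val nameSpace = "foo.bar"
    val before = oldNamespace(recordName, nameSpace)
    val after = newNamespace(recordName, nameSpace)
    // prints: before: namespace=foo.bar.employee fullname=foo.bar.employee.employee
    println(s"before: namespace=$before fullname=${fullName(before, recordName)}")
    // prints: after: namespace=foo.bar fullname=foo.bar.employee
    println(s"after: namespace=$after fullname=${fullName(after, recordName)}")
  }
}
```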
## How was this patch tested?

New and existing unit tests.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/arunmahadevan/spark avro-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22251.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22251


commit f47483951e12d563b7696940a2cfc2fdc3b27ab2
Author: Arun Mahadevan 
Date:   2018-08-28T08:00:17Z

[SPARK-25260][SQL] Fix namespace handling in SchemaConverters.toAvroType




---
