[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #1046: ISSUE-189: Add support for union record type

2020-05-26 Thread GitBox


rdblue commented on a change in pull request #1046:
URL: https://github.com/apache/incubator-iceberg/pull/1046#discussion_r430012232



##
File path: api/src/main/java/org/apache/iceberg/types/Types.java
##
@@ -526,25 +526,25 @@ public static StructType of(List fields) {
   return new StructType(fields, false);
 }
 
-public static StructType of(List fields, boolean 
isUnionSchema) {
-  return new StructType(fields, isUnionSchema);
+public static StructType of(List fields, boolean 
convertedFromUnionSchema) {
+  return new StructType(fields, convertedFromUnionSchema);

Review comment:
   I still don't think that Iceberg needs to model that this was converted. 
The conversion back to a union schema is not something that Iceberg should 
support. Iceberg must only _write_ Iceberg data files and those are not allowed 
to contain Avro unions. The use case that we aim to support is reading data 
files with existing unions, if I understand correctly.

##
File path: core/src/main/java/org/apache/iceberg/avro/SchemaToType.java
##
@@ -106,11 +106,27 @@ public Type record(Schema record, List names, 
List fieldTypes) {
   public Type union(Schema union, List options) {
 Preconditions.checkArgument(AvroSchemaUtil.isOptionSchema(union),
 "Unsupported type: non-option union: %s", union);
-// records, arrays, and maps will check nullability later
-if (options.get(0) == null) {
-  return options.get(1);
-} else {
+if (options.size() == 1) {
   return options.get(0);
+} else if (options.size() == 2) {
+  if (options.get(0) == null) {
+return options.get(1);
+  } else {
+return options.get(0);
+  }
+} else {
+  // Convert complex unions to struct types where field names are member0, 
member1, etc.
+  // This is consistent with the behavior of the spark Avro SchemaConverter
+  List fields = 
Lists.newArrayListWithExpectedSize(options.size());
+  for (int i = 0; i < options.size(); i += 1) {
+Type fieldType = options.get(i);
+if (fieldType == null) {

Review comment:
   I think I see. The reason is that all branches of the union are optional 
and there is no way to encode whether one branch (let alone only one branch) 
will be non-null?

##
File path: 
core/src/test/java/org/apache/iceberg/avro/AvroDataUnionRecordTest.java
##
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.avro;
+
+import com.google.common.collect.Lists;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.iceberg.Files;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.io.FileAppender;
+import org.junit.Assert;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+public class AvroDataUnionRecordTest {
+
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  protected void writeAndValidate(
+  List actualWrite,
+  List expectedRead,
+  Schema icebergSchema) throws IOException {
+File testFile = temp.newFile();
+Assert.assertTrue("Delete should succeed", testFile.delete());
+
+try (FileAppender writer = 
Avro.write(Files.localOutput(testFile))
+.schema(icebergSchema)
+.named("test")
+.build()) {
+  for (GenericData.Record rec : actualWrite) {
+writer.add(rec);
+  }
+}
+
+List rows;
+try (AvroIterable reader = 
Avro.read(Files.localInput(testFile))
+.project(icebergSchema)
+.build()) {
+  rows = Lists.newArrayList(reader);
+}
+
+for (int i = 0; i < expectedRead.size(); i += 1) {
+  AvroTestHelpers.assertEquals(icebergSchema.asStruct(), 
expectedRead.get(i), rows.get(i));
+}
+  }
+
+  @Test
+  public void testMapOfUnionValues() throws IOException {
+String schema1 = "{\n" +
+"  \"name\": \"MapOfUnion\",\n" +
+"  \"type\": 

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #1046: ISSUE-189: Add support for union record type

2020-05-22 Thread GitBox


rdblue commented on a change in pull request #1046:
URL: https://github.com/apache/incubator-iceberg/pull/1046#discussion_r429310052



##
File path: 
core/src/test/java/org/apache/iceberg/avro/AvroDataUnionRecordTest.java
##
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.avro;
+
+import com.google.common.collect.Lists;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.iceberg.Files;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.io.FileAppender;
+import org.junit.Assert;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+public class AvroDataUnionRecordTest {
+
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  protected void writeAndValidate(
+  List actualWrite,
+  List expectedRead,
+  Schema icebergSchema) throws IOException {
+File testFile = temp.newFile();
+Assert.assertTrue("Delete should succeed", testFile.delete());
+
+try (FileAppender writer = 
Avro.write(Files.localOutput(testFile))
+.schema(icebergSchema)
+.named("test")
+.build()) {
+  for (GenericData.Record rec : actualWrite) {
+writer.add(rec);
+  }
+}
+
+List rows;
+try (AvroIterable reader = 
Avro.read(Files.localInput(testFile))
+.project(icebergSchema)
+.build()) {
+  rows = Lists.newArrayList(reader);
+}
+
+for (int i = 0; i < expectedRead.size(); i += 1) {
+  AvroTestHelpers.assertEquals(icebergSchema.asStruct(), 
expectedRead.get(i), rows.get(i));
+}
+  }
+
+  @Test
+  public void testMapOfUnionValues() throws IOException {
+String schema1 = "{\n" +
+"  \"name\": \"MapOfUnion\",\n" +
+"  \"type\": \"record\",\n" +
+"  \"fields\": [\n" +
+"{\n" +
+"  \"name\": \"map\",\n" +
+"  \"type\": [\n" +
+"\"null\",\n" +
+"{\n" +
+"  \"type\": \"map\",\n" +
+"  \"values\": [\n" +
+"\"null\",\n" +
+"\"boolean\",\n" +
+"\"int\",\n" +
+"\"long\",\n" +
+"\"float\",\n" +
+"\"double\",\n" +
+"\"bytes\",\n" +
+"\"string\"\n" +
+"  ]\n" +
+"}\n" +
+"  ]\n" +
+"}\n" +
+"  ]\n" +
+"}";
+org.apache.avro.Schema avroSchema = new 
org.apache.avro.Schema.Parser().parse(schema1);
+org.apache.iceberg.Schema icebergSchema = 
AvroSchemaUtil.toIceberg(avroSchema);
+org.apache.avro.Schema avroSchemaUnionRecord = 
AvroSchemaUtil.convert(icebergSchema, "test");
+org.apache.avro.Schema unionRecordSchema =
+
avroSchemaUnionRecord.getFields().get(0).schema().getTypes().get(1).getValueType().getTypes().get(1);
+
+List expectedRead = new ArrayList<>();
+List actualWrite = new ArrayList<>();
+
+for (long i = 0; i < 10; i++) {
+  Map map = new HashMap<>();
+  Map mapRead = new HashMap<>();
+  updateMapsForUnionSchema(unionRecordSchema, map, mapRead, i);
+  GenericData.Record recordRead = new GenericRecordBuilder(avroSchema)
+  .set("map", mapRead)
+  .build();
+  GenericData.Record record = new GenericRecordBuilder(avroSchema)
+  .set("map", map)
+  .build();
+  actualWrite.add(record);
+  expectedRead.add(recordRead);
+}
+writeAndValidate(actualWrite, expectedRead, icebergSchema);
+  }
+
+  private void updateMapsForUnionSchema(
+  org.apache.avro.Schema unionRecordSchema,
+  Map map,
+  Map mapRead,
+  Long index) {
+map.put("boolean", index % 2 == 0);
+map.put("int", index.intValue());
+map.put("long", index);
+map.put("float", index.floatValue());
+map.put("double", index.doubleValue());
+map.put("bytes", 

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #1046: ISSUE-189: Add support for union record type

2020-05-21 Thread GitBox


rdblue commented on a change in pull request #1046:
URL: https://github.com/apache/incubator-iceberg/pull/1046#discussion_r428879377



##
File path: 
core/src/test/java/org/apache/iceberg/avro/AvroDataUnionRecordTest.java
##
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.avro;
+
+import com.google.common.collect.Lists;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.iceberg.Files;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.io.FileAppender;
+import org.junit.Assert;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+public class AvroDataUnionRecordTest {
+
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  protected void writeAndValidate(
+  List actualWrite,
+  List expectedRead,
+  Schema icebergSchema) throws IOException {
+File testFile = temp.newFile();
+Assert.assertTrue("Delete should succeed", testFile.delete());
+
+try (FileAppender writer = 
Avro.write(Files.localOutput(testFile))
+.schema(icebergSchema)
+.named("test")
+.build()) {
+  for (GenericData.Record rec : actualWrite) {
+writer.add(rec);
+  }
+}
+
+List rows;
+try (AvroIterable reader = 
Avro.read(Files.localInput(testFile))
+.project(icebergSchema)
+.build()) {
+  rows = Lists.newArrayList(reader);
+}
+
+for (int i = 0; i < expectedRead.size(); i += 1) {
+  AvroTestHelpers.assertEquals(icebergSchema.asStruct(), 
expectedRead.get(i), rows.get(i));
+}
+  }
+
+  @Test
+  public void testMapOfUnionValues() throws IOException {
+String schema1 = "{\n" +
+"  \"name\": \"MapOfUnion\",\n" +
+"  \"type\": \"record\",\n" +
+"  \"fields\": [\n" +
+"{\n" +
+"  \"name\": \"map\",\n" +
+"  \"type\": [\n" +
+"\"null\",\n" +
+"{\n" +
+"  \"type\": \"map\",\n" +
+"  \"values\": [\n" +
+"\"null\",\n" +
+"\"boolean\",\n" +
+"\"int\",\n" +
+"\"long\",\n" +
+"\"float\",\n" +
+"\"double\",\n" +
+"\"bytes\",\n" +
+"\"string\"\n" +
+"  ]\n" +
+"}\n" +
+"  ]\n" +
+"}\n" +
+"  ]\n" +
+"}";
+org.apache.avro.Schema avroSchema = new 
org.apache.avro.Schema.Parser().parse(schema1);
+org.apache.iceberg.Schema icebergSchema = 
AvroSchemaUtil.toIceberg(avroSchema);
+org.apache.avro.Schema avroSchemaUnionRecord = 
AvroSchemaUtil.convert(icebergSchema, "test");
+org.apache.avro.Schema unionRecordSchema =
+
avroSchemaUnionRecord.getFields().get(0).schema().getTypes().get(1).getValueType().getTypes().get(1);
+
+List expectedRead = new ArrayList<>();
+List actualWrite = new ArrayList<>();
+
+for (long i = 0; i < 10; i++) {
+  Map map = new HashMap<>();
+  Map mapRead = new HashMap<>();
+  updateMapsForUnionSchema(unionRecordSchema, map, mapRead, i);
+  GenericData.Record recordRead = new GenericRecordBuilder(avroSchema)
+  .set("map", mapRead)
+  .build();
+  GenericData.Record record = new GenericRecordBuilder(avroSchema)
+  .set("map", map)
+  .build();
+  actualWrite.add(record);
+  expectedRead.add(recordRead);
+}
+writeAndValidate(actualWrite, expectedRead, icebergSchema);
+  }
+
+  private void updateMapsForUnionSchema(
+  org.apache.avro.Schema unionRecordSchema,
+  Map map,
+  Map mapRead,
+  Long index) {
+map.put("boolean", index % 2 == 0);
+map.put("int", index.intValue());
+map.put("long", index);
+map.put("float", index.floatValue());
+map.put("double", index.doubleValue());
+map.put("bytes", 

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #1046: ISSUE-189: Add support for union record type

2020-05-21 Thread GitBox


rdblue commented on a change in pull request #1046:
URL: https://github.com/apache/incubator-iceberg/pull/1046#discussion_r428877247



##
File path: 
core/src/test/java/org/apache/iceberg/avro/AvroDataUnionRecordTest.java
##
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.avro;
+
+import com.google.common.collect.Lists;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.avro.generic.GenericData;
+import org.apache.avro.generic.GenericRecordBuilder;
+import org.apache.iceberg.Files;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.io.FileAppender;
+import org.junit.Assert;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.TemporaryFolder;
+
+public class AvroDataUnionRecordTest {
+
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  protected void writeAndValidate(
+  List actualWrite,
+  List expectedRead,
+  Schema icebergSchema) throws IOException {
+File testFile = temp.newFile();
+Assert.assertTrue("Delete should succeed", testFile.delete());
+
+try (FileAppender writer = 
Avro.write(Files.localOutput(testFile))
+.schema(icebergSchema)
+.named("test")
+.build()) {
+  for (GenericData.Record rec : actualWrite) {
+writer.add(rec);
+  }
+}
+
+List rows;
+try (AvroIterable reader = 
Avro.read(Files.localInput(testFile))
+.project(icebergSchema)
+.build()) {
+  rows = Lists.newArrayList(reader);
+}
+
+for (int i = 0; i < expectedRead.size(); i += 1) {
+  AvroTestHelpers.assertEquals(icebergSchema.asStruct(), 
expectedRead.get(i), rows.get(i));
+}
+  }
+
+  @Test
+  public void testMapOfUnionValues() throws IOException {
+String schema1 = "{\n" +
+"  \"name\": \"MapOfUnion\",\n" +
+"  \"type\": \"record\",\n" +
+"  \"fields\": [\n" +
+"{\n" +
+"  \"name\": \"map\",\n" +
+"  \"type\": [\n" +
+"\"null\",\n" +
+"{\n" +
+"  \"type\": \"map\",\n" +
+"  \"values\": [\n" +
+"\"null\",\n" +
+"\"boolean\",\n" +
+"\"int\",\n" +
+"\"long\",\n" +
+"\"float\",\n" +
+"\"double\",\n" +
+"\"bytes\",\n" +
+"\"string\"\n" +
+"  ]\n" +
+"}\n" +
+"  ]\n" +
+"}\n" +
+"  ]\n" +
+"}";
+org.apache.avro.Schema avroSchema = new 
org.apache.avro.Schema.Parser().parse(schema1);
+org.apache.iceberg.Schema icebergSchema = 
AvroSchemaUtil.toIceberg(avroSchema);
+org.apache.avro.Schema avroSchemaUnionRecord = 
AvroSchemaUtil.convert(icebergSchema, "test");
+org.apache.avro.Schema unionRecordSchema =
+
avroSchemaUnionRecord.getFields().get(0).schema().getTypes().get(1).getValueType().getTypes().get(1);
+
+List expectedRead = new ArrayList<>();
+List actualWrite = new ArrayList<>();
+
+for (long i = 0; i < 10; i++) {
+  Map map = new HashMap<>();
+  Map mapRead = new HashMap<>();
+  updateMapsForUnionSchema(unionRecordSchema, map, mapRead, i);
+  GenericData.Record recordRead = new GenericRecordBuilder(avroSchema)
+  .set("map", mapRead)
+  .build();
+  GenericData.Record record = new GenericRecordBuilder(avroSchema)
+  .set("map", map)
+  .build();
+  actualWrite.add(record);
+  expectedRead.add(recordRead);
+}
+writeAndValidate(actualWrite, expectedRead, icebergSchema);
+  }
+
+  private void updateMapsForUnionSchema(
+  org.apache.avro.Schema unionRecordSchema,
+  Map map,
+  Map mapRead,
+  Long index) {
+map.put("boolean", index % 2 == 0);
+map.put("int", index.intValue());
+map.put("long", index);
+map.put("float", index.floatValue());
+map.put("double", index.doubleValue());
+map.put("bytes", 

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #1046: ISSUE-189: Add support for union record type

2020-05-21 Thread GitBox


rdblue commented on a change in pull request #1046:
URL: https://github.com/apache/incubator-iceberg/pull/1046#discussion_r428876705



##
File path: core/src/main/java/org/apache/iceberg/avro/TypeToSchema.java
##
@@ -112,7 +112,9 @@ public Schema struct(Types.StructType struct, List 
fieldSchemas) {
 }
 
 recordSchema = Schema.createRecord(recordName, null, null, false, fields);
-
+if (struct.isUnionSchema()) {

Review comment:
   Is there some reason why this needs to be a round-trip conversion? I 
think it should be fine to convert back to a record instead of a union schema. 
That's what we should be writing into a table with a schema converted from an 
Avro union schema anyway.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #1046: ISSUE-189: Add support for union record type

2020-05-21 Thread GitBox


rdblue commented on a change in pull request #1046:
URL: https://github.com/apache/incubator-iceberg/pull/1046#discussion_r428875887



##
File path: core/src/main/java/org/apache/iceberg/avro/SchemaToType.java
##
@@ -169,18 +184,15 @@ public Type primitive(Schema primitive) {
 return Types.DecimalType.of(
 ((LogicalTypes.Decimal) logical).getPrecision(),
 ((LogicalTypes.Decimal) logical).getScale());
-
   } else if (logical instanceof LogicalTypes.Date) {
 return Types.DateType.get();
-
   } else if (
   logical instanceof LogicalTypes.TimeMillis ||
-  logical instanceof LogicalTypes.TimeMicros) {
+  logical instanceof LogicalTypes.TimeMicros) {

Review comment:
   Minor: this file also has lots of formatting changes that should be 
reverted before committing.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #1046: ISSUE-189: Add support for union record type

2020-05-21 Thread GitBox


rdblue commented on a change in pull request #1046:
URL: https://github.com/apache/incubator-iceberg/pull/1046#discussion_r428875652



##
File path: core/src/main/java/org/apache/iceberg/avro/SchemaToType.java
##
@@ -106,11 +106,27 @@ public Type record(Schema record, List names, 
List fieldTypes) {
   public Type union(Schema union, List options) {
 Preconditions.checkArgument(AvroSchemaUtil.isOptionSchema(union),
 "Unsupported type: non-option union: %s", union);
-// records, arrays, and maps will check nullability later
-if (options.get(0) == null) {
-  return options.get(1);
-} else {
+if (options.size() == 1) {
   return options.get(0);
+} else if (options.size() == 2) {
+  if (options.get(0) == null) {
+return options.get(1);
+  } else {
+return options.get(0);
+  }
+} else {
+  // Convert complex unions to struct types where field names are member0, 
member1, etc.
+  // This is consistent with the behavior of the spark Avro SchemaConverter
+  List fields = 
Lists.newArrayListWithExpectedSize(options.size());
+  for (int i = 0; i < options.size(); i += 1) {
+Type fieldType = options.get(i);
+if (fieldType == null) {

Review comment:
   Why would `fieldType` be null?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #1046: ISSUE-189: Add support for union record type

2020-05-21 Thread GitBox


rdblue commented on a change in pull request #1046:
URL: https://github.com/apache/incubator-iceberg/pull/1046#discussion_r428869358



##
File path: core/src/main/java/org/apache/iceberg/avro/GenericAvroWriter.java
##
@@ -133,4 +141,38 @@ private WriteBuilder() {
   }
 }
   }
+
+  public static class UnionSchemaWriter implements 
ValueWriter {

Review comment:
   Files written by Iceberg must always conform to the Iceberg spec. 
Iceberg tables can allow reading data that is imported, but should _never_ 
write data that cannot be read by a generic implementation of the Iceberg spec.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #1046: ISSUE-189: Add support for union record type

2020-05-21 Thread GitBox


rdblue commented on a change in pull request #1046:
URL: https://github.com/apache/incubator-iceberg/pull/1046#discussion_r428868296



##
File path: core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java
##
@@ -120,7 +124,7 @@ public static boolean isTimestamptz(Schema schema) {
   }
 
   public static boolean isOptionSchema(Schema schema) {
-if (schema.getType() == UNION && schema.getTypes().size() == 2) {
+if (schema.getType() == UNION && schema.getTypes().size() >= 2) {

Review comment:
   This doesn't handle the case where a union is `["int", "float", 
"null"]`. What about keeping union-as-struct handling in separate methods? That 
way existing uses don't need to account for different behavior depending on 
whether a schema is a union or an option. It's concerning that this changes the 
contract of this method for existing callers.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #1046: ISSUE-189: Add support for union record type

2020-05-21 Thread GitBox


rdblue commented on a change in pull request #1046:
URL: https://github.com/apache/incubator-iceberg/pull/1046#discussion_r428867073



##
File path: core/src/main/java/org/apache/iceberg/avro/AvroSchemaUtil.java
##
@@ -47,20 +47,23 @@ private AvroSchemaUtil() {}
   public static final String VALUE_ID_PROP = "value-id";
   public static final String ELEMENT_ID_PROP = "element-id";
   public static final String ADJUST_TO_UTC_PROP = "adjust-to-utc";
+  public static final String UNION_SCHEMA_TO_RECORD = "union-schema-to-record";
 
   private static final Schema NULL = Schema.create(Schema.Type.NULL);
   private static final Schema.Type MAP = Schema.Type.MAP;
   private static final Schema.Type ARRAY = Schema.Type.ARRAY;
   private static final Schema.Type UNION = Schema.Type.UNION;
   private static final Schema.Type RECORD = Schema.Type.RECORD;
 
-  public static Schema convert(org.apache.iceberg.Schema schema,
-   String tableName) {
+  public static Schema convert(
+  org.apache.iceberg.Schema schema,
+  String tableName) {

Review comment:
   Minor: please revert non-functional changes like this one. That makes it 
easier to review and less likely to conflict with other commits.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #1046: ISSUE-189: Add support for union record type

2020-05-21 Thread GitBox


rdblue commented on a change in pull request #1046:
URL: https://github.com/apache/incubator-iceberg/pull/1046#discussion_r42887



##
File path: api/src/main/java/org/apache/iceberg/types/Types.java
##
@@ -641,6 +650,13 @@ public int hashCode() {
   }
   return fieldsById;
 }
+
+/**
+ * @return true if struct represents union schema
+ */
+public boolean isUnionSchema() {

Review comment:
   I agree. Iceberg's type system should have no knowledge of union types 
because unions aren't supported.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org