[GitHub] [incubator-iceberg] rdsr opened a new issue #540: A Table should optionally allow for automatic schema evolution

2019-10-13 Thread GitBox
rdsr opened a new issue #540: A Table should optionally allow for automatic schema evolution URL: https://github.com/apache/incubator-iceberg/issues/540 Related to #244. We should allow for automatic schema evolution when writing to an Iceberg table. This feature should be controlled by

[GitHub] [incubator-iceberg] rdblue commented on issue #529: Add hadoop table catalog (WIP)

2019-10-13 Thread GitBox
rdblue commented on issue #529: Add hadoop table catalog (WIP) URL: https://github.com/apache/incubator-iceberg/pull/529#issuecomment-541442319 @chenjunjiedada, can you update the PR description with a summary of what you're proposing here? If I understand correctly, the idea is to

[GitHub] [incubator-iceberg] rdsr commented on issue #540: Iceberg tables should optionally allow for automatic schema evolution

2019-10-13 Thread GitBox
rdsr commented on issue #540: Iceberg tables should optionally allow for automatic schema evolution URL: https://github.com/apache/incubator-iceberg/issues/540#issuecomment-541442220 > It's easy to add columns with the wrong type when you're doing that. Wouldn't `UpdateSchema`

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet

2019-10-13 Thread GitBox
rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet URL: https://github.com/apache/incubator-iceberg/pull/526#discussion_r334289512 ## File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetSchemaUtil.java ## @@ -29,6 +29,11 @@

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet

2019-10-13 Thread GitBox
rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet URL: https://github.com/apache/incubator-iceberg/pull/526#discussion_r334289534 ## File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetTypeVisitor.java ## @@ -29,9 +29,8 @@

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet

2019-10-13 Thread GitBox
rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet URL: https://github.com/apache/incubator-iceberg/pull/526#discussion_r334289657 ## File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java ## @@ -584,9 +585,9

[GitHub] [incubator-iceberg] rdblue closed pull request #309: Add mechanism to expire table metadata

2019-10-13 Thread GitBox
rdblue closed pull request #309: Add mechanism to expire table metadata URL: https://github.com/apache/incubator-iceberg/pull/309 This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [incubator-iceberg] rdblue commented on issue #309: Add mechanism to expire table metadata

2019-10-13 Thread GitBox
rdblue commented on issue #309: Add mechanism to expire table metadata URL: https://github.com/apache/incubator-iceberg/pull/309#issuecomment-541449375 I'm closing this for now, since the approach is probably not what we will go with in the end, but I'd like to get a solution for this

[GitHub] [incubator-iceberg] rdsr commented on issue #540: Iceberg tables should optionally allow for automatic schema evolution

2019-10-13 Thread GitBox
rdsr commented on issue #540: Iceberg tables should optionally allow for automatic schema evolution URL: https://github.com/apache/incubator-iceberg/issues/540#issuecomment-541441863 Yes, if the columns are valid as per the schema evolution rules. We've seen use-cases of it where users

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #525: Apply Baseline to iceberg-pig

2019-10-13 Thread GitBox
rdblue commented on a change in pull request #525: Apply Baseline to iceberg-pig URL: https://github.com/apache/incubator-iceberg/pull/525#discussion_r334289942 ## File path: pig/src/main/java/org/apache/iceberg/pig/IcebergStorage.java ## @@ -193,80 +166,81 @@ public void

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #525: Apply Baseline to iceberg-pig

2019-10-13 Thread GitBox
rdblue commented on a change in pull request #525: Apply Baseline to iceberg-pig URL: https://github.com/apache/incubator-iceberg/pull/525#discussion_r334290084 ## File path: pig/src/main/java/org/apache/iceberg/pig/PigParquetReader.java ## @@ -366,16 +383,16 @@ protected

[GitHub] [incubator-iceberg] rdsr opened a new issue #541: Iceberg Pig reader should support both catalogs

2019-10-13 Thread GitBox
rdsr opened a new issue #541: Iceberg Pig reader should support both catalogs URL: https://github.com/apache/incubator-iceberg/issues/541 In `IcebergStorage` we are currently only using the `Tables` API, which is the filesystem or Hadoop catalog. We should support both. Or better yet, have

[GitHub] [incubator-iceberg] rdblue commented on issue #181: Add mechanism to expire old metadata versions

2019-10-13 Thread GitBox
rdblue commented on issue #181: Add mechanism to expire old metadata versions URL: https://github.com/apache/incubator-iceberg/issues/181#issuecomment-541449685 From discussion on #309, I think the right way to expire old metadata is to keep a list of the N previous metadata files in each

[GitHub] [incubator-iceberg] rdblue commented on issue #540: Iceberg tables should optionally allow for automatic schema evolution

2019-10-13 Thread GitBox
rdblue commented on issue #540: Iceberg tables should optionally allow for automatic schema evolution URL: https://github.com/apache/incubator-iceberg/issues/540#issuecomment-541440371 What do you mean by automatic schema evolution? When you write to a table, any new columns are

[GitHub] [incubator-iceberg] rdblue commented on issue #540: Iceberg tables should optionally allow for automatic schema evolution

2019-10-13 Thread GitBox
rdblue commented on issue #540: Iceberg tables should optionally allow for automatic schema evolution URL: https://github.com/apache/incubator-iceberg/issues/540#issuecomment-541442093 Yeah, I think that could be a good idea. We'd want to document some of the risks, though. It's easy to

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet

2019-10-13 Thread GitBox
rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet URL: https://github.com/apache/incubator-iceberg/pull/526#discussion_r334289245 ## File path: parquet/src/main/java/org/apache/iceberg/parquet/Parquet.java ## @@ -98,18 +98,18 @@ public

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet

2019-10-13 Thread GitBox
rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet URL: https://github.com/apache/incubator-iceberg/pull/526#discussion_r334289220 ## File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetDictionaryRowGroupFilter.java ## @@

[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #525: Apply Baseline to iceberg-pig

2019-10-13 Thread GitBox
rdsr commented on a change in pull request #525: Apply Baseline to iceberg-pig URL: https://github.com/apache/incubator-iceberg/pull/525#discussion_r334289841 ## File path: pig/src/main/java/org/apache/iceberg/pig/IcebergStorage.java ## @@ -290,8 +265,8 @@ public

[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #525: Apply Baseline to iceberg-pig

2019-10-13 Thread GitBox
rdsr commented on a change in pull request #525: Apply Baseline to iceberg-pig URL: https://github.com/apache/incubator-iceberg/pull/525#discussion_r334289743 ## File path: pig/src/main/java/org/apache/iceberg/pig/SchemaUtil.java ## @@ -73,6 +60,16 @@ private static

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet

2019-10-13 Thread GitBox
rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet URL: https://github.com/apache/incubator-iceberg/pull/526#discussion_r334289375 ## File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetDictionaryRowGroupFilter.java ## @@

[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #525: Apply Baseline to iceberg-pig

2019-10-13 Thread GitBox
rdsr commented on a change in pull request #525: Apply Baseline to iceberg-pig URL: https://github.com/apache/incubator-iceberg/pull/525#discussion_r334289804 ## File path: pig/src/main/java/org/apache/iceberg/pig/IcebergPigInputFormat.java ## @@ -29,7 +29,7 @@ import

[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #525: Apply Baseline to iceberg-pig

2019-10-13 Thread GitBox
rdsr commented on a change in pull request #525: Apply Baseline to iceberg-pig URL: https://github.com/apache/incubator-iceberg/pull/525#discussion_r334279185 ## File path: pig/src/main/java/org/apache/iceberg/pig/PigParquetReader.java ## @@ -366,16 +383,16 @@ protected

[GitHub] [incubator-iceberg] rdblue commented on issue #540: Iceberg tables should optionally allow for automatic schema evolution

2019-10-13 Thread GitBox
rdblue commented on issue #540: Iceberg tables should optionally allow for automatic schema evolution URL: https://github.com/apache/incubator-iceberg/issues/540#issuecomment-541446146 Yes, it will. But we would want to fail earlier than that. The case I'm thinking about is when you're

[GitHub] [incubator-iceberg] rdblue commented on issue #539: Replace StringBuffer by StringBuilder

2019-10-13 Thread GitBox
rdblue commented on issue #539: Replace StringBuffer by StringBuilder URL: https://github.com/apache/incubator-iceberg/pull/539#issuecomment-541441529 Is the behavior of `setLength` the same for both classes? This section is tricky because some of the lengths are measured in bytes, some in
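On the `setLength` question above: in the JDK, both `StringBuffer` and `StringBuilder` count the length in `char` units (UTF-16 code units), not bytes, and both pad with `'\u0000'` when the new length is larger. A minimal sketch that checks this side by side follows. The class and method names (`SetLengthDemo`, `resizeWithBuffer`, `resizeWithBuilder`) are hypothetical and are not taken from the pull request.

```java
public class SetLengthDemo {
    // Resize via StringBuffer.setLength (the synchronized class).
    public static String resizeWithBuffer(String s, int newLength) {
        StringBuffer sb = new StringBuffer(s);
        sb.setLength(newLength); // length is counted in chars, not bytes
        return sb.toString();
    }

    // Resize via StringBuilder.setLength (the unsynchronized class).
    public static String resizeWithBuilder(String s, int newLength) {
        StringBuilder sb = new StringBuilder(s);
        sb.setLength(newLength); // same contract as StringBuffer.setLength
        return sb.toString();
    }

    public static void main(String[] args) {
        // "h\u00e9llo" is 5 chars but 6 bytes when encoded as UTF-8.
        String text = "h\u00e9llo";
        String fromBuffer = resizeWithBuffer(text, 2);
        String fromBuilder = resizeWithBuilder(text, 2);
        // Both calls keep the first two chars, regardless of byte width.
        System.out.println(fromBuffer.equals(fromBuilder));
        System.out.println(fromBuilder.length());
    }
}
```

This only shows that the two classes agree with each other. Whether the code in the pull request mixes byte and char lengths elsewhere still has to be checked against the actual change.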

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #537: Docs: Fix typos

2019-10-13 Thread GitBox
rdblue commented on a change in pull request #537: Docs: Fix typos URL: https://github.com/apache/incubator-iceberg/pull/537#discussion_r334288518 ## File path: site/docs/api-quickstart.md ## @@ -113,7 +113,7 @@ import org.apache.avro.Schema.Parser val avroSchema = new

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet

2019-10-13 Thread GitBox
rdblue commented on a change in pull request #526: Add Baseline to iceberg-parquet URL: https://github.com/apache/incubator-iceberg/pull/526#discussion_r334289415 ## File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetFilters.java ## @@ -168,9 +169,9 @@

[GitHub] [incubator-iceberg] rdsr opened a new issue #542: Support other data formats in Iceberg Pig reader

2019-10-13 Thread GitBox
rdsr opened a new issue #542: Support other data formats in Iceberg Pig reader URL: https://github.com/apache/incubator-iceberg/issues/542 Today `IcebergRecordReader` only supports Parquet.

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #525: Apply Baseline to iceberg-pig

2019-10-13 Thread GitBox
rdblue commented on a change in pull request #525: Apply Baseline to iceberg-pig URL: https://github.com/apache/incubator-iceberg/pull/525#discussion_r334290142 ## File path: build.gradle ## @@ -451,6 +451,8 @@ project(':iceberg-pig') { compile