[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-03 Thread GitBox
rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r331326703 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-03 Thread GitBox
rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r331300121 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java

[GitHub] [incubator-iceberg] andrei-ionescu commented on issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional"

2019-10-03 Thread GitBox
andrei-ionescu commented on issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional" URL: https://github.com/apache/incubator-iceberg/issues/510#issuecomment-538101847 Thanks @rdblue.

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-03 Thread GitBox
rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r331299760 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-03 Thread GitBox
rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r331299871 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java

[GitHub] [incubator-iceberg] danielcweeks merged pull request #407: [python] Parquet read path

2019-10-03 Thread GitBox
danielcweeks merged pull request #407: [python] Parquet read path URL: https://github.com/apache/incubator-iceberg/pull/407

[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #513: Fix concurrency issue in HiveTableOperations when Table is reused

2019-10-03 Thread GitBox
aokolnychyi commented on a change in pull request #513: Fix concurrency issue in HiveTableOperations when Table is reused URL: https://github.com/apache/incubator-iceberg/pull/513#discussion_r331288756 ## File path:

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-03 Thread GitBox
rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r331300533 ## File path:

[GitHub] [incubator-iceberg] rdblue commented on issue #207: Add external schema mappings for files written with name-based schemas #40

2019-10-03 Thread GitBox
rdblue commented on issue #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#issuecomment-538175051 Thanks @rdsr! I think this is about ready. Just a few things to fix, like map keys in `PruneColumns`.

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-03 Thread GitBox
rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r331300461 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java

[GitHub] [incubator-iceberg] manishmalhotrawork commented on issue #499: Add persistent IDs to partition fields (WIP)

2019-10-03 Thread GitBox
manishmalhotrawork commented on issue #499: Add persistent IDs to partition fields (WIP) URL: https://github.com/apache/incubator-iceberg/pull/499#issuecomment-538155832 @rdblue can you please check this one, thanks!

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-03 Thread GitBox
rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r331299510 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-03 Thread GitBox
rdblue commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r331299559 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java

[GitHub] [incubator-iceberg] yathindranath commented on a change in pull request #497: Support retaining last N snapshots

2019-10-03 Thread GitBox
yathindranath commented on a change in pull request #497: Support retaining last N snapshots URL: https://github.com/apache/incubator-iceberg/pull/497#discussion_r331308932 ## File path: api/src/main/java/org/apache/iceberg/ExpireSnapshots.java ## @@ -55,6 +55,14 @@

[GitHub] [incubator-iceberg] yathindranath commented on a change in pull request #497: Support retaining last N snapshots

2019-10-03 Thread GitBox
yathindranath commented on a change in pull request #497: Support retaining last N snapshots URL: https://github.com/apache/incubator-iceberg/pull/497#discussion_r331309609 ## File path: core/src/main/java/org/apache/iceberg/RemoveSnapshots.java ## @@ -82,6 +83,18 @@

[GitHub] [incubator-iceberg] rdsr commented on issue #207: Add external schema mappings for files written with name-based schemas #40

2019-10-03 Thread GitBox
rdsr commented on issue #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#issuecomment-538191265 No problem. Thanks for taking a look at it @rdblue

[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-03 Thread GitBox
rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r331312694 ## File path:

[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-03 Thread GitBox
rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r331326220 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java

[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40

2019-10-03 Thread GitBox
rdsr commented on a change in pull request #207: Add external schema mappings for files written with name-based schemas #40 URL: https://github.com/apache/incubator-iceberg/pull/207#discussion_r331326009 ## File path: core/src/main/java/org/apache/iceberg/avro/PruneColumns.java

[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #407: [python] Parquet read path

2019-10-03 Thread GitBox
TGooch44 commented on a change in pull request #407: [python] Parquet read path URL: https://github.com/apache/incubator-iceberg/pull/407#discussion_r331016952 ## File path: python/iceberg/api/file_format.py ## @@ -20,16 +20,18 @@ @unique class FileFormat(Enum): - -

[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #407: [python] Parquet read path

2019-10-03 Thread GitBox
TGooch44 commented on a change in pull request #407: [python] Parquet read path URL: https://github.com/apache/incubator-iceberg/pull/407#discussion_r331021025 ## File path: python/iceberg/api/file_format.py ## @@ -20,16 +20,18 @@ @unique class FileFormat(Enum): - -

[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #407: [python] Parquet read path

2019-10-03 Thread GitBox
TGooch44 commented on a change in pull request #407: [python] Parquet read path URL: https://github.com/apache/incubator-iceberg/pull/407#discussion_r331028923 ## File path: python/iceberg/core/avro/avro_to_iceberg.py ## @@ -228,37 +232,41 @@ def

[GitHub] [incubator-iceberg] gcdata44 commented on a change in pull request #407: [python] Parquet read path

2019-10-03 Thread GitBox
gcdata44 commented on a change in pull request #407: [python] Parquet read path URL: https://github.com/apache/incubator-iceberg/pull/407#discussion_r331015026 ## File path: python/iceberg/core/avro/avro_to_iceberg.py ## @@ -228,37 +232,41 @@ def

[GitHub] [incubator-iceberg] gcdata44 commented on a change in pull request #407: [python] Parquet read path

2019-10-03 Thread GitBox
gcdata44 commented on a change in pull request #407: [python] Parquet read path URL: https://github.com/apache/incubator-iceberg/pull/407#discussion_r331014660 ## File path: python/iceberg/api/file_format.py ## @@ -20,16 +20,18 @@ @unique class FileFormat(Enum): - -

[GitHub] [incubator-iceberg] andrei-ionescu opened a new issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional"

2019-10-03 Thread GitBox
andrei-ionescu opened a new issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional" URL: https://github.com/apache/incubator-iceberg/issues/510 Given an Iceberg dataset found at `targetPath` with the following schema named

[GitHub] [incubator-iceberg] aokolnychyi opened a new pull request #512: Extend RewriteManifests with a way to add/delete manifests

2019-10-03 Thread GitBox
aokolnychyi opened a new pull request #512: Extend RewriteManifests with a way to add/delete manifests URL: https://github.com/apache/incubator-iceberg/pull/512 This PR extends `RewriteManifests` with a way to directly delete/add manifests, which enables us to rewrite manifests using an

[GitHub] [incubator-iceberg] aokolnychyi opened a new pull request #511: Expose partition spec info

2019-10-03 Thread GitBox
aokolnychyi opened a new pull request #511: Expose partition spec info URL: https://github.com/apache/incubator-iceberg/pull/511 This PR exposes partition spec info from the `Table` API so that utilities can make use of that. As an alternative, utilities can cast `Table` to

[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #512: Extend RewriteManifests with a way to add/delete manifests

2019-10-03 Thread GitBox
aokolnychyi commented on a change in pull request #512: Extend RewriteManifests with a way to add/delete manifests URL: https://github.com/apache/incubator-iceberg/pull/512#discussion_r331102214 ## File path: core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java

[GitHub] [incubator-iceberg] aokolnychyi commented on issue #512: Extend RewriteManifests with a way to add/delete manifests

2019-10-03 Thread GitBox
aokolnychyi commented on issue #512: Extend RewriteManifests with a way to add/delete manifests URL: https://github.com/apache/incubator-iceberg/pull/512#issuecomment-537994844 @bryanck could you take a look?

[GitHub] [incubator-iceberg] rdblue commented on issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional"

2019-10-03 Thread GitBox
rdblue commented on issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional" URL: https://github.com/apache/incubator-iceberg/issues/510#issuecomment-538035616 > Please note that both have the same schema. The problem is

[GitHub] [incubator-iceberg] rdblue commented on issue #514: Fix for cannot update an Iceberg dataset from a Parquet file (#510)

2019-10-03 Thread GitBox
rdblue commented on issue #514: Fix for cannot update an Iceberg dataset from a Parquet file (#510) URL: https://github.com/apache/incubator-iceberg/pull/514#issuecomment-538036232 Thanks for working on this, @andrei-ionescu! Unfortunately, I think the approach here is incorrect.

[GitHub] [incubator-iceberg] rdblue edited a comment on issue #514: Fix for cannot update an Iceberg dataset from a Parquet file (#510)

2019-10-03 Thread GitBox
rdblue edited a comment on issue #514: Fix for cannot update an Iceberg dataset from a Parquet file (#510) URL: https://github.com/apache/incubator-iceberg/pull/514#issuecomment-538036232 Thanks for working on this, @andrei-ionescu! Unfortunately, I think the approach here is

[GitHub] [incubator-iceberg] rdblue edited a comment on issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional"

2019-10-03 Thread GitBox
rdblue edited a comment on issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional" URL: https://github.com/apache/incubator-iceberg/issues/510#issuecomment-538035616 > Please note that both have the same schema. I think

[GitHub] [incubator-iceberg] rdblue commented on issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional"

2019-10-03 Thread GitBox
rdblue commented on issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional" URL: https://github.com/apache/incubator-iceberg/issues/510#issuecomment-538045553 Sorry, I was wrong about that in the comment above. Since this is

[GitHub] [incubator-iceberg] aokolnychyi opened a new pull request #513: Fix concurrency issue in HiveTableOperations when Table is reused

2019-10-03 Thread GitBox
aokolnychyi opened a new pull request #513: Fix concurrency issue in HiveTableOperations when Table is reused URL: https://github.com/apache/incubator-iceberg/pull/513 The same `Table` instance can be shared by multiple jobs meaning that `TableOperations` can be reused. Right now,

[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #513: Fix concurrency issue in HiveTableOperations when Table is reused

2019-10-03 Thread GitBox
aokolnychyi commented on a change in pull request #513: Fix concurrency issue in HiveTableOperations when Table is reused URL: https://github.com/apache/incubator-iceberg/pull/513#discussion_r331117699 ## File path:

[GitHub] [incubator-iceberg] aokolnychyi commented on issue #513: Fix concurrency issue in HiveTableOperations when Table is reused

2019-10-03 Thread GitBox
aokolnychyi commented on issue #513: Fix concurrency issue in HiveTableOperations when Table is reused URL: https://github.com/apache/incubator-iceberg/pull/513#issuecomment-538007129 @Parth-Brahmbhatt could you take a look?

[GitHub] [incubator-iceberg] andrei-ionescu opened a new pull request #514: Fix for cannot update an Iceberg dataset from a Parquet file (#510)

2019-10-03 Thread GitBox
andrei-ionescu opened a new pull request #514: Fix for cannot update an Iceberg dataset from a Parquet file (#510) URL: https://github.com/apache/incubator-iceberg/pull/514 This is a proposed fix for #510 — **Cannot update an Iceberg dataset from a Parquet file due to "field should be

[GitHub] [incubator-iceberg] andrei-ionescu commented on issue #514: Fix for cannot update an Iceberg dataset from a Parquet file (#510)

2019-10-03 Thread GitBox
andrei-ionescu commented on issue #514: Fix for cannot update an Iceberg dataset from a Parquet file (#510) URL: https://github.com/apache/incubator-iceberg/pull/514#issuecomment-538017850 cc @rdblue @aokolnychyi @fbocse @prodeezy @rominparekh