[GitHub] [incubator-iceberg] aokolnychyi commented on issue #553: Spark ReadTask is expensive to serialize

2019-10-17 Thread GitBox
aokolnychyi commented on issue #553: Spark ReadTask is expensive to serialize URL: https://github.com/apache/incubator-iceberg/issues/553#issuecomment-543199975 As a short-term solution, we can broadcast `EncryptionManager` and `FileIO` in `IcebergSource`. Then `Reader` and `ReadTask` can

[GitHub] [incubator-iceberg] aokolnychyi edited a comment on issue #553: Spark ReadTask is expensive to serialize

2019-10-17 Thread GitBox
aokolnychyi edited a comment on issue #553: Spark ReadTask is expensive to serialize URL: https://github.com/apache/incubator-iceberg/issues/553#issuecomment-543199975 As a short-term solution, we can broadcast `EncryptionManager` and `FileIO` in `IcebergSource`. Then `Reader` and

[GitHub] [incubator-iceberg] jzhuge opened a new pull request #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath

2019-10-17 Thread GitBox
jzhuge opened a new pull request #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath URL: https://github.com/apache/incubator-iceberg/pull/554 DataFiles.fillFromPath threw "Invalid partition data, too many fields (expecting 0)" when the path is empty. The fix

[GitHub] [incubator-iceberg] andrei-ionescu commented on issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional"

2019-10-17 Thread GitBox
andrei-ionescu commented on issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional" URL: https://github.com/apache/incubator-iceberg/issues/510#issuecomment-543192332 @rdsr Given two different locations of data

[GitHub] [incubator-iceberg] aokolnychyi commented on issue #553: Spark ReadTask is expensive to serialize

2019-10-17 Thread GitBox
aokolnychyi commented on issue #553: Spark ReadTask is expensive to serialize URL: https://github.com/apache/incubator-iceberg/issues/553#issuecomment-543149968 I can confirm the issue is resolved if we avoid serializing `FileIO`. The main question is how to achieve that with minimum

[GitHub] [incubator-iceberg] jzhuge commented on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath

2019-10-17 Thread GitBox
jzhuge commented on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath URL: https://github.com/apache/incubator-iceberg/pull/554#issuecomment-543249742 @rdblue When you merged #57 into "rblue/iceberg" branch in commit 22d802aca84f27be4e95bda2030ca7f423e854fc

[GitHub] [incubator-iceberg] rdblue merged pull request #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath

2019-10-17 Thread GitBox
rdblue merged pull request #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath URL: https://github.com/apache/incubator-iceberg/pull/554 This is an automated message from the Apache Git Service. To

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336108392 ## File path: site/docs/python-api-intro.md ## @@ -0,0 +1,143 @@ + + +# Iceberg Python

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336108513 ## File path: site/docs/python-api-intro.md ## @@ -0,0 +1,143 @@ + + +# Iceberg Python

[GitHub] [incubator-iceberg] rdblue merged pull request #550: Bump ORC from 1.5.5 to 1.5.6

2019-10-17 Thread GitBox
rdblue merged pull request #550: Bump ORC from 1.5.5 to 1.5.6 URL: https://github.com/apache/incubator-iceberg/pull/550

[GitHub] [incubator-iceberg] jzhuge commented on a change in pull request #549: Add Spark custom Kryo registrator

2019-10-17 Thread GitBox
jzhuge commented on a change in pull request #549: Add Spark custom Kryo registrator URL: https://github.com/apache/incubator-iceberg/pull/549#discussion_r336087118 ## File path: build.gradle ## @@ -429,6 +429,8 @@ project(':iceberg-spark') { compile

[GitHub] [incubator-iceberg] jzhuge edited a comment on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath

2019-10-17 Thread GitBox
jzhuge edited a comment on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath URL: https://github.com/apache/incubator-iceberg/pull/554#issuecomment-543249742 @rdblue When you merged #57 into "rblue/iceberg" branch in commit

[GitHub] [incubator-iceberg] jzhuge edited a comment on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath

2019-10-17 Thread GitBox
jzhuge edited a comment on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath URL: https://github.com/apache/incubator-iceberg/pull/554#issuecomment-543249742 @rdblue When you merged #57 into "rblue/iceberg" branch in 22d802aca84f27be4e95bda2030ca7f423e854fc

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336108312 ## File path: site/docs/python-api-intro.md ## @@ -0,0 +1,143 @@ + + +# Iceberg Python

[GitHub] [incubator-iceberg] jzhuge commented on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath

2019-10-17 Thread GitBox
jzhuge commented on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath URL: https://github.com/apache/incubator-iceberg/pull/554#issuecomment-543248490 @rdblue this PR is probably no longer necessary because of #507, right?

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #543: Avoid NullPointerException in FindFiles when there is no snapshot

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #543: Avoid NullPointerException in FindFiles when there is no snapshot URL: https://github.com/apache/incubator-iceberg/pull/543#discussion_r336111263 ## File path: core/src/main/java/org/apache/iceberg/FindFiles.java ## @@

[GitHub] [incubator-iceberg] rdblue commented on issue #550: Bump ORC from 1.5.5 to 1.5.6

2019-10-17 Thread GitBox
rdblue commented on issue #550: Bump ORC from 1.5.5 to 1.5.6 URL: https://github.com/apache/incubator-iceberg/pull/550#issuecomment-543261091 Thanks, @Fokko!

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336107609 ## File path: python/README.md ## @@ -15,6 +15,26 @@ - limitations under the License.

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336107337 ## File path: python/README.md ## @@ -15,6 +15,26 @@ - limitations under the License.

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336110220 ## File path: site/docs/python-quickstart.md ## @@ -0,0 +1,40 @@ + + +# Examples + +##

[GitHub] [incubator-iceberg] rdblue commented on issue #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on issue #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#issuecomment-543259905 Thanks, @TGooch44! Great to see Python docs!

[GitHub] [incubator-iceberg] jzhuge commented on issue #446: KryoException when writing Iceberg tables in Spark

2019-10-17 Thread GitBox
jzhuge commented on issue #446: KryoException when writing Iceberg tables in Spark URL: https://github.com/apache/incubator-iceberg/issues/446#issuecomment-543276340 @aokolnychyi @shardulm94 @rdsr please take a look at a custom Spark Kryo registrator for Iceberg in #549.

[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336131555 ## File path: site/docs/python-api-intro.md ## @@ -0,0 +1,143 @@ + + +# Iceberg Python

[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336131620 ## File path: site/docs/python-api-intro.md ## @@ -0,0 +1,143 @@ + + +# Iceberg Python

[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336131362 ## File path: python/README.md ## @@ -15,6 +15,26 @@ - limitations under the License.

[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336131188 ## File path: python/README.md ## @@ -15,6 +15,26 @@ - limitations under the License.

[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336134669 ## File path: site/docs/python-quickstart.md ## @@ -0,0 +1,40 @@ + + +# Examples + +##

[GitHub] [incubator-iceberg] goldentriangle opened a new issue #555: Iceberg tables should allow for automatic table creation when writing if table not exists already

2019-10-17 Thread GitBox
goldentriangle opened a new issue #555: Iceberg tables should allow for automatic table creation when writing if table not exists already URL: https://github.com/apache/incubator-iceberg/issues/555 I think this is a special case for https://github.com/apache/incubator-iceberg/issues/540.

[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #530: [python] adding Hive package to wrap BaseMetastoreTables/TableOperations

2019-10-17 Thread GitBox
TGooch44 commented on a change in pull request #530: [python] adding Hive package to wrap BaseMetastoreTables/TableOperations URL: https://github.com/apache/incubator-iceberg/pull/530#discussion_r336266729 ## File path: python/iceberg/hive/hive_table_operations.py ## @@

[GitHub] [incubator-iceberg] rdblue commented on issue #555: Iceberg tables should allow for automatic table creation when writing if table not exists already

2019-10-17 Thread GitBox
rdblue commented on issue #555: Iceberg tables should allow for automatic table creation when writing if table not exists already URL: https://github.com/apache/incubator-iceberg/issues/555#issuecomment-543421500 We are planning on adding support for the new logical plans in Spark 3.0.

[GitHub] [incubator-iceberg] rdblue closed issue #555: Iceberg tables should allow for automatic table creation when writing if table not exists already

2019-10-17 Thread GitBox
rdblue closed issue #555: Iceberg tables should allow for automatic table creation when writing if table not exists already URL: https://github.com/apache/incubator-iceberg/issues/555

[GitHub] [incubator-iceberg] rdblue commented on issue #553: Spark ReadTask is expensive to serialize

2019-10-17 Thread GitBox
rdblue commented on issue #553: Spark ReadTask is expensive to serialize URL: https://github.com/apache/incubator-iceberg/issues/553#issuecomment-543421665 Using a broadcast sounds good to me for now. Can you open a PR for this?

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336278823 ## File path: site/docs/python-quickstart.md ## @@ -0,0 +1,40 @@ + + +# Examples + +##

[GitHub] [incubator-iceberg] rdblue commented on issue #556: Fix Kryo serialization in ParquetUtil.getSplitOffsets

2019-10-17 Thread GitBox
rdblue commented on issue #556: Fix Kryo serialization in ParquetUtil.getSplitOffsets URL: https://github.com/apache/incubator-iceberg/pull/556#issuecomment-543421851 Looks like the failure is checkstyle: ``` [ant:checkstyle] [ERROR]

[GitHub] [incubator-iceberg] rdblue merged pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue merged pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551

[GitHub] [incubator-iceberg] feng-tao commented on issue #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
feng-tao commented on issue #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#issuecomment-543422736 @TGooch44 do you know if we will have a pypi package to try it out?

[GitHub] [incubator-iceberg] rdblue commented on issue #537: Docs: Fix typos

2019-10-17 Thread GitBox
rdblue commented on issue #537: Docs: Fix typos URL: https://github.com/apache/incubator-iceberg/pull/537#issuecomment-543422604 I'm closing this since I think the typo was actually correct and I haven't heard back. Feel free to reopen if you think it still needs to be fixed.

[GitHub] [incubator-iceberg] rdblue closed pull request #537: Docs: Fix typos

2019-10-17 Thread GitBox
rdblue closed pull request #537: Docs: Fix typos URL: https://github.com/apache/incubator-iceberg/pull/537

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP) URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336279982 ## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java ## @@ -0,0 +1,144 @@ +/* + *

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP) URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336280440 ## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java ## @@ -0,0 +1,144 @@ +/* + *

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP) URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336280536 ## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java ## @@ -0,0 +1,144 @@ +/* + *

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP) URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336280691 ## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java ## @@ -0,0 +1,144 @@ +/* + *

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP) URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336281218 ## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java ## @@ -0,0 +1,144 @@ +/* + *

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP) URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336280986 ## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java ## @@ -0,0 +1,144 @@ +/* + *

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #530: [python] adding Hive package to wrap BaseMetastoreTables/TableOperations

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #530: [python] adding Hive package to wrap BaseMetastoreTables/TableOperations URL: https://github.com/apache/incubator-iceberg/pull/530#discussion_r336240323 ## File path: python/iceberg/hive/hive_table_operations.py ## @@

[GitHub] [incubator-iceberg] manishmalhotrawork commented on a change in pull request #529: Add hadoop table catalog (WIP)

2019-10-17 Thread GitBox
manishmalhotrawork commented on a change in pull request #529: Add hadoop table catalog (WIP) URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336255270 ## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java ## @@ -0,0 +1,144 @@

[GitHub] [incubator-iceberg] manishmalhotrawork commented on a change in pull request #529: Add hadoop table catalog (WIP)

2019-10-17 Thread GitBox
manishmalhotrawork commented on a change in pull request #529: Add hadoop table catalog (WIP) URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336257210 ## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java ## @@ -0,0 +1,144 @@

[GitHub] [incubator-iceberg] jzhuge opened a new pull request #556: Fix Kryo serialization in ParquetUtil.getSplitOffsets

2019-10-17 Thread GitBox
jzhuge opened a new pull request #556: Fix Kryo serialization in ParquetUtil.getSplitOffsets URL: https://github.com/apache/incubator-iceberg/pull/556 Found it during integration with downstream Spark 2.3 branch. Added a unit test.

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336238243 ## File path: python/README.md ## @@ -15,6 +15,26 @@ - limitations under the License.

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363

2019-10-17 Thread GitBox
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336238079 ## File path: site/docs/python-quickstart.md ## @@ -0,0 +1,40 @@ + + +# Examples + +##

[GitHub] [incubator-iceberg] manishmalhotrawork commented on a change in pull request #529: Add hadoop table catalog (WIP)

2019-10-17 Thread GitBox
manishmalhotrawork commented on a change in pull request #529: Add hadoop table catalog (WIP) URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336260283 ## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java ## @@ -0,0 +1,142 @@