[GitHub] [hudi] leesf commented on a change in pull request #1727: [WIP] [Review] refactor hudi-client

2020-06-24 Thread GitBox
leesf commented on a change in pull request #1727: URL: https://github.com/apache/hudi/pull/1727#discussion_r444761321 ## File path: hudi-client/hudi-client-spark/src/main/java/org/apache/hudi/table/HoodieSparkTableFactory.java ## @@ -0,0 +1,56 @@ +/* + * Licensed to the

[GitHub] [hudi] bvaradar commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-24 Thread GitBox
bvaradar commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r444687099 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java ## @@ -80,58 +77,6 @@ protected

[GitHub] [hudi] leesf commented on a change in pull request #1727: [WIP] [Review] refactor hudi-client

2020-06-24 Thread GitBox
leesf commented on a change in pull request #1727: URL: https://github.com/apache/hudi/pull/1727#discussion_r444755524 ## File path: hudi-client/hudi-client-spark/src/main/java/org/apache/hudi/table/HoodieSparkTableFactory.java ## @@ -0,0 +1,56 @@ +/* + * Licensed to the

[jira] [Assigned] (HUDI-1046) Support updates during clustering in CoW mode

2020-06-24 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Shen reassigned HUDI-1046: --- Assignee: Hong Shen > Support updates during clustering in CoW mode >

[GitHub] [hudi] bvaradar commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-24 Thread GitBox
bvaradar commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r444684726 ## File path: hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java ## @@ -146,21 +146,22 @@ private void syncSchema(String tableName,

[GitHub] [hudi] bvaradar commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-24 Thread GitBox
bvaradar commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r444691759 ## File path: hudi-common/src/main/java/org/apache/hudi/common/util/ParquetReaderIterator.java ## @@ -16,7 +16,7 @@ * limitations under the License.

[GitHub] [hudi] leesf commented on a change in pull request #1727: [WIP] [Review] refactor hudi-client

2020-06-24 Thread GitBox
leesf commented on a change in pull request #1727: URL: https://github.com/apache/hudi/pull/1727#discussion_r444754297 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/index/BaseHoodieIndex.java ## @@ -18,90 +18,45 @@ package

[GitHub] [hudi] leesf commented on a change in pull request #1727: [WIP] [Review] refactor hudi-client

2020-06-24 Thread GitBox
leesf commented on a change in pull request #1727: URL: https://github.com/apache/hudi/pull/1727#discussion_r444763429 ## File path: hudi-client/hudi-client-spark/src/main/java/org/apache/hudi/table/action/commit/WriteHelper.java ## @@ -18,27 +18,32 @@ package

[GitHub] [hudi] christoph-wmt closed issue #1758: [SUPPORT] building InMemoryFileIndex slow with increase target table partitions

2020-06-24 Thread GitBox
christoph-wmt closed issue #1758: URL: https://github.com/apache/hudi/issues/1758 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [hudi] christoph-wmt commented on issue #1758: [SUPPORT] building InMemoryFileIndex slow with increase target table partitions

2020-06-24 Thread GitBox
christoph-wmt commented on issue #1758: URL: https://github.com/apache/hudi/issues/1758#issuecomment-648698394 oh really, my assumption was it would only make it in 6.0. It turns out we are on 5.0 and adding above fix into our build resolved the issue. We still have significant time

[GitHub] [hudi] vinothchandar commented on issue #1758: [SUPPORT] building InMemoryFileIndex slow with increase target table partitions

2020-06-24 Thread GitBox
vinothchandar commented on issue #1758: URL: https://github.com/apache/hudi/issues/1758#issuecomment-648872791 @christoph-wmt Good to know.. My guess is EMR picked up 0.5.0 even though you put in 0.5.3? we are all working on 0.6.0 release where all of this is going to be much

[GitHub] [hudi] vinothchandar commented on pull request #1721: [WIP] [HUDI-1041] Cache the explodeRecordRDDWithFileComparisons instead of commuting it…

2020-06-24 Thread GitBox
vinothchandar commented on pull request #1721: URL: https://github.com/apache/hudi/pull/1721#issuecomment-648885633 > Regarding sampling, what if some of the partitions are skewed? Will that cause more overhead than flush the file out? IIRC the partitionRecordKeyPairRDD would have

[GitHub] [hudi] bobgalvao commented on issue #1723: [SUPPORT] - trouble using Apache Hudi with S3.

2020-06-24 Thread GitBox
bobgalvao commented on issue #1723: URL: https://github.com/apache/hudi/issues/1723#issuecomment-648887040 Hi, I performed the migration to version 0.5.2 and no longer got the errors. Thanks for the support.

[GitHub] [hudi] bobgalvao closed issue #1723: [SUPPORT] - trouble using Apache Hudi with S3.

2020-06-24 Thread GitBox
bobgalvao closed issue #1723: URL: https://github.com/apache/hudi/issues/1723

[GitHub] [hudi] vinothchandar commented on issue #1763: [SUPPORT] Hudi total upsert time is twice than the individual jobs time in Spark UI added together

2020-06-24 Thread GitBox
vinothchandar commented on issue #1763: URL: https://github.com/apache/hudi/issues/1763#issuecomment-648886584 @venkee14 can you give 0.5.3 a shot? it has bunch of perf fixes that might help you..

[jira] [Updated] (HUDI-1049) In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-06-24 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1049: - Labels: pull-request-available (was: ) > In inline compaction mode, previously failed

[GitHub] [hudi] bvaradar opened a new pull request #1765: [HUDI-1049] In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-06-24 Thread GitBox
bvaradar opened a new pull request #1765: URL: https://github.com/apache/hudi/pull/1765 This is a patch in 0.5.3 to unblock user noticing this issue - [HUDI-1049] In inline compaction mode, previously failed compactions needs to be retried before new compactions

[GitHub] [hudi] bvaradar commented on issue #1764: [SUPPORT] Commits stays INFLIGHT forever after S3 consistency check fails when Hudi tries to delete duplicate datafiles

2020-06-24 Thread GitBox
bvaradar commented on issue #1764: URL: https://github.com/apache/hudi/issues/1764#issuecomment-648921677 Thanks @zuyanton for reporting the issue. Regarding failed compaction jobs getting retried - As async compaction is the usual compaction mode run by users, we did not notice this

[GitHub] [hudi] vinothchandar commented on issue #1723: [SUPPORT] - trouble using Apache Hudi with S3.

2020-06-24 Thread GitBox
vinothchandar commented on issue #1723: URL: https://github.com/apache/hudi/issues/1723#issuecomment-64320 @bobgalvao we also have a 0.5.3 out, with a bunch of perf fixes.. Might be going there directly as we cook 0.6.0 :)

[GitHub] [hudi] venkee14 commented on issue #1763: [SUPPORT] Hudi total upsert time is twice than the individual jobs time in Spark UI added together

2020-06-24 Thread GitBox
venkee14 commented on issue #1763: URL: https://github.com/apache/hudi/issues/1763#issuecomment-648916677 @vinothchandar : Thanks will try that and report back

[jira] [Updated] (HUDI-1049) In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-06-24 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1049: - Status: Open (was: New) > In inline compaction mode, previously failed compactions needs

[jira] [Updated] (HUDI-1049) In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-06-24 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1049: - Status: In Progress (was: Open) > In inline compaction mode, previously failed

[jira] [Created] (HUDI-1049) In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-06-24 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1049: Summary: In inline compaction mode, previously failed compactions needs to be retried before new compactions Key: HUDI-1049 URL:

[GitHub] [hudi] zuyanton commented on issue #1764: [SUPPORT] Commits stays INFLIGHT forever after S3 consistency check fails when Hudi tries to delete duplicate datafiles

2020-06-24 Thread GitBox
zuyanton commented on issue #1764: URL: https://github.com/apache/hudi/issues/1764#issuecomment-648966026 Thank you for quick response. @vinothchandar posting hoodie.properties file zipped. However when I open file on my end with sublime I only see bunch of hex numbers, not sure if

[GitHub] [hudi] vinothchandar edited a comment on issue #1764: [SUPPORT] Commits stays INFLIGHT forever after S3 consistency check fails when Hudi tries to delete duplicate datafiles

2020-06-24 Thread GitBox
vinothchandar edited a comment on issue #1764: URL: https://github.com/apache/hudi/issues/1764#issuecomment-648882567 @zuyanton thanks for reporting this.. let's work together to resolve this. can you please paste the `.hoodie/hoodie.properties` file? The .inflight file hanging around

[GitHub] [hudi] vinothchandar commented on issue #1764: [SUPPORT] Commits stays INFLIGHT forever after S3 consistency check fails when Hudi tries to delete duplicate datafiles

2020-06-24 Thread GitBox
vinothchandar commented on issue #1764: URL: https://github.com/apache/hudi/issues/1764#issuecomment-648882567 @zuyanton thanks for reporting this.. let's work together to resolve this. can you please paste the `.hoodie/hoodie.properties` file? The .inflight file hanging around could be

[GitHub] [hudi] xushiyan commented on pull request #1732: [HUDI-1004] Support update metrics in HoodieDeltaStreamerMetrics

2020-06-24 Thread GitBox
xushiyan commented on pull request #1732: URL: https://github.com/apache/hudi/pull/1732#issuecomment-649085031 @shenh062326 would be nice to have this tested out in real setup..ignore if you already did it. thanks

[jira] [Resolved] (HUDI-949) Test MOR : Hive Realtime Query with metadata bootstrap

2020-06-24 Thread Wenning Ding (Jira)
[ https://issues.apache.org/jira/browse/HUDI-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenning Ding resolved HUDI-949. --- Resolution: Fixed > Test MOR : Hive Realtime Query with metadata bootstrap >

[jira] [Resolved] (HUDI-950) Test COW : Spark SQL Read Optimized Query with metadata bootstrap

2020-06-24 Thread Wenning Ding (Jira)
[ https://issues.apache.org/jira/browse/HUDI-950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenning Ding resolved HUDI-950. --- Resolution: Fixed > Test COW : Spark SQL Read Optimized Query with metadata bootstrap >

[GitHub] [hudi] v3nkatesh commented on pull request #1484: [HUDI-316] : Hbase qps repartition writestatus

2020-06-24 Thread GitBox
v3nkatesh commented on pull request #1484: URL: https://github.com/apache/hudi/pull/1484#issuecomment-649156492 > In any case, we need unit tests for the RateLimiter class.. > Few alternatives .. > You can maintain the index outside Hudi (index classes are pluggable) > I can write

[GitHub] [hudi] vinothchandar commented on a change in pull request #1765: [HUDI-1049] 0.5.3 Patch - In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-06-24 Thread GitBox
vinothchandar commented on a change in pull request #1765: URL: https://github.com/apache/hudi/pull/1765#discussion_r445214053 ## File path: hudi-client/src/main/java/org/apache/hudi/client/HoodieWriteClient.java ## @@ -1147,6 +1148,18 @@ private HoodieCommitMetadata

[GitHub] [hudi] shenh062326 commented on pull request #1732: [HUDI-1004] Support update metrics in HoodieDeltaStreamerMetrics

2020-06-24 Thread GitBox
shenh062326 commented on pull request #1732: URL: https://github.com/apache/hudi/pull/1732#issuecomment-649189110 > @shenh062326 would be nice to have this tested out in real setup..ignore if you already did it. thanks I have test it and it works.

[jira] [Updated] (HUDI-956) Test COW : Presto Realtime Query with metadata bootstrap

2020-06-24 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-956: Status: Open (was: New) > Test COW : Presto Realtime Query with metadata bootstrap >

[jira] [Assigned] (HUDI-955) Test MOR : Presto Read Optimized Query with metadata bootstrap

2020-06-24 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan reassigned HUDI-955: --- Assignee: Wenning Ding > Test MOR : Presto Read Optimized Query with metadata

[jira] [Updated] (HUDI-954) Test COW : Presto Read Optimized Query with metadata bootstrap

2020-06-24 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-954: Status: In Progress (was: Open) > Test COW : Presto Read Optimized Query with metadata

[jira] [Assigned] (HUDI-954) Test COW : Presto Read Optimized Query with metadata bootstrap

2020-06-24 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan reassigned HUDI-954: --- Assignee: Wenning Ding > Test COW : Presto Read Optimized Query with metadata

[jira] [Updated] (HUDI-956) Test COW : Presto Realtime Query with metadata bootstrap

2020-06-24 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-956: Status: In Progress (was: Open) > Test COW : Presto Realtime Query with metadata bootstrap

[jira] [Updated] (HUDI-955) Test MOR : Presto Read Optimized Query with metadata bootstrap

2020-06-24 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-955: Status: Open (was: New) > Test MOR : Presto Read Optimized Query with metadata bootstrap >

[jira] [Assigned] (HUDI-956) Test COW : Presto Realtime Query with metadata bootstrap

2020-06-24 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan reassigned HUDI-956: --- Assignee: Wenning Ding > Test COW : Presto Realtime Query with metadata bootstrap >

[jira] [Updated] (HUDI-955) Test MOR : Presto Read Optimized Query with metadata bootstrap

2020-06-24 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-955: Status: In Progress (was: Open) > Test MOR : Presto Read Optimized Query with metadata

[jira] [Updated] (HUDI-954) Test COW : Presto Read Optimized Query with metadata bootstrap

2020-06-24 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-954: Status: Open (was: New) > Test COW : Presto Read Optimized Query with metadata bootstrap >

[GitHub] [hudi] garyli1019 commented on a change in pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-24 Thread GitBox
garyli1019 commented on a change in pull request #1722: URL: https://github.com/apache/hudi/pull/1722#discussion_r445313298 ## File path: hudi-spark/src/main/scala/org/apache/hudi/SnapshotRelation.scala ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Updated] (HUDI-1052) Support vectorized reader for MOR datasource reader

2020-06-24 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1052: - Status: Open (was: New) > Support vectorized reader for MOR datasource reader >

[jira] [Updated] (HUDI-1051) Improve MOR datasource reader file listing and path handling

2020-06-24 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1051: - Status: Open (was: New) > Improve MOR datasource reader file listing and path handling >

[GitHub] [hudi] garyli1019 commented on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-24 Thread GitBox
garyli1019 commented on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-649235630 @vinothchandar Thanks for reviewing! I created tickets for the follow-up work. All the file listing and globing can be improved after @umehrot2 's PR merged.

[GitHub] [hudi] garyli1019 commented on a change in pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-24 Thread GitBox
garyli1019 commented on a change in pull request #1722: URL: https://github.com/apache/hudi/pull/1722#discussion_r445306899 ## File path: hudi-spark/src/main/scala/org/apache/hudi/SnapshotRelation.scala ## @@ -0,0 +1,132 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] garyli1019 commented on a change in pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-24 Thread GitBox
garyli1019 commented on a change in pull request #1722: URL: https://github.com/apache/hudi/pull/1722#discussion_r445306684 ## File path: hudi-spark/src/main/scala/org/apache/hudi/SnapshotRelation.scala ## @@ -0,0 +1,133 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Created] (HUDI-1052) Support vectorized reader for MOR datasource reader

2020-06-24 Thread Yanjia Gary Li (Jira)
Yanjia Gary Li created HUDI-1052: Summary: Support vectorized reader for MOR datasource reader Key: HUDI-1052 URL: https://issues.apache.org/jira/browse/HUDI-1052 Project: Apache Hudi Issue

[jira] [Created] (HUDI-1051) Improve MOR datasource reader file listing and path handling

2020-06-24 Thread Yanjia Gary Li (Jira)
Yanjia Gary Li created HUDI-1051: Summary: Improve MOR datasource reader file listing and path handling Key: HUDI-1051 URL: https://issues.apache.org/jira/browse/HUDI-1051 Project: Apache Hudi

[jira] [Created] (HUDI-1050) Support filter pushdown and column pruning for MOR table on Spark Datasource

2020-06-24 Thread Yanjia Gary Li (Jira)
Yanjia Gary Li created HUDI-1050: Summary: Support filter pushdown and column pruning for MOR table on Spark Datasource Key: HUDI-1050 URL: https://issues.apache.org/jira/browse/HUDI-1050 Project:

[GitHub] [hudi] garyli1019 commented on a change in pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-24 Thread GitBox
garyli1019 commented on a change in pull request #1722: URL: https://github.com/apache/hudi/pull/1722#discussion_r445305619 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -57,8 +57,7 @@ class DefaultSource extends RelationProvider if

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #319

2020-06-24 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.37 KB...] settings.xml toolchains.xml /home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging: simplelogger.properties

[GitHub] [hudi] garyli1019 commented on a change in pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-24 Thread GitBox
garyli1019 commented on a change in pull request #1722: URL: https://github.com/apache/hudi/pull/1722#discussion_r445318518 ## File path: hudi-spark/src/main/scala/org/apache/hudi/SnapshotRelation.scala ## @@ -0,0 +1,139 @@ +/* + * Licensed to the Apache Software Foundation

[jira] [Updated] (HUDI-1050) Support filter pushdown and column pruning for MOR table on Spark Datasource

2020-06-24 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1050: - Status: Open (was: New) > Support filter pushdown and column pruning for MOR table on Spark