[GitHub] [hudi] vinothchandar commented on a change in pull request #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-07-20 Thread GitBox
vinothchandar commented on a change in pull request #1100: URL: https://github.com/apache/hudi/pull/1100#discussion_r457851643 ## File path: hudi-hadoop-mr/pom.xml ## @@ -125,6 +125,10 @@ mockito-junit-jupiter test + + org.mockito +

[jira] [Resolved] (HUDI-839) Implement rollbacks using marker files instead of relying on commit metadata

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar resolved HUDI-839. - Resolution: Fixed > Implement rollbacks using marker files instead of relying on commit metadata >

[jira] [Reopened] (HUDI-839) Implement rollbacks using marker files instead of relying on commit metadata

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reopened HUDI-839: - > Implement rollbacks using marker files instead of relying on commit metadata >

[jira] [Updated] (HUDI-839) Implement rollbacks using marker files instead of relying on commit metadata

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-839: Status: Closed (was: Patch Available) > Implement rollbacks using marker files instead of relying

[GitHub] [hudi] vinothchandar merged pull request #1756: [HUDI-839] Introducing support for rollbacks using marker files

2020-07-20 Thread GitBox
vinothchandar merged pull request #1756: URL: https://github.com/apache/hudi/pull/1756 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[jira] [Comment Edited] (HUDI-871) Add support for Tencent cloud COS

2020-07-20 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161718#comment-17161718 ] leesf edited comment on HUDI-871 at 7/21/20, 5:35 AM: -- [~meimile] Sure, assigned to

[jira] [Assigned] (HUDI-871) Add support for Tencent cloud COS

2020-07-20 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reassigned HUDI-871: -- Assignee: deyzhong > Add support for Tencent cloud COS > - > >

[jira] [Commented] (HUDI-871) Add support for Tencent cloud COS

2020-07-20 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161718#comment-17161718 ] leesf commented on HUDI-871: [~meimile] Sure, feel free to open a new PR. > Add support for Tencent cloud COS

[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-20 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-661642358 @vinothchandar @umehrot2 Ready for review. Thanks!

[GitHub] [hudi] garyli1019 commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-07-20 Thread GitBox
garyli1019 commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r457844997 ## File path: hudi-spark/src/main/scala/org/apache/hudi/SnapshotRelation.scala ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] vinothchandar commented on a change in pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-20 Thread GitBox
vinothchandar commented on a change in pull request #1149: URL: https://github.com/apache/hudi/pull/1149#discussion_r457839262 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -245,6 +250,16 @@ public int

[GitHub] [hudi] vinothchandar commented on pull request #1756: [HUDI-839] Introducing support for rollbacks using marker files

2020-07-20 Thread GitBox
vinothchandar commented on pull request #1756: URL: https://github.com/apache/hudi/pull/1756#issuecomment-661633207 @lw309637554 Looks good. Planning to merge after CI passes this time. Thanks a lot for your contributions. This is a very important PR!

[jira] [Commented] (HUDI-871) Add support for Tencent cloud COS

2020-07-20 Thread deyzhong (Jira)
[ https://issues.apache.org/jira/browse/HUDI-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161690#comment-17161690 ] deyzhong commented on HUDI-871: --- I have solved this problem. Can I submit this PR? [~xleesf] [~felixzheng]

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #345

2020-07-20 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.41 KB...] /home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging: simplelogger.properties

[GitHub] [hudi] qingyuan18 opened a new issue #1854: query MOR table using spark sql error

2020-07-20 Thread GitBox
qingyuan18 opened a new issue #1854: URL: https://github.com/apache/hudi/issues/1854 version using JDK: Jdk 1.8.0_242 Scala: 2.11.12 Spark: 2.4.0 Hudi Spark bundle: 0.5.2-incubating Steps to reproduce the behavior: 1. create managed hive table 2. using Spark

[GitHub] [hudi] zherenyu831 commented on pull request #1851: [HUDI-1113] Add user define metrics reporter

2020-07-20 Thread GitBox
zherenyu831 commented on pull request #1851: URL: https://github.com/apache/hudi/pull/1851#issuecomment-661596798 @leesf Fixed, please check

[GitHub] [hudi] henrywu2019 commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-07-20 Thread GitBox
henrywu2019 commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r457798426 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/common/HoodieEngineContext.java ## @@ -0,0 +1,48 @@ +/* + * Licensed to the

[GitHub] [hudi] xushiyan commented on a change in pull request #1849: [WIP] Externalize test classes' configs

2020-07-20 Thread GitBox
xushiyan commented on a change in pull request #1849: URL: https://github.com/apache/hudi/pull/1849#discussion_r457785815 ## File path: hudi-client/src/test/resources/org/apache/hudi/index/hbase/TestHBaseIndex.properties ## @@ -0,0 +1,38 @@ +# +# Licensed to the Apache

[GitHub] [hudi] xushiyan closed pull request #1849: [WIP] Externalize test classes' configs

2020-07-20 Thread GitBox
xushiyan closed pull request #1849: URL: https://github.com/apache/hudi/pull/1849

[GitHub] [hudi] Mathieu1124 commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-20 Thread GitBox
Mathieu1124 commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r457778836 ## File path: hudi-client/src/main/java/org/apache/hudi/callback/impl/HoodieHttpWriteCommitCallback.java ## @@ -0,0 +1,63 @@ +/* + * Licensed to the

[GitHub] [hudi] yanghua commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-20 Thread GitBox
yanghua commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r457769421 ## File path: hudi-client/src/main/java/org/apache/hudi/callback/HoodieWriteCommitCallback.java ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] vinothchandar commented on pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-07-20 Thread GitBox
vinothchandar commented on pull request #1827: URL: https://github.com/apache/hudi/pull/1827#issuecomment-661504763 @satishkotha @nbalajee @prashantwason @modi95 please take a look as well.

[jira] [Updated] (HUDI-845) Allow parallel writing and move the pending rollback work into cleaner

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-845: Status: Open (was: New) > Allow parallel writing and move the pending rollback work into cleaner >

[jira] [Updated] (HUDI-1098) Marker file finalizing may block on a data file that was never written

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1098: - Status: In Progress (was: Open) > Marker file finalizing may block on a data file that was never

[jira] [Updated] (HUDI-1098) Marker file finalizing may block on a data file that was never written

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1098: - Status: Open (was: New) > Marker file finalizing may block on a data file that was never written

[jira] [Assigned] (HUDI-845) Allow parallel writing and move the pending rollback work into cleaner

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-845: --- Assignee: Vinoth Chandar > Allow parallel writing and move the pending rollback work into

[jira] [Updated] (HUDI-1014) Design and Implement upgrade-downgrade infrastructure

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1014: - Status: In Progress (was: Open) > Design and Implement upgrade-downgrade infrastructure >

[jira] [Updated] (HUDI-1014) Design and Implement upgrade-downgrade infrastructure

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1014: - Status: Open (was: New) > Design and Implement upgrade-downgrade infrastructure >

[jira] [Updated] (HUDI-1049) In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1049: - Status: Patch Available (was: In Progress) > In inline compaction mode, previously failed

[jira] [Commented] (HUDI-1049) In inline compaction mode, previously failed compactions needs to be retried before new compactions

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161625#comment-17161625 ] Vinoth Chandar commented on HUDI-1049: -- Need to add a test and retarget for master/0.6.0 > In inline

[jira] [Updated] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1013: - Status: Patch Available (was: In Progress) > Bulk Insert w/o converting to RDD >

[jira] [Updated] (HUDI-651) Incremental Query on Hive via Spark SQL does not return expected results

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-651: Status: In Progress (was: Open) > Incremental Query on Hive via Spark SQL does not return expected

[jira] [Updated] (HUDI-651) Incremental Query on Hive via Spark SQL does not return expected results

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-651: Status: Patch Available (was: In Progress) > Incremental Query on Hive via Spark SQL does not

[jira] [Updated] (HUDI-472) Make sortBy() inside bulkInsertInternal() configurable for bulk_insert

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-472: Status: Patch Available (was: In Progress) > Make sortBy() inside bulkInsertInternal() configurable

[jira] [Updated] (HUDI-305) Presto MOR "_rt" queries only reads base parquet file

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-305: Status: Patch Available (was: In Progress) > Presto MOR "_rt" queries only reads base parquet file

[jira] [Assigned] (HUDI-1015) Audit all getAllPartitionPaths() calls and keep em out of fast path

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-1015: Assignee: Balaji Varadarajan (was: Vinoth Chandar) > Audit all getAllPartitionPaths()

[jira] [Assigned] (HUDI-575) Support Async Compaction for spark streaming writes to hudi table

2020-07-20 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-575: --- Assignee: Balaji Varadarajan (was: Vinoth Chandar) > Support Async Compaction for spark

[GitHub] [hudi] bvaradar commented on issue #1847: [SUPPORT] querying MoR tables on S3 becomes slow with number of files growing

2020-07-20 Thread GitBox
bvaradar commented on issue #1847: URL: https://github.com/apache/hudi/issues/1847#issuecomment-661461345 @zuyanton : I am not sure if I can find the source code of this class. @umehrot2 : Can you let me know if the current implementation of FileStatus returned S3NativeFileSystem

[GitHub] [hudi] bvaradar commented on issue #1846: [SUPPORT] HoodieSnapshotCopier example

2020-07-20 Thread GitBox
bvaradar commented on issue #1846: URL: https://github.com/apache/hudi/issues/1846#issuecomment-661457074 @xushiyan : As you are familiar with this part, would you be able to help answer this question?

[jira] [Updated] (HUDI-1115) Setup and run long running streaming job in AWS environment

2020-07-20 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1115: - Status: In Progress (was: Open) > Setup and run long running streaming job in AWS

[jira] [Updated] (HUDI-1115) Setup and run long running streaming job in AWS environment

2020-07-20 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1115: - Status: Open (was: New) > Setup and run long running streaming job in AWS environment >

[jira] [Assigned] (HUDI-1115) Setup and run long running streaming job in AWS environment

2020-07-20 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan reassigned HUDI-1115: Assignee: Balaji Varadarajan > Setup and run long running streaming job in AWS

[jira] [Created] (HUDI-1115) Setup and run long running streaming job in AWS environment

2020-07-20 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1115: Summary: Setup and run long running streaming job in AWS environment Key: HUDI-1115 URL: https://issues.apache.org/jira/browse/HUDI-1115 Project: Apache Hudi

[GitHub] [hudi] nsivabalan commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-20 Thread GitBox
nsivabalan commented on pull request #1149: URL: https://github.com/apache/hudi/pull/1149#issuecomment-661451741 sure, thanks. Once done, do ping me and vinoth for review.

[GitHub] [hudi] yihua commented on pull request #1149: [HUDI-472] Introduce configurations and new modes of sorting for bulk_insert

2020-07-20 Thread GitBox
yihua commented on pull request #1149: URL: https://github.com/apache/hudi/pull/1149#issuecomment-661421240 @nsivabalan Thanks for the fix. There is some ad-hoc code in this PR just for testing. Let me clean that up.

[jira] [Updated] (HUDI-1072) Reader changes to support clustering and insert overwrite

2020-07-20 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1072: - Labels: pull-request-available (was: ) > Reader changes to support clustering and insert

[GitHub] [hudi] satishkotha opened a new pull request #1853: [HUDI-1072] Add replace metadata file to timeline

2020-07-20 Thread GitBox
satishkotha opened a new pull request #1853: URL: https://github.com/apache/hudi/pull/1853 ## What is the purpose of the pull request This is part of work required for RFC-18 and RFC-19. Add replace action to valid actions in the timeline. To keep the diff small and get

[hudi] branch hudi_test_suite_refactor updated (5cdfbe0 -> 8980e09)

2020-07-20 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a change to branch hudi_test_suite_refactor in repository https://gitbox.apache.org/repos/asf/hudi.git. discard 5cdfbe0 [HUDI-394] Provide a basic implementation of test suite add 8980e09 [HUDI-394]

[GitHub] [hudi] vinothchandar commented on a change in pull request #1849: [WIP] Externalize test classes' configs

2020-07-20 Thread GitBox
vinothchandar commented on a change in pull request #1849: URL: https://github.com/apache/hudi/pull/1849#discussion_r457717555 ## File path: hudi-client/src/test/resources/org/apache/hudi/index/hbase/TestHBaseIndex.properties ## @@ -0,0 +1,38 @@ +# +# Licensed to the Apache

[GitHub] [hudi] zuyanton edited a comment on issue #1847: [SUPPORT] querying MoR tables on S3 becomes slow with number of files growing

2020-07-20 Thread GitBox
zuyanton edited a comment on issue #1847: URL: https://github.com/apache/hudi/issues/1847#issuecomment-661287128 @bvaradar , logging ```fileStatus.getClass().getName()``` from within ```HoodieBaseFile``` constructor, gives me ```com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem$3```

[GitHub] [hudi] vinothchandar commented on a change in pull request #1756: [HUDI-839] Introducing support for rollbacks using marker files

2020-07-20 Thread GitBox
vinothchandar commented on a change in pull request #1756: URL: https://github.com/apache/hudi/pull/1756#discussion_r457651663 ## File path: hudi-client/src/main/java/org/apache/hudi/table/MarkerFiles.java ## @@ -0,0 +1,153 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] zuyanton commented on issue #1847: [SUPPORT] querying MoR tables on S3 becomes slow with number of files growing

2020-07-20 Thread GitBox
zuyanton commented on issue #1847: URL: https://github.com/apache/hudi/issues/1847#issuecomment-661287128 @bvaradar , logging ```fileStatus.getClass().getName()``` from within ```HoodieBaseFile``` constructor, gives me ```com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem```

[hudi] branch hudi_test_suite_refactor updated (82f06f3 -> 5cdfbe0)

2020-07-20 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a change to branch hudi_test_suite_refactor in repository https://gitbox.apache.org/repos/asf/hudi.git. discard 82f06f3 [HUDI-394] Provide a basic implementation of test suite add 5cdfbe0 [HUDI-394]

[jira] [Updated] (HUDI-1114) Explore Spark Structure Streaming for Hudi Dataset

2020-07-20 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1114: - Status: Open (was: New) > Explore Spark Structure Streaming for Hudi Dataset >

[jira] [Created] (HUDI-1114) Explore Spark Structure Streaming for Hudi Dataset

2020-07-20 Thread Yanjia Gary Li (Jira)
Yanjia Gary Li created HUDI-1114: Summary: Explore Spark Structure Streaming for Hudi Dataset Key: HUDI-1114 URL: https://issues.apache.org/jira/browse/HUDI-1114 Project: Apache Hudi Issue

[GitHub] [hudi] garyli1019 commented on issue #1839: Question, Add Support to Hudi datasets to spark structured streaming

2020-07-20 Thread GitBox
garyli1019 commented on issue #1839: URL: https://github.com/apache/hudi/issues/1839#issuecomment-661272780 This is an interesting feature. I created a ticket to track this. https://issues.apache.org/jira/browse/HUDI-1114.

[GitHub] [hudi] rubenssoto commented on issue #1839: Question, Add Support to Hudi datasets to spark structured streaming

2020-07-20 Thread GitBox
rubenssoto commented on issue #1839: URL: https://github.com/apache/hudi/issues/1839#issuecomment-661235040 Hi Vinoth, thank you for your answer. I will watch your video. Incremental queries will probably help me for now, but we want to use Spark Structured Streaming as a default for

[GitHub] [hudi] vinothchandar commented on issue #1839: Question, Add Support to Hudi datasets to spark structured streaming

2020-07-20 Thread GitBox
vinothchandar commented on issue #1839: URL: https://github.com/apache/hudi/issues/1839#issuecomment-661204597 @rubenssoto yes. we already support incremental queries using the spark datasource. It seems like the only thing missing here is that you want the spark structured streaming

[GitHub] [hudi] asheeshgarg commented on issue #1787: Exception During Insert

2020-07-20 Thread GitBox
asheeshgarg commented on issue #1787: URL: https://github.com/apache/hudi/issues/1787#issuecomment-661196097 @bvaradar I am running hudi-spark-bundle

[GitHub] [hudi] asheeshgarg commented on issue #1825: [SUPPORT] Compaction of parquet and meta file

2020-07-20 Thread GitBox
asheeshgarg commented on issue #1825: URL: https://github.com/apache/hudi/issues/1825#issuecomment-661195131 @bvaradar Thanks, Balaji, for your continued support. I will test this.

[hudi] branch hudi_test_suite_refactor updated (13e3d70 -> 82f06f3)

2020-07-20 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a change to branch hudi_test_suite_refactor in repository https://gitbox.apache.org/repos/asf/hudi.git. discard 13e3d70 [HUDI-394] Provide a basic implementation of test suite add 82f06f3 [HUDI-394]

[GitHub] [hudi] nsivabalan commented on a change in pull request #1819: [HUDI-1058] Make delete marker configurable

2020-07-20 Thread GitBox
nsivabalan commented on a change in pull request #1819: URL: https://github.com/apache/hudi/pull/1819#discussion_r457479621 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java ## @@ -337,9 +337,15 @@ private void refreshTimeline()

[GitHub] [hudi] nsivabalan commented on a change in pull request #1819: [HUDI-1058] Make delete marker configurable

2020-07-20 Thread GitBox
nsivabalan commented on a change in pull request #1819: URL: https://github.com/apache/hudi/pull/1819#discussion_r457472660 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java ## @@ -337,9 +337,15 @@ private void refreshTimeline()

[GitHub] [hudi] Mathieu1124 commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-20 Thread GitBox
Mathieu1124 commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r457470710 ## File path: hudi-client/src/main/java/org/apache/hudi/exception/HoodieCommitCallbackException.java ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] Mathieu1124 commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-20 Thread GitBox
Mathieu1124 commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r457470368 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteCommitCallbackConfig.java ## @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] Mathieu1124 commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-20 Thread GitBox
Mathieu1124 commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r457470536 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -632,6 +632,33 @@ public FileSystemViewStorageConfig

[GitHub] [hudi] Mathieu1124 commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-20 Thread GitBox
Mathieu1124 commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r457470209 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -632,6 +632,33 @@ public FileSystemViewStorageConfig

[GitHub] [hudi] tooptoop4 edited a comment on issue #1833: [SUPPORT] 100% update on 10mn keys in single partition slow

2020-07-20 Thread GitBox
tooptoop4 edited a comment on issue #1833: URL: https://github.com/apache/hudi/issues/1833#issuecomment-660715533 @bvaradar i noticed "There is insufficient memory for the Java Runtime Environment to continue." error so i reduced SPARK_WORKER_MEMORY (ie leave more room for OS memory). Now

[GitHub] [hudi] ssomuah opened a new issue #1852: [SUPPORT]

2020-07-20 Thread GitBox
ssomuah opened a new issue #1852: URL: https://github.com/apache/hudi/issues/1852 **Describe the problem you faced** Write performance degrades over time **To Reproduce** Steps to reproduce the behavior: 1. Create an unpartitioned MOR table 2. Use it for a few

[GitHub] [hudi] leesf commented on pull request #1816: [HUDI-859]: Added section for key generation in writing data docs

2020-07-20 Thread GitBox
leesf commented on pull request #1816: URL: https://github.com/apache/hudi/pull/1816#issuecomment-660992766 @pratyakshsharma Thanks for the updates and sorry for the late response. Users not on the latest master still need to use `NonpartitionedKeyGenerator`, so I think it is valuable

[GitHub] [hudi] leesf commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-20 Thread GitBox
leesf commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r457312361 ## File path: hudi-client/src/main/java/org/apache/hudi/exception/HoodieCommitCallbackException.java ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] leesf commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-20 Thread GitBox
leesf commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r457311954 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -632,6 +632,33 @@ public FileSystemViewStorageConfig

[GitHub] [hudi] leesf commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-20 Thread GitBox
leesf commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r457309449 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteCommitCallbackConfig.java ## @@ -0,0 +1,108 @@ +/* + * Licensed to the Apache

[GitHub] [hudi] leesf commented on a change in pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-20 Thread GitBox
leesf commented on a change in pull request #1842: URL: https://github.com/apache/hudi/pull/1842#discussion_r457309211 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -632,6 +632,33 @@ public FileSystemViewStorageConfig

[GitHub] [hudi] leesf commented on pull request #1842: [HUDI-1037]Introduce a write committed callback hook

2020-07-20 Thread GitBox
leesf commented on pull request #1842: URL: https://github.com/apache/hudi/pull/1842#issuecomment-660975134 > Hi, @yanghua @leesf, I was wondering whether we should throw an exception instead of logging a warning when the callback service fails (currently it logs a warning). > Since the

[GitHub] [hudi] leesf commented on pull request #1851: [HUDI-1113] Add user define metrics reporter

2020-07-20 Thread GitBox
leesf commented on pull request #1851: URL: https://github.com/apache/hudi/pull/1851#issuecomment-660974770 @zherenyu831 Thanks for your contribution! Would you please check the Travis failure?

[GitHub] [hudi] yanghua commented on a change in pull request #1774: [HUDI-703]Add unit test for HoodieSyncCommand

2020-07-20 Thread GitBox
yanghua commented on a change in pull request #1774: URL: https://github.com/apache/hudi/pull/1774#discussion_r457278729 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieSyncCommand.java ## @@ -74,9 +74,9 @@ public String validateSync( }

[GitHub] [hudi] codecov-commenter commented on pull request #1770: [HUDI-708]Add temps show and unit test for TempViewCommand

2020-07-20 Thread GitBox
codecov-commenter commented on pull request #1770: URL: https://github.com/apache/hudi/pull/1770#issuecomment-660950311 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1770?src=pr=h1) Report > Merging [#1770](https://codecov.io/gh/apache/hudi/pull/1770?src=pr=desc) into

[jira] [Created] (HUDI-1113) Support user defined metrics reporter

2020-07-20 Thread Zheren Yu (Jira)
Zheren Yu created HUDI-1113: --- Summary: Support user defined metrics reporter Key: HUDI-1113 URL: https://issues.apache.org/jira/browse/HUDI-1113 Project: Apache Hudi Issue Type: New Feature

[GitHub] [hudi] zherenyu831 closed pull request #1851: Add user define metrics reporter

2020-07-20 Thread GitBox
zherenyu831 closed pull request #1851: URL: https://github.com/apache/hudi/pull/1851

[GitHub] [hudi] zherenyu831 opened a new pull request #1851: Add user define metrics reporter

2020-07-20 Thread GitBox
zherenyu831 opened a new pull request #1851: URL: https://github.com/apache/hudi/pull/1851 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[GitHub] [hudi] mabin001 closed pull request #1832: [HUDI-1099]: improve quality of the code calling the method.HiveSyncTool#syncPartitions

2020-07-20 Thread GitBox
mabin001 closed pull request #1832: URL: https://github.com/apache/hudi/pull/1832

[GitHub] [hudi] xushiyan commented on a change in pull request #1849: [WIP] Externalize test classes' configs

2020-07-20 Thread GitBox
xushiyan commented on a change in pull request #1849: URL: https://github.com/apache/hudi/pull/1849#discussion_r457130487 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieIndexConfig.java ## @@ -274,6 +275,7 @@ public HoodieIndexConfig build() {

[GitHub] [hudi] Mathieu1124 commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-07-20 Thread GitBox
Mathieu1124 commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r457128988 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/common/HoodieEngineContext.java ## @@ -0,0 +1,48 @@ +/* + * Licensed to the

[GitHub] [hudi] sbernauer commented on issue #1845: [SUPPORT] Support for Schema evolution. Facing an error

2020-07-20 Thread GitBox
sbernauer commented on issue #1845: URL: https://github.com/apache/hudi/issues/1845#issuecomment-660853366 Thanks for your fast reply! The PR adds a new Test and improves 2 existing tests. The mentioned 4 new cols in TestHoodieAvroUtils increase the number of tested cases, the tests

[GitHub] [hudi] Mathieu1124 commented on a change in pull request #1827: [HUDI-1089] Refactor hudi-client to support multi-engine

2020-07-20 Thread GitBox
Mathieu1124 commented on a change in pull request #1827: URL: https://github.com/apache/hudi/pull/1827#discussion_r457127004 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/common/HoodieEngineContext.java ## @@ -0,0 +1,48 @@ +/* + * Licensed to the

[GitHub] [hudi] bvaradar commented on issue #1728: Processing time gradually increases while using spark structured streaming

2020-07-20 Thread GitBox
bvaradar commented on issue #1728: URL: https://github.com/apache/hudi/issues/1728#issuecomment-660841314 (copied the comment from https://github.com/apache/hudi/issues/1830#issuecomment-660840191) We spent time over the weekend setting up a local test bed with kafka and structured

[GitHub] [hudi] bvaradar commented on issue #1830: [SUPPORT] Processing time gradually increases while using Spark Streaming

2020-07-20 Thread GitBox
bvaradar commented on issue #1830: URL: https://github.com/apache/hudi/issues/1830#issuecomment-660840191 We spent time over the weekend setting up a local test bed with kafka and structured streaming to reproduce this behavior. Here are the steps I followed with code :

[GitHub] [hudi] bvaradar commented on issue #1847: [SUPPORT] querying MoR tables on S3 becomes slow with number of files growing

2020-07-20 Thread GitBox
bvaradar commented on issue #1847: URL: https://github.com/apache/hudi/issues/1847#issuecomment-660836081 @zuyanton : Thanks for the detailed write-up. This is very interesting. If you look at the base implementation of FileStatus getLen() method, it returns a cached copy of the length.

[GitHub] [hudi] xushiyan commented on pull request #1850: [HUDI-994] Move TestHoodieIndex test cases to unit tests

2020-07-20 Thread GitBox
xushiyan commented on pull request #1850: URL: https://github.com/apache/hudi/pull/1850#issuecomment-660834381 @yanghua @vinothchandar This is a straightforward clean-up :)

[jira] [Commented] (HUDI-490) Add DeltaStream API example to hudi-examples

2020-07-20 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160933#comment-17160933 ] Balaji Varadarajan commented on HUDI-490: - [~RocMarshal]: Thanks for your interest. I have assigned

[jira] [Assigned] (HUDI-490) Add DeltaStream API example to hudi-examples

2020-07-20 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan reassigned HUDI-490: --- Assignee: Roc Marshal > Add DeltaStream API example to hudi-examples >

[jira] [Updated] (HUDI-490) Add DeltaStream API example to hudi-examples

2020-07-20 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-490: Status: Open (was: New) > Add DeltaStream API example to hudi-examples >

[jira] [Commented] (HUDI-490) Add DeltaStream API example to hudi-examples

2020-07-20 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160929#comment-17160929 ] Balaji Varadarajan commented on HUDI-490: - [~dengziming] : Can you add some description on what

[jira] [Commented] (HUDI-490) Add DeltaStream API example to hudi-examples

2020-07-20 Thread Roc Marshal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17160926#comment-17160926 ] Roc Marshal commented on HUDI-490: -- [~vbalaji] I'm willing to do this. Could you assign this ticket to

[GitHub] [hudi] xushiyan opened a new pull request #1850: [HUDI-994] Move TestHoodieIndex test cases to unit tests

2020-07-20 Thread GitBox
xushiyan opened a new pull request #1850: URL: https://github.com/apache/hudi/pull/1850 Split unit test cases `testCreateIndex()`, `testCreateDummyIndex()` and `testCreateIndexWithException()` to `TestHoodieIndexConfigs`. ## Committer checklist - [ ] Has a corresponding JIRA