[GitHub] [hudi] wangxianghu commented on a change in pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
wangxianghu commented on a change in pull request #1744: URL: https://github.com/apache/hudi/pull/1744#discussion_r443098129 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/common/TimestampTypeEnum.java ## @@ -0,0 +1,61 @@ +/* + * Licensed to the

[GitHub] [hudi] wangxianghu commented on a change in pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
wangxianghu commented on a change in pull request #1744: URL: https://github.com/apache/hudi/pull/1744#discussion_r443098352 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedComplexKeyGenerator.java ## @@ -0,0 +1,64 @@ +/* + * Licensed

[GitHub] [hudi] wangxianghu commented on a change in pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
wangxianghu commented on a change in pull request #1744: URL: https://github.com/apache/hudi/pull/1744#discussion_r443098129 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/common/TimestampTypeEnum.java ## @@ -0,0 +1,61 @@ +/* + * Licensed to the

[GitHub] [hudi] wangxianghu commented on pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
wangxianghu commented on pull request #1744: URL: https://github.com/apache/hudi/pull/1744#issuecomment-646933340 hi @yanghua, thanks for your detailed review, I have addressed all your concerns, PTAL when free. This is an

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #314

2020-06-19 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.42 KB...] settings.xml toolchains.xml /home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging: simplelogger.properties

[jira] [Updated] (HUDI-1031) Document how to set job scheduling configs for Async compaction

2020-06-19 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-1031: - Status: Open (was: New) > Document how to set job scheduling configs for Async

[jira] [Created] (HUDI-1031) Document how to set job scheduling configs for Async compaction

2020-06-19 Thread Balaji Varadarajan (Jira)
Balaji Varadarajan created HUDI-1031: Summary: Document how to set job scheduling configs for Async compaction Key: HUDI-1031 URL: https://issues.apache.org/jira/browse/HUDI-1031 Project: Apache

[jira] [Assigned] (HUDI-1031) Document how to set job scheduling configs for Async compaction

2020-06-19 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan reassigned HUDI-1031: Assignee: Balaji Varadarajan > Document how to set job scheduling configs for

[GitHub] [hudi] wangxianghu commented on a change in pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
wangxianghu commented on a change in pull request #1744: URL: https://github.com/apache/hudi/pull/1744#discussion_r443088900 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/common/TimestampTypeEnum.java ## @@ -0,0 +1,28 @@ +/* + * Licensed to the

[GitHub] [hudi] wangxianghu commented on a change in pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
wangxianghu commented on a change in pull request #1744: URL: https://github.com/apache/hudi/pull/1744#discussion_r443088843 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/keygen/TestTimestampBasedKeyGenerator.java ## @@ -63,24 +66,24 @@ private

[GitHub] [hudi] wangxianghu commented on a change in pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
wangxianghu commented on a change in pull request #1744: URL: https://github.com/apache/hudi/pull/1744#discussion_r443088789 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/util/TimestampBasedKeyGeneratorHelper.java ## @@ -0,0 +1,139 @@ +/* + *

[GitHub] [hudi] wangxianghu commented on a change in pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
wangxianghu commented on a change in pull request #1744: URL: https://github.com/apache/hudi/pull/1744#discussion_r443088652 ## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java ## @@ -49,11 +49,11 @@ public

[jira] [Created] (HUDI-1030) Files that Hudi needs to delete during finalize write step are not present in S3

2020-06-19 Thread Anton (Jira)
Anton created HUDI-1030: --- Summary: Files that Hudi needs to delete during finalize write step are not present in S3 Key: HUDI-1030 URL: https://issues.apache.org/jira/browse/HUDI-1030 Project: Apache Hudi

[GitHub] [hudi] yanghua commented on a change in pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
yanghua commented on a change in pull request #1744: URL: https://github.com/apache/hudi/pull/1744#discussion_r443084118 ## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/ComplexKeyGenerator.java ## @@ -49,11 +49,11 @@ public

[GitHub] [hudi] codecov-commenter commented on pull request #1746: [HUDI-996] Add functional test suite

2020-06-19 Thread GitBox
codecov-commenter commented on pull request #1746: URL: https://github.com/apache/hudi/pull/1746#issuecomment-646906318 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1746?src=pr=h1) Report > Merging [#1746](https://codecov.io/gh/apache/hudi/pull/1746?src=pr=desc) into

[jira] [Comment Edited] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-06-19 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140870#comment-17140870 ] sivabalan narayanan edited comment on HUDI-1013 at 6/19/20, 10:53 PM: --

[jira] [Updated] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-06-19 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1013: -- Description: Our bulk insert (not just bulk insert, all operations in fact) does dataset

[jira] [Comment Edited] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-06-19 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140870#comment-17140870 ] sivabalan narayanan edited comment on HUDI-1013 at 6/19/20, 10:35 PM: --

[jira] [Comment Edited] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-06-19 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140870#comment-17140870 ] sivabalan narayanan edited comment on HUDI-1013 at 6/19/20, 10:35 PM: --

[jira] [Commented] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-06-19 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140870#comment-17140870 ] sivabalan narayanan commented on HUDI-1013: --- [~uditme]: Great. Sure, happy to work with you

[jira] [Updated] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-06-19 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1013: -- Description: Our bulk insert (not just bulk insert, all operations in fact) does dataset

[jira] [Updated] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-06-19 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1013: -- Description: Our bulk insert (not just bulk insert, all operations in fact) does dataset

[jira] [Updated] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-06-19 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1013: -- Description: Our bulk insert (not just bulk insert, all operations in fact) does dataset

[jira] [Updated] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-06-19 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1013: -- Description: Our bulk insert (not just bulk insert, all operations in fact) does dataset

[GitHub] [hudi] xushiyan commented on pull request #1746: [HUDI-996] Add SharedResources for functional tests

2020-06-19 Thread GitBox
xushiyan commented on pull request #1746: URL: https://github.com/apache/hudi/pull/1746#issuecomment-646877512 waiting for codecov results, which are expected to be affected. Will investigate solution after that. This is an

[GitHub] [hudi] xushiyan commented on pull request #1732: [HUDI-1004] Support update metrics in HoodieDeltaStreamerMetrics

2020-06-19 Thread GitBox
xushiyan commented on pull request #1732: URL: https://github.com/apache/hudi/pull/1732#issuecomment-646856565 > > > Understand the issue originates from `MetricRegistry` not allowing the value of an existing metric to be updated. By the class design, it does enforce the immutable nature of it. A

[hudi] branch master updated: [HUDI-1023] Add validation error messages in delta sync (#1710)

2020-06-19 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 8a9fdd6 [HUDI-1023] Add validation error

[GitHub] [hudi] vinothchandar merged pull request #1710: [HUDI-1023] Add validation error messages in delta sync

2020-06-19 Thread GitBox
vinothchandar merged pull request #1710: URL: https://github.com/apache/hudi/pull/1710 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] vinothchandar commented on pull request #1710: [HUDI-1023] Add validation error messages in delta sync

2020-06-19 Thread GitBox
vinothchandar commented on pull request #1710: URL: https://github.com/apache/hudi/pull/1710#issuecomment-646822748 yes sir.. on it.. This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [hudi] xushiyan commented on pull request #1710: [HUDI-1023] Add validation error messages in delta sync

2020-06-19 Thread GitBox
xushiyan commented on pull request #1710: URL: https://github.com/apache/hudi/pull/1710#issuecomment-646821678 @vinothchandar Is this good to merge? :) This is an automated message from the Apache Git Service. To respond to

[GitHub] [hudi] vinothchandar merged pull request #1749: [MINOR] Rename `TestSourceConfig` to `SourceConfigs`

2020-06-19 Thread GitBox
vinothchandar merged pull request #1749: URL: https://github.com/apache/hudi/pull/1749 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[hudi] branch master updated (f3a7017 -> ab724af)

2020-06-19 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from f3a7017 [HUDI-696] Add unit test for CommitsCommand (#1724) add ab724af [MINOR] Rename `TestSourceConfig` to

[GitHub] [hudi] xushiyan commented on pull request #1749: [MINOR] Rename `TestSourceConfig` to `SourceConfigs`

2020-06-19 Thread GitBox
xushiyan commented on pull request #1749: URL: https://github.com/apache/hudi/pull/1749#issuecomment-646821011 @vinothchandar This may be merged. :) This is an automated message from the Apache Git Service. To respond to the

[GitHub] [hudi] wangxianghu commented on pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
wangxianghu commented on pull request #1744: URL: https://github.com/apache/hudi/pull/1744#issuecomment-646707364 Hi @yanghua, I have addressed all your concerns, PTAL when you get a chance. This is an automated message from

[GitHub] [hudi] wangxianghu commented on a change in pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
wangxianghu commented on a change in pull request #1744: URL: https://github.com/apache/hudi/pull/1744#discussion_r442913719 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/keygen/TestTimestampBasedKeyGenerator.java ## @@ -91,8 +94,9 @@ public void

[GitHub] [hudi] wangxianghu commented on a change in pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
wangxianghu commented on a change in pull request #1744: URL: https://github.com/apache/hudi/pull/1744#discussion_r442913719 ## File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/keygen/TestTimestampBasedKeyGenerator.java ## @@ -91,8 +94,9 @@ public void

[GitHub] [hudi] vinothchandar commented on issue #1737: [SUPPORT]spark streaming create small parquet files

2020-06-19 Thread GitBox
vinothchandar commented on issue #1737: URL: https://github.com/apache/hudi/issues/1737#issuecomment-646704256 @cocopc MOR writes out parquet files for inserts only.. and we do have small file handling for those parquet files as well..

[jira] [Commented] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-06-19 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140641#comment-17140641 ] Vinoth Chandar commented on HUDI-1013: -- [~shivnarayan] can you please summarize everything in the

[GitHub] [hudi] prashanthpdesai edited a comment on issue #1745: Deltastreamer -Global bloom Index resulting Duplicates across partitions for Same record Key

2020-06-19 Thread GitBox
prashanthpdesai edited a comment on issue #1745: URL: https://github.com/apache/hudi/issues/1745#issuecomment-646678314 Hi @vinothchandar @nsivabalan: Thank you for checking. No, we never wrote the table using the Bloom index; we wrote it as Global Bloom from the beginning. Please find

[GitHub] [hudi] prashanthpdesai removed a comment on issue #1745: Deltastreamer -Global bloom Index resulting Duplicates across partitions for Same record Key

2020-06-19 Thread GitBox
prashanthpdesai removed a comment on issue #1745: URL: https://github.com/apache/hudi/issues/1745#issuecomment-646679807 ![image](https://user-images.githubusercontent.com/60724849/85145568-5dbc5300-b212-11ea-8608-fff234c2ca5f.png)

[GitHub] [hudi] prashanthpdesai commented on issue #1745: Deltastreamer -Global bloom Index resulting Duplicates across partitions for Same record Key

2020-06-19 Thread GitBox
prashanthpdesai commented on issue #1745: URL: https://github.com/apache/hudi/issues/1745#issuecomment-646679807 ![image](https://user-images.githubusercontent.com/60724849/85145568-5dbc5300-b212-11ea-8608-fff234c2ca5f.png)

[GitHub] [hudi] prashanthpdesai commented on issue #1745: Deltastreamer -Global bloom Index resulting Duplicates across partitions for Same record Key

2020-06-19 Thread GitBox
prashanthpdesai commented on issue #1745: URL: https://github.com/apache/hudi/issues/1745#issuecomment-646678314 Hi @vinothchandar @nsivabalan: Thank you for checking. No, we never wrote the table using the Bloom index; we wrote it as Global Bloom from the beginning. Please find the

[GitHub] [hudi] leesf commented on pull request #1732: [HUDI-1004] Support update metrics in HoodieDeltaStreamerMetrics

2020-06-19 Thread GitBox
leesf commented on pull request #1732: URL: https://github.com/apache/hudi/pull/1732#issuecomment-646656281 > > Understand the issue originates from `MetricRegistry` not allowing the value of an existing metric to be updated. By the class design, it does enforce the immutable nature of it. A clean

[GitHub] [hudi] Raghvendradubey commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-06-19 Thread GitBox
Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-646596361 Hey vinoth, 1 - Could you please shed some light on the statement "old behavior for real production use-cases"? 2 - Yes, indexing is dominating, not sure why exactly it

[GitHub] [hudi] nsivabalan commented on issue #1745: Deltastreamer -Global bloom Index resulting Duplicates across partitions for Same record Key

2020-06-19 Thread GitBox
nsivabalan commented on issue #1745: URL: https://github.com/apache/hudi/issues/1745#issuecomment-646581422 yes, this should never be feasible unless you created a table as regular and then switched to global later. We have a unit test for the same exact case when we added the "update

[jira] [Comment Edited] (HUDI-340) Increase Default max events to read from kafka source

2020-06-19 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140333#comment-17140333 ] wangxianghu edited comment on HUDI-340 at 6/19/20, 9:54 AM: Hi [~Pratyaksh], I

[GitHub] [hudi] wangxianghu commented on a change in pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
wangxianghu commented on a change in pull request #1744: URL: https://github.com/apache/hudi/pull/1744#discussion_r442720534 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedComplexKeyGenerator.java ## @@ -0,0 +1,119 @@ +/* + *

[GitHub] [hudi] wangxianghu commented on a change in pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
wangxianghu commented on a change in pull request #1744: URL: https://github.com/apache/hudi/pull/1744#discussion_r442717930 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedComplexKeyGenerator.java ## @@ -0,0 +1,119 @@ +/* + *

[GitHub] [hudi] yanghua commented on a change in pull request #1744: [HUDI-1027] Introduce TimestampBasedComplexKeyGenerator to support ti…

2020-06-19 Thread GitBox
yanghua commented on a change in pull request #1744: URL: https://github.com/apache/hudi/pull/1744#discussion_r442705842 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/keygen/TimestampBasedComplexKeyGenerator.java ## @@ -0,0 +1,119 @@ +/* + * Licensed to

[jira] [Comment Edited] (HUDI-340) Increase Default max events to read from kafka source

2020-06-19 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140333#comment-17140333 ] wangxianghu edited comment on HUDI-340 at 6/19/20, 8:37 AM: Hi [~Pratyaksh], I

[jira] [Comment Edited] (HUDI-340) Increase Default max events to read from kafka source

2020-06-19 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140333#comment-17140333 ] wangxianghu edited comment on HUDI-340 at 6/19/20, 8:32 AM: Hi [~Pratyaksh], I

[jira] [Comment Edited] (HUDI-340) Increase Default max events to read from kafka source

2020-06-19 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140333#comment-17140333 ] wangxianghu edited comment on HUDI-340 at 6/19/20, 8:30 AM: Hi [~Pratyaksh], I

[jira] [Comment Edited] (HUDI-340) Increase Default max events to read from kafka source

2020-06-19 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140333#comment-17140333 ] wangxianghu edited comment on HUDI-340 at 6/19/20, 8:28 AM: Hi [~Pratyaksh], I

[jira] [Commented] (HUDI-340) Increase Default max events to read from kafka source

2020-06-19 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140333#comment-17140333 ] wangxianghu commented on HUDI-340: -- Hi [~Pratyaksh], I am confused here: what's the purpose of doing this

[hudi] branch asf-site updated: Travis CI build asf-site

2020-06-19 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new e6a631e Travis CI build asf-site e6a631e is

[jira] [Commented] (HUDI-839) Implement rollbacks using marker files instead of relying on commit metadata

2020-06-19 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17140244#comment-17140244 ] liwei commented on HUDI-839: [~vinoth] Sorry, the PR has not been submitted yet; work is currently in progress, this

[GitHub] [hudi] lw309637554 commented on pull request #1716: [HUDI-875] Introduce a new pom module named hudi-common-sync

2020-06-19 Thread GitBox
lw309637554 commented on pull request #1716: URL: https://github.com/apache/hudi/pull/1716#issuecomment-646451608 > Sorry for the delay. Was spending time on another large review. > > Can we make those changes in this pr itself? Would love to see end-end functionality working. IIRC