Build failed in Jenkins: hudi-snapshot-deployment-0.5 #350

2020-07-25 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.29 KB...] /home/jenkins/tools/maven/apache-maven-3.5.4/conf: logging settings.xml toolchains.xml

[GitHub] [hudi] rubenssoto commented on issue #1878: [SUPPORT] Spark Structured Streaming To Hudi Sink Datasource taking much longer

2020-07-25 Thread GitBox
rubenssoto commented on issue #1878: URL: https://github.com/apache/hudi/issues/1878#issuecomment-663927931 Hi again. When I changed the insert option to upsert, the performance got worse. 1 master node m5.xlarge (4 vCPU, 16 GB RAM), 1 core node r5.xlarge (4 vCPU, 32 GB RAM) 4

[GitHub] [hudi] yanghua commented on pull request #1873: [HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common

2020-07-25 Thread GitBox
yanghua commented on pull request #1873: URL: https://github.com/apache/hudi/pull/1873#issuecomment-663927747 > @yanghua would you be able to take a pass on this change please? thanks Yes, I'd like to review this PR.

[jira] [Updated] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1082: Fix Version/s: (was: 0.6.1) 0.6.0 > Bug in deciding the upsert/insert buckets >

[jira] [Updated] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1082: Status: Open (was: New) > Bug in deciding the upsert/insert buckets

[jira] [Closed] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1082. > Bug in deciding the upsert/insert buckets > Key:

[jira] [Resolved] (HUDI-1082) Bug in deciding the upsert/insert buckets

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1082. Resolution: Fixed > Bug in deciding the upsert/insert buckets

[jira] [Closed] (HUDI-985) Introduce rerun ci bot

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-985. > Introduce rerun ci bot > Key: HUDI-985 > URL:

[jira] [Resolved] (HUDI-985) Introduce rerun ci bot

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-985. Fix Version/s: 0.6.0 Resolution: Fixed > Introduce rerun ci bot

[jira] [Updated] (HUDI-802) AWSDmsTransformer does not handle insert -> delete of a row in a single batch correctly

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-802: Status: Closed (was: Patch Available) > AWSDmsTransformer does not handle insert -> delete of a row in a single batch

[jira] [Closed] (HUDI-871) Add support for Tencent cloud COS

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-871. > Add support for Tencent cloud COS > Key: HUDI-871

[jira] [Resolved] (HUDI-871) Add support for Tencent cloud COS

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-871. Resolution: Fixed > Add support for Tencent cloud COS > Key:

[jira] [Updated] (HUDI-871) Add support for Tencent cloud COS

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-871: Fix Version/s: 0.6.0 > Add support for Tencent cloud COS > Key:

[jira] [Closed] (HUDI-839) Implement rollbacks using marker files instead of relying on commit metadata

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-839. > Implement rollbacks using marker files instead of relying on commit metadata

[jira] [Closed] (HUDI-92) Include custom names for spark HUDI spark DAG stages for easier understanding

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-92. > Include custom names for spark HUDI spark DAG stages for easier understanding

[jira] [Closed] (HUDI-1102) Separate out Spark and Path detection utilities used in Bootstrap datasource work

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-1102. > Separate out Spark and Path detection utilities used in Bootstrap datasource work

[jira] [Resolved] (HUDI-1102) Separate out Spark and Path detection utilities used in Bootstrap datasource work

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-1102. Fix Version/s: 0.6.0 Resolution: Fixed > Separate out Spark and Path detection utilities used in Bootstrap

[jira] [Updated] (HUDI-1102) Separate out Spark and Path detection utilities used in Bootstrap datasource work

2020-07-25 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-1102: Status: Open (was: New) > Separate out Spark and Path detection utilities used in Bootstrap datasource work

[GitHub] [hudi] shenh062326 commented on pull request #1868: [HUDI-1083] Optimization in determining insert bucket location for a given key

2020-07-25 Thread GitBox
shenh062326 commented on pull request #1868: URL: https://github.com/apache/hudi/pull/1868#issuecomment-663918069 @nsivabalan @vinothchandar Could you take a look at this pull request? This is an automated message from the

[GitHub] [hudi] shenh062326 edited a comment on pull request #1868: [HUDI-1083] Optimization in determining insert bucket location for a given key

2020-07-25 Thread GitBox
shenh062326 edited a comment on pull request #1868: URL: https://github.com/apache/hudi/pull/1868#issuecomment-663917822 Add a performance test which inserts 10 records with 1000 fileGroups, where each fileGroup's weight is 0.001. ``` public void partitionWeightPerformance() throws

[GitHub] [hudi] shenh062326 commented on pull request #1868: [HUDI-1083] Minor optimization in determining insert bucket location for a given key

2020-07-25 Thread GitBox
shenh062326 commented on pull request #1868: URL: https://github.com/apache/hudi/pull/1868#issuecomment-663917822 Add a performance test which inserts 10 records with 1000 fileGroups, where each fileGroup's weight is 0.001. ``` public void partitionWeightPerformance() throws Exception {
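
The test described in this comment exercises the lookup that maps a record key to an insert bucket according to per-fileGroup weights. As a rough illustration only, and not the code from the pull request, the sketch below assumes the weights are converted to running totals and searched with binary search. The function names and the MD5-based key hashing are hypothetical and do not come from Hudi.

```python
import bisect
import hashlib
from itertools import accumulate


def build_cumulative_weights(weights):
    """Return running totals of the bucket weights, e.g. [0.25, 0.25] -> [0.25, 0.5]."""
    return list(accumulate(weights))


def key_to_unit_interval(record_key):
    """Hash a record key to a deterministic float in [0, 1).

    Hypothetical scheme: take 53 bits of an MD5 digest so the result
    is exactly representable as a float and strictly below 1.
    """
    digest = hashlib.md5(record_key.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def find_bucket(cumulative, point):
    """Return the index of the first bucket whose running total exceeds point."""
    # Rescale in case the weights do not sum to exactly 1.
    target = point * cumulative[-1]
    index = bisect.bisect_right(cumulative, target)
    # Clamp to the last bucket to guard against floating-point edge cases.
    return min(index, len(cumulative) - 1)


# Setup mirroring the comment: 1000 fileGroups, each with weight 0.001.
cumulative = build_cumulative_weights([0.001] * 1000)
bucket = find_bucket(cumulative, key_to_unit_interval("record-key-1"))
```

Building the running totals once costs O(n), after which each key lookup is O(log n) rather than a linear scan over all buckets.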

[GitHub] [hudi] shenh062326 commented on pull request #1819: [HUDI-1058] Make delete marker configurable

2020-07-25 Thread GitBox
shenh062326 commented on pull request #1819: URL: https://github.com/apache/hudi/pull/1819#issuecomment-663915325 > I don't see any tests being added as part of the patch. Would be nice to have some tests covering the new code that was added at all levels. > > * WriteClient > *

[GitHub] [hudi] xushiyan commented on pull request #1873: [HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common

2020-07-25 Thread GitBox
xushiyan commented on pull request #1873: URL: https://github.com/apache/hudi/pull/1873#issuecomment-663911960 @yanghua would you be able to take a pass on this change please? thanks

[GitHub] [hudi] xushiyan commented on a change in pull request #1873: [HUDI-995] Move TestRawTripPayload and HoodieTestDataGenerator to hudi-common

2020-07-25 Thread GitBox
xushiyan commented on a change in pull request #1873: URL: https://github.com/apache/hudi/pull/1873#discussion_r460451609 ## File path: hudi-utilities/src/test/java/org/apache/hudi/common/fs/inline/TestParquetInLining.java ## @@ -48,8 +47,7 @@ import static

[GitHub] [hudi] rubenssoto commented on issue #1878: [SUPPORT] Spark Structured Streaming To Hudi Sink Datasource taking much longer

2020-07-25 Thread GitBox
rubenssoto commented on issue #1878: URL: https://github.com/apache/hudi/issues/1878#issuecomment-663906344 Hi bvaradar, thank you for your answer. I tried increasing spark.yarn.executor.memoryOverhead to 2 GB with the foreachBatch option inside writeStream, and it worked. 4 nodes with 4

[GitHub] [hudi] leesf removed a comment on pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-07-25 Thread GitBox
leesf removed a comment on pull request #1869: URL: https://github.com/apache/hudi/pull/1869#issuecomment-663850664 rerun tests

[GitHub] [hudi] leesf commented on pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-07-25 Thread GitBox
leesf commented on pull request #1869: URL: https://github.com/apache/hudi/pull/1869#issuecomment-663850664 rerun tests

[GitHub] [hudi] leesf commented on pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-07-25 Thread GitBox
leesf commented on pull request #1869: URL: https://github.com/apache/hudi/pull/1869#issuecomment-663850540 rerun test

[GitHub] [hudi] bvaradar commented on issue #1852: [SUPPORT]

2020-07-25 Thread GitBox
bvaradar commented on issue #1852: URL: https://github.com/apache/hudi/issues/1852#issuecomment-663816363 @ssomuah: Looking at the commit metadata, your updates are spread across a large number of files. For example, in the latest commit, 334 files see updates whereas