[hudi] branch hudi_test_suite_refactor updated (bcac621 -> 29d4b2b)

2020-06-28 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a change to branch hudi_test_suite_refactor in repository https://gitbox.apache.org/repos/asf/hudi.git. from bcac621 [HUDI-394] Provide a basic implementation of test suite; add 29d4b2b Refactored

[GitHub] [hudi] afeldman1 commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-28 Thread GitBox
afeldman1 commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446699556 ## File path: docs/_docs/2_3_querying_data.md ## @@ -136,6 +136,16 @@ The Spark Datasource API is a popular way of authoring Spark ETL pipelines. Hudi

[GitHub] [hudi] afeldman1 commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-28 Thread GitBox
afeldman1 commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446703736 ## File path: docs/_docs/2_3_querying_data.md ## @@ -136,6 +136,16 @@ The Spark Datasource API is a popular way of authoring Spark ETL pipelines. Hudi

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #323

2020-06-28 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.32 KB...] /home/jenkins/tools/maven/apache-maven-3.5.4/conf: logging settings.xml toolchains.xml

[GitHub] [hudi] codecov-commenter edited a comment on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-28 Thread GitBox
codecov-commenter edited a comment on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-643095877 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1722?src=pr=h1) Report > Merging [#1722](https://codecov.io/gh/apache/hudi/pull/1722?src=pr=desc) into

[GitHub] [hudi] codecov-commenter edited a comment on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-28 Thread GitBox
codecov-commenter edited a comment on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-643095877 This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [hudi] RajasekarSribalan edited a comment on issue #1766: [SUPPORT] Hudi COW - Bulk Insert followed by Upsert via Spark streaming job

2020-06-28 Thread GitBox
RajasekarSribalan edited a comment on issue #1766: URL: https://github.com/apache/hudi/issues/1766#issuecomment-650712662 Thanks for your reply @bhasudha @vinothchandar I tried this setting as well but I get duplicate records when querying hudi tables... ideally it has to pick up

[jira] [Commented] (HUDI-1057) optional int32 is not a group

2020-06-28 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147506#comment-17147506 ] Hong Shen commented on HUDI-1057: - Did you use hivemeta? Whether the int32 type is defined in hivemeta, or

[GitHub] [hudi] bhasudha commented on issue #1766: [SUPPORT] Hudi COW - Bulk Insert followed by Upsert via Spark streaming job

2020-06-28 Thread GitBox
bhasudha commented on issue #1766: URL: https://github.com/apache/hudi/issues/1766#issuecomment-650708997 It is strange you are seeing this for Hudi and non Hudi tables. Could you try setting this config when querying Hive ```set

[GitHub] [hudi] RajasekarSribalan commented on issue #1766: [SUPPORT] Hudi COW - Bulk Insert followed by Upsert via Spark streaming job

2020-06-28 Thread GitBox
RajasekarSribalan commented on issue #1766: URL: https://github.com/apache/hudi/issues/1766#issuecomment-650712662 Thanks for your reply. I tried this setting as well but I get duplicate records when querying hudi table... ideally it has to pick up only latest commit but it

[GitHub] [hudi] bvaradar commented on pull request #1512: [HUDI-763] Add hoodie.table.base.file.format option to hoodie.properties file

2020-06-28 Thread GitBox
bvaradar commented on pull request #1512: URL: https://github.com/apache/hudi/pull/1512#issuecomment-650710398 yes @vinothchandar , this overlaps with PR-1687 which got merged. PR-1687 exposes setting base-file format through deltastreamer and spark data-source writer.

[GitHub] [hudi] yanghua commented on pull request #1100: [HUDI-289] Implement a test suite to support long running test for Hudi writing and querying end-end

2020-06-28 Thread GitBox
yanghua commented on pull request #1100: URL: https://github.com/apache/hudi/pull/1100#issuecomment-650702037 > @n3nash @yanghua do you mind me pushing some changes to this and land this? Of course not, please feel free to improve it.

[GitHub] [hudi] vinothchandar commented on a change in pull request #1577: [HUDI-855] Run Auto Cleaner in parallel with ingestion

2020-06-28 Thread GitBox
vinothchandar commented on a change in pull request #1577: URL: https://github.com/apache/hudi/pull/1577#discussion_r446606888 ## File path: hudi-client/src/main/java/org/apache/hudi/async/AbstractAsyncService.java ## @@ -16,7 +16,7 @@ * limitations under the License. */

[hudi] branch master updated: [HUDI-896] Report test coverage by modules & parallelize CI (#1753)

2020-06-28 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 31247e9 [HUDI-896] Report test coverage by

[GitHub] [hudi] vinothchandar commented on pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

2020-06-28 Thread GitBox
vinothchandar commented on pull request #1746: URL: https://github.com/apache/hudi/pull/1746#issuecomment-650703651 I will take another pass at this once you rebase.. merged #1753 This is an automated message from the

[GitHub] [hudi] vinothchandar merged pull request #1753: [HUDI-896] Report test coverage by modules

2020-06-28 Thread GitBox
vinothchandar merged pull request #1753: URL: https://github.com/apache/hudi/pull/1753 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[hudi] branch master updated: [HUDI-855] Run Cleaner async with writing (#1577)

2020-06-28 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 8919be6 [HUDI-855] Run Cleaner async with

[GitHub] [hudi] vinothchandar merged pull request #1577: [HUDI-855] Run Auto Cleaner in parallel with ingestion

2020-06-28 Thread GitBox
vinothchandar merged pull request #1577: URL: https://github.com/apache/hudi/pull/1577 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] vinothchandar commented on pull request #1756: [HUDI-839] Adding unit test for MarkerFiles,RollbackUtils, RollbackActionExecutor for markers and filelisting

2020-06-28 Thread GitBox
vinothchandar commented on pull request #1756: URL: https://github.com/apache/hudi/pull/1756#issuecomment-650728664 >for rollback successful commit, in HoodieWriteClient.java i remove the deleteMarkerDir() in postcommit when is in usingmarkers mode. But it will double the file numbers in

[jira] [Updated] (HUDI-855) Run Auto Cleaner in parallel with ingestion

2020-06-28 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf updated HUDI-855: --- Status: Closed (was: Patch Available) > Run Auto Cleaner in parallel with ingestion >

[GitHub] [hudi] pratyakshsharma commented on pull request #1650: [HUDI-541]: replaced dataFile/df with baseFile/bf throughout code base

2020-06-28 Thread GitBox
pratyakshsharma commented on pull request #1650: URL: https://github.com/apache/hudi/pull/1650#issuecomment-650740750 @bvaradar please take a pass. I have added aliases in avro schema. This is an automated message from the

[GitHub] [hudi] vinothchandar commented on a change in pull request #1768: [HUDI-1054][Peformance] Several performance fixes during finalizing writes

2020-06-28 Thread GitBox
vinothchandar commented on a change in pull request #1768: URL: https://github.com/apache/hudi/pull/1768#discussion_r446621351 ## File path: hudi-common/pom.xml ## @@ -147,6 +147,16 @@ test + + + org.apache.spark +

[jira] [Updated] (HUDI-859) Improve documentation around key generators

2020-06-28 Thread Pratyaksh Sharma (Jira)
[ https://issues.apache.org/jira/browse/HUDI-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pratyaksh Sharma updated HUDI-859: -- Status: In Progress (was: Open) > Improve documentation around key generators >

[GitHub] [hudi] vinothchandar commented on a change in pull request #1752: [HUDI-575] Support Async Compaction for spark streaming writes to hudi table

2020-06-28 Thread GitBox
vinothchandar commented on a change in pull request #1752: URL: https://github.com/apache/hudi/pull/1752#discussion_r446615608 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala ## @@ -48,7 +49,12 @@ private[hudi] object HoodieSparkSqlWriter {

[GitHub] [hudi] vinothchandar commented on a change in pull request #1752: [HUDI-575] Support Async Compaction for spark streaming writes to hudi table

2020-06-28 Thread GitBox
vinothchandar commented on a change in pull request #1752: URL: https://github.com/apache/hudi/pull/1752#discussion_r446628169 ## File path: hudi-spark/src/test/java/HoodieJavaStreamingApp.java ## @@ -68,7 +74,7 @@ private String tableName = "hoodie_test";

[GitHub] [hudi] codecov-commenter commented on pull request #1752: [HUDI-575] Support Async Compaction for spark streaming writes to hudi table

2020-06-28 Thread GitBox
codecov-commenter commented on pull request #1752: URL: https://github.com/apache/hudi/pull/1752#issuecomment-650732218 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1752?src=pr=h1) Report > Merging [#1752](https://codecov.io/gh/apache/hudi/pull/1752?src=pr=desc) into

[jira] [Created] (HUDI-1058) Make delete marker configurable

2020-06-28 Thread Raymond Xu (Jira)
Raymond Xu created HUDI-1058: Summary: Make delete marker configurable Key: HUDI-1058 URL: https://issues.apache.org/jira/browse/HUDI-1058 Project: Apache Hudi Issue Type: Improvement

[GitHub] [hudi] xushiyan commented on pull request #1746: [HUDI-996] Add functional test suite for hudi-utilities

2020-06-28 Thread GitBox
xushiyan commented on pull request #1746: URL: https://github.com/apache/hudi/pull/1746#issuecomment-650807492 > I will take another pass at this once you rebase.. merged #1753 @vinothchandar ok update the branch This

[GitHub] [hudi] xushiyan commented on pull request #1767: [MINOR] Adding test to WriteClient to validate update partition path with global bloom

2020-06-28 Thread GitBox
xushiyan commented on pull request #1767: URL: https://github.com/apache/hudi/pull/1767#issuecomment-650793135 > @xushiyan Could you please review this PR since you did the related work before. @leesf Yes I can look into this soon.

[GitHub] [hudi] afeldman1 commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-28 Thread GitBox
afeldman1 commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446695942 ## File path: docs/_docs/2_2_writing_data.md ## @@ -176,15 +176,49 @@ In some cases, you may want to migrate your existing table into Hudi beforehand.

[GitHub] [hudi] afeldman1 commented on a change in pull request #1761: [MINOR] Add documentation for using multi-column table keys and for n…

2020-06-28 Thread GitBox
afeldman1 commented on a change in pull request #1761: URL: https://github.com/apache/hudi/pull/1761#discussion_r446696252 ## File path: docs/_docs/2_2_writing_data.md ## @@ -176,15 +176,49 @@ In some cases, you may want to migrate your existing table into Hudi beforehand.