[GitHub] [hudi] bvaradar commented on issue #1910: [SUPPORT] Upsert operation duplicating records in a partition

2020-08-06 Thread GitBox
bvaradar commented on issue #1910: URL: https://github.com/apache/hudi/issues/1910#issuecomment-670341009 @mingujotemp : I just noticed you are using hive 3.x. I have not seen similar issues with Hive 2.x. Can you enable debug logging to see if your spark sql query triggers

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1900: [HUDI-531]Add java doc for hudi test suite general classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1900: URL: https://github.com/apache/hudi/pull/1900#discussion_r466834745 ## File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/scheduler/DagScheduler.java ## @@ -48,6 +51,11 @@ public

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1900: [HUDI-531]Add java doc for hudi test suite general classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1900: URL: https://github.com/apache/hudi/pull/1900#discussion_r466834370 ## File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/DagNode.java ## @@ -76,6 +76,12 @@ public void

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1900: [HUDI-531]Add java doc for hudi test suite general classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1900: URL: https://github.com/apache/hudi/pull/1900#discussion_r466834278 ## File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/DagNode.java ## @@ -76,6 +76,12 @@ public void

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466833643 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/TestDFSHoodieTestSuiteWriterAdapter.java ## @@ -52,6 +52,9 @@ import

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466833116 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/utils/TestUtils.java ## @@ -45,6 +48,15 @@ return

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466832837 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/utils/TestUtils.java ## @@ -28,6 +28,9 @@ import

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466832582 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/dag/HiveSyncDagGenerator.java ## @@ -31,6 +31,9 @@ import

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466832642 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/dag/HiveSyncDagGeneratorMOR.java ## @@ -31,6 +31,9 @@ import

[GitHub] [hudi] pratyakshsharma commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
pratyakshsharma commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466832530 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/testsuite/dag/ComplexDagGenerator.java ## @@ -33,6 +33,9 @@ import

[GitHub] [hudi] cheshta2904 commented on a change in pull request #1901: [HUDI-532]Add java doc for hudi test suite test classes

2020-08-06 Thread GitBox
cheshta2904 commented on a change in pull request #1901: URL: https://github.com/apache/hudi/pull/1901#discussion_r466831488 ## File path: hudi-integ-test/src/test/java/org/apache/hudi/integ/ITTestBase.java ## @@ -48,6 +48,9 @@ import static

[GitHub] [hudi] cheshta2904 commented on pull request #1927: [HUDI-1156] Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread GitBox
cheshta2904 commented on pull request #1927: URL: https://github.com/apache/hudi/pull/1927#issuecomment-670335359 Please fix the build. This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [hudi] garyli1019 commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-06 Thread GitBox
garyli1019 commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r466829856 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -132,11 +132,15 @@ class DefaultSource extends RelationProvider

[GitHub] [hudi] vinothchandar commented on a change in pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1848: URL: https://github.com/apache/hudi/pull/1848#discussion_r466828072 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -132,11 +132,15 @@ class DefaultSource extends RelationProvider

[jira] [Assigned] (HUDI-1154) Hive Sync Partition Extractor not handling decimal types properly

2020-08-06 Thread linshan-ma (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] linshan-ma reassigned HUDI-1154: Assignee: linshan-ma > Hive Sync Partition Extractor not handling decimal types properly >

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #362

2020-08-06 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.60 KB...] cdi-api-1.0.jar cdi-api.license commons-cli-1.4.jar commons-cli.license commons-io-2.5.jar commons-io.license

[GitHub] [hudi] nsivabalan commented on pull request #1912: [HUDI-1098] Adding TimedWaitOnAppearConsistencyGuard

2020-08-06 Thread GitBox
nsivabalan commented on pull request #1912: URL: https://github.com/apache/hudi/pull/1912#issuecomment-670295868 Synced up with @bvaradar on the diff. Here are some changes/conclusions we narrowed down after our discussion. - We felt exposing TimedWaitOnAppearCG to external users may

[GitHub] [hudi] Mathieu1124 edited a comment on pull request #1886: [HUDI-1122]Introduce a kafka implementation of hoodie write commit ca…

2020-08-06 Thread GitBox
Mathieu1124 edited a comment on pull request #1886: URL: https://github.com/apache/hudi/pull/1886#issuecomment-670294044 @yanghua VC seems busy, do you have any other concern about this pr ? if it is ok, can we merge this first, and file a new pr if VC agree to move this to

[GitHub] [hudi] Mathieu1124 commented on pull request #1886: [HUDI-1122]Introduce a kafka implementation of hoodie write commit ca…

2020-08-06 Thread GitBox
Mathieu1124 commented on pull request #1886: URL: https://github.com/apache/hudi/pull/1886#issuecomment-670294044 @yanghua VC seems busy, do you have any other concern about this pr ? if it is ok, can we merge this first, and file a new pr if VC agree to move this to hudi-client :)

[GitHub] [hudi] linshan-ma opened a new pull request #1927: [HUDI-1156] Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread GitBox
linshan-ma opened a new pull request #1927: URL: https://github.com/apache/hudi/pull/1927 ## *Tips* - *Remove unused dependencies from HoodieDeltaStreamerWrapper Class* ## What is the purpose of the pull request *(For example: This pull request adds quick-start document.)*

[GitHub] [hudi] linshan-ma closed pull request #1926: [HUDI-1156] Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread GitBox
linshan-ma closed pull request #1926: URL: https://github.com/apache/hudi/pull/1926

[jira] [Updated] (HUDI-1156) Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1156: - Labels: pull-request-available (was: ) > Remove unused dependencies from

[GitHub] [hudi] linshan-ma opened a new pull request #1926: [HUDI-1156] Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread GitBox
linshan-ma opened a new pull request #1926: URL: https://github.com/apache/hudi/pull/1926 ## *Tips* - *Remove unused dependencies from HoodieDeltaStreamerWrapper Class.* ## What is the purpose of the pull request *(For example: This pull request adds quick-start

[jira] [Assigned] (HUDI-1156) Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread linshan-ma (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] linshan-ma reassigned HUDI-1156: Assignee: linshan-ma > Remove unused dependencies from HoodieDeltaStreamerWrapper Class >

[GitHub] [hudi] umehrot2 commented on a change in pull request #1924: [HUDI-999][Performance] Parallelize fetching of bootstrap source data files/partitions

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1924: URL: https://github.com/apache/hudi/pull/1924#discussion_r466779443 ## File path: hudi-client/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapUtils.java ## @@ -41,37 +48,87 @@ * Returns leaf folders with

[jira] [Assigned] (HUDI-1158) Optimizations in parallelized listing behaviour for markers and bootstrap source files

2020-08-06 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra reassigned HUDI-1158: --- Assignee: Udit Mehrotra > Optimizations in parallelized listing behaviour for markers and

[jira] [Assigned] (HUDI-1158) Optimizations in parallelized listing behaviour for markers and bootstrap source files

2020-08-06 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra reassigned HUDI-1158: --- Assignee: (was: Udit Mehrotra) > Optimizations in parallelized listing behaviour for

[jira] [Created] (HUDI-1158) Optimizations in parallelized listing behaviour for markers and bootstrap source files

2020-08-06 Thread Udit Mehrotra (Jira)
Udit Mehrotra created HUDI-1158: --- Summary: Optimizations in parallelized listing behaviour for markers and bootstrap source files Key: HUDI-1158 URL: https://issues.apache.org/jira/browse/HUDI-1158

[GitHub] [hudi] mingujotemp commented on issue #1910: [SUPPORT] Upsert operation duplicating records in a partition

2020-08-06 Thread GitBox
mingujotemp commented on issue #1910: URL: https://github.com/apache/hudi/issues/1910#issuecomment-670277613 yup that has been set for sure

[GitHub] [hudi] umehrot2 commented on a change in pull request #1924: [HUDI-999][Performance] Parallelize fetching of bootstrap source data files/partitions

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1924: URL: https://github.com/apache/hudi/pull/1924#discussion_r466774810 ## File path: hudi-client/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapUtils.java ## @@ -41,37 +48,87 @@ * Returns leaf folders with

[GitHub] [hudi] bvaradar commented on issue #1925: [SUPPORT] Support for Confluent Cloud SchemaRegistryProvider

2020-08-06 Thread GitBox
bvaradar commented on issue #1925: URL: https://github.com/apache/hudi/issues/1925#issuecomment-670260480 For password based BASIC authentication allows passing username and password like this :http://username:passw...@example.com/; -- this sends the credentials in the standard HTTP
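A minimal sketch of the mechanism the comment describes, assuming plain HTTP Basic authentication (RFC 7617) rather than any Hudi-specific code: credentials embedded in a URL's userinfo are removed from the URL and sent instead as a base64-encoded `Authorization: Basic` header. The function name `basic_auth_header` is hypothetical and is not part of Hudi's `SchemaRegistryProvider`.

```python
import base64
from typing import Tuple
from urllib.parse import unquote, urlsplit


def basic_auth_header(url: str) -> Tuple[str, str]:
    """Hypothetical helper: split a URL of the form
    scheme://user:password@host/path into (URL without credentials,
    value for the HTTP 'Authorization' header)."""
    parts = urlsplit(url)
    if parts.username is None:
        raise ValueError("URL carries no userinfo credentials")

    # Userinfo may be percent-encoded (e.g. a space as %20), so decode it.
    user = unquote(parts.username)
    password = unquote(parts.password or "")

    # RFC 7617: base64("user:password") after the "Basic " scheme token.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    # Keep host and optional port, drop everything before the last '@'.
    host = parts.netloc.rpartition("@")[2]
    clean_url = parts._replace(netloc=host).geturl()
    return clean_url, f"Basic {token}"


if __name__ == "__main__":
    print(basic_auth_header("https://user:pass@example.com/subjects"))
```

Stripping the credentials from the URL and moving them into a header mirrors what HTTP clients typically do with userinfo; note that Basic credentials are only base64-encoded, not encrypted, so they should travel over HTTPS.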

[GitHub] [hudi] zhedoubushishi commented on pull request #1870: [HUDI-808] Support cleaning bootstrap source data

2020-08-06 Thread GitBox
zhedoubushishi commented on pull request #1870: URL: https://github.com/apache/hudi/pull/1870#issuecomment-670256477 > @zhedoubushishi there is one issue here. we are changing what goes into the cleaner plan i.e its writing full path as opposed to just the file names. > > This means

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466748892 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HudiBootstrapRDD.scala ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466748555 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -56,29 +58,56 @@ class DefaultSource extends RelationProvider val

[jira] [Created] (HUDI-1157) Optimization whether to query Bootstrapped table using HoodieBootstrapRelation vs Sparks Parquet datasource

2020-08-06 Thread Udit Mehrotra (Jira)
Udit Mehrotra created HUDI-1157: --- Summary: Optimization whether to query Bootstrapped table using HoodieBootstrapRelation vs Sparks Parquet datasource Key: HUDI-1157 URL:

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1870: [HUDI-808] Support cleaning bootstrap source data

2020-08-06 Thread GitBox
zhedoubushishi commented on a change in pull request #1870: URL: https://github.com/apache/hudi/pull/1870#discussion_r466742409 ## File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestUtils.java ## @@ -513,4 +522,41 @@ public static void

[jira] [Assigned] (HUDI-1108) Allow parallel listing of dataset partitions for various actions during write

2020-08-06 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra reassigned HUDI-1108: --- Assignee: Ryan Pifer (was: Udit Mehrotra) > Allow parallel listing of dataset partitions

[jira] [Assigned] (HUDI-1108) Allow parallel listing of dataset partitions for various actions during write

2020-08-06 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra reassigned HUDI-1108: --- Assignee: Udit Mehrotra > Allow parallel listing of dataset partitions for various actions

[jira] [Assigned] (HUDI-1108) Allow parallel listing of dataset partitions for various actions during write

2020-08-06 Thread Udit Mehrotra (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udit Mehrotra reassigned HUDI-1108: --- Assignee: (was: Udit Mehrotra) > Allow parallel listing of dataset partitions for

[GitHub] [hudi] luffyd commented on issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

2020-08-06 Thread GitBox
luffyd commented on issue #1913: URL: https://github.com/apache/hudi/issues/1913#issuecomment-670211630 From Spark ENV tab, parquet version seems to be this

[GitHub] [hudi] jpugliesi opened a new issue #1925: [SUPPORT] Support for Confluent Cloud SchemaRegistryProvider

2020-08-06 Thread GitBox
jpugliesi opened a new issue #1925: URL: https://github.com/apache/hudi/issues/1925 **Describe the problem you faced** [Sharing here as requested on Slack](https://apache-hudi.slack.com/archives/C4D716NPQ/p1596675249254300) I would like to configure a DeltaStreamer

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
zhedoubushishi commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466696366 ## File path: hudi-common/src/main/java/org/apache/hudi/common/bootstrap/index/HFileBootstrapIndex.java ## @@ -240,13 +240,21 @@ private

[GitHub] [hudi] vinothchandar commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r48582 ## File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala ## @@ -92,36 +102,69 @@ class IncrementalRelation(val sqlContext:

[GitHub] [hudi] vinothchandar commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r4 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HudiBootstrapRDD.scala ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] vinothchandar commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r47402 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -56,29 +58,56 @@ class DefaultSource extends RelationProvider

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
zhedoubushishi commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466589747 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/BootstrapCommand.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
zhedoubushishi commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466589474 ## File path: hudi-common/src/main/java/org/apache/hudi/common/bootstrap/index/HFileBootstrapIndex.java ## @@ -240,13 +240,21 @@ private

[GitHub] [hudi] garyli1019 commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-06 Thread GitBox
garyli1019 commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-670078060 > ``` > [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 66.614 s <<< FAILURE! - in org.apache.hudi.functional.TestCOWDataSource > [ERROR]

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
zhedoubushishi commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466572098 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/BootstrapCommand.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] zhedoubushishi commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
zhedoubushishi commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466562083 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/BootstrapCommand.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] luffyd commented on issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

2020-08-06 Thread GitBox
luffyd commented on issue #1913: URL: https://github.com/apache/hudi/issues/1913#issuecomment-670039732 Thanks for the input @bvaradar "Too many open files on IOException" issue also seems to be co-related with having 2G as max file limit. Will confirm the parquet version.

[GitHub] [hudi] bschell closed pull request #1922: [HUDI-1152] Add option to skip syncing Hudi metadata columns

2020-08-06 Thread GitBox
bschell closed pull request #1922: URL: https://github.com/apache/hudi/pull/1922

[GitHub] [hudi] bschell commented on pull request #1922: [HUDI-1152] Add option to skip syncing Hudi metadata columns

2020-08-06 Thread GitBox
bschell commented on pull request #1922: URL: https://github.com/apache/hudi/pull/1922#issuecomment-670038375 @vinothchandar thanks for the detailed explanation! When I was considering this feature I was only considering the feedback that the extra columns were confusing for end users who

[jira] [Closed] (HUDI-1151) Fix NPE when no new data in kafka using HoodieDeltaStreamer

2020-08-06 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-1151. -- Resolution: Fixed Fixed via master branch: b51646dcc76acc68e97dd6a67cc7557e362b590d > Fix NPE when no new data

[GitHub] [hudi] yanghua merged pull request #1921: [HUDI-1151]Fix NPE when no new data in kafka using HoodieDeltaStreamer

2020-08-06 Thread GitBox
yanghua merged pull request #1921: URL: https://github.com/apache/hudi/pull/1921

[hudi] branch master updated (51ea27d -> b51646d)

2020-08-06 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 51ea27d [HUDI-875] Abstract hudi-sync-common, and support hudi-hive-sync, hudi-dla-sync (#1810) add b51646d

[GitHub] [hudi] bvaradar commented on issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

2020-08-06 Thread GitBox
bvaradar commented on issue #1913: URL: https://github.com/apache/hudi/issues/1913#issuecomment-670015540 @luffyd : I spent some time trying to understand your use-case. To your question : Hudi needs to list partitions in-order to figure out the list of valid files that constitute

[GitHub] [hudi] vinothchandar commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-06 Thread GitBox
vinothchandar commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-669960354 ``` [ERROR] Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 66.614 s <<< FAILURE! - in org.apache.hudi.functional.TestCOWDataSource [ERROR]

[GitHub] [hudi] nsivabalan commented on a change in pull request #1834: [HUDI-1013] Adding Bulk Insert V2 implementation

2020-08-06 Thread GitBox
nsivabalan commented on a change in pull request #1834: URL: https://github.com/apache/hudi/pull/1834#discussion_r466385801 ## File path: hudi-client/src/main/java/org/apache/hudi/keygen/BuiltinKeyGenerator.java ## @@ -1,88 +0,0 @@ -/* - * Licensed to the Apache Software

[GitHub] [hudi] nsivabalan commented on a change in pull request #1834: [HUDI-1013] Adding Bulk Insert V2 implementation

2020-08-06 Thread GitBox
nsivabalan commented on a change in pull request #1834: URL: https://github.com/apache/hudi/pull/1834#discussion_r466385455 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieRowCreateHandle.java ## @@ -0,0 +1,202 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] leesf commented on a change in pull request #1916: [HUDI-1025] Meter RPC calls in HoodieWrapperFileSystem

2020-08-06 Thread GitBox
leesf commented on a change in pull request #1916: URL: https://github.com/apache/hudi/pull/1916#discussion_r466398471 ## File path: hudi-common/src/main/java/org/apache/hudi/common/metrics/Registry.java ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] leesf commented on a change in pull request #1916: [HUDI-1025] Meter RPC calls in HoodieWrapperFileSystem

2020-08-06 Thread GitBox
leesf commented on a change in pull request #1916: URL: https://github.com/apache/hudi/pull/1916#discussion_r466397854 ## File path: hudi-common/src/main/java/org/apache/hudi/common/metrics/Registry.java ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] leesf commented on a change in pull request #1916: [HUDI-1025] Meter RPC calls in HoodieWrapperFileSystem

2020-08-06 Thread GitBox
leesf commented on a change in pull request #1916: URL: https://github.com/apache/hudi/pull/1916#discussion_r466396853 ## File path: hudi-common/src/main/java/org/apache/hudi/common/metrics/Counter.java ## @@ -0,0 +1,42 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] leesf commented on a change in pull request #1916: [HUDI-1025] Meter RPC calls in HoodieWrapperFileSystem

2020-08-06 Thread GitBox
leesf commented on a change in pull request #1916: URL: https://github.com/apache/hudi/pull/1916#discussion_r466395899 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/HoodieWrapperFileSystem.java ## @@ -64,10 +65,15 @@ public static final String

[GitHub] [hudi] Ares-W commented on issue #1913: [SUPPORT][MOR]Too many open files on IOException and Crash

2020-08-06 Thread GitBox
Ares-W commented on issue #1913: URL: https://github.com/apache/hudi/issues/1913#issuecomment-669866341 Maybe https://issues.apache.org/jira/browse/PARQUET-783 cause this exception.

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466320096 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -54,29 +58,54 @@ class DefaultSource extends RelationProvider val

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466317612 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -56,29 +58,56 @@ class DefaultSource extends RelationProvider val

[GitHub] [hudi] s-sanjay commented on issue #1895: HUDI Dataset backed by Hive Metastore fails on Presto with Unknown converted type TIMESTAMP_MICROS

2020-08-06 Thread GitBox
s-sanjay commented on issue #1895: URL: https://github.com/apache/hudi/issues/1895#issuecomment-669845428 Right now presto does not support reading TIMESTAMP_MICROS type. This needs to be fixed from the presto side for which I am working on a fix. ( presto only supports timestamp upto

[jira] [Updated] (HUDI-1151) Fix NPE when no new data in kafka using HoodieDeltaStreamer

2020-08-06 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1151: -- Status: Open (was: New) > Fix NPE when no new data in kafka using HoodieDeltaStreamer >

[jira] [Updated] (HUDI-1078) Fix IllegalArgumentException in Delete data demo of Quick-Start Guide

2020-08-06 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-1078: -- Status: Open (was: New) > Fix IllegalArgumentException in Delete data demo of Quick-Start Guide >

[jira] [Resolved] (HUDI-1078) Fix IllegalArgumentException in Delete data demo of Quick-Start Guide

2020-08-06 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu resolved HUDI-1078. --- Resolution: Fixed > Fix IllegalArgumentException in Delete data demo of Quick-Start Guide >

[jira] [Commented] (HUDI-1078) Fix IllegalArgumentException in Delete data demo of Quick-Start Guide

2020-08-06 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172212#comment-17172212 ] wangxianghu commented on HUDI-1078: --- done via: 10e457278bee529d14e445012fa61e875e3f77cd > Fix

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466311768 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HudiBootstrapRDD.scala ## @@ -0,0 +1,131 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466310309 ## File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala ## @@ -92,36 +102,69 @@ class IncrementalRelation(val sqlContext:

[GitHub] [hudi] umehrot2 commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466310024 ## File path: hudi-spark/src/main/scala/org/apache/hudi/IncrementalRelation.scala ## @@ -92,36 +102,69 @@ class IncrementalRelation(val sqlContext:

[jira] [Created] (HUDI-1156) Remove unused dependencies from HoodieDeltaStreamerWrapper Class

2020-08-06 Thread Cheshta Sharma (Jira)
Cheshta Sharma created HUDI-1156: Summary: Remove unused dependencies from HoodieDeltaStreamerWrapper Class Key: HUDI-1156 URL: https://issues.apache.org/jira/browse/HUDI-1156 Project: Apache Hudi

[GitHub] [hudi] vinothchandar commented on a change in pull request #1858: [HUDI-1014] Adding Upgrade and downgrade infra for smooth transitioning from list based rollback to marker based rollback

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1858: URL: https://github.com/apache/hudi/pull/1858#discussion_r464714682 ## File path: hudi-client/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java ## @@ -186,10 +188,14 @@ public HoodieMetrics

[GitHub] [hudi] vinothchandar commented on a change in pull request #1870: [HUDI-808] Support cleaning bootstrap source data

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1870: URL: https://github.com/apache/hudi/pull/1870#discussion_r466243056 ## File path: hudi-common/src/test/java/org/apache/hudi/common/testutils/HoodieTestUtils.java ## @@ -513,4 +522,41 @@ public static void

[GitHub] [hudi] vinothchandar commented on pull request #1870: [HUDI-808] Support cleaning bootstrap source data

2020-08-06 Thread GitBox
vinothchandar commented on pull request #1870: URL: https://github.com/apache/hudi/pull/1870#issuecomment-669803111 cc @bvaradar as well

[GitHub] [hudi] umehrot2 commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466248630 ## File path: hudi-common/src/main/java/org/apache/hudi/common/bootstrap/index/HFileBootstrapIndex.java ## @@ -240,13 +240,21 @@ private

[GitHub] [hudi] hddong commented on pull request #1242: [HUDI-544] Archived commits command code cleanup

2020-08-06 Thread GitBox
hddong commented on pull request #1242: URL: https://github.com/apache/hudi/pull/1242#issuecomment-669800409 @n3nash : had rebase this, please have a review when free.

[GitHub] [hudi] umehrot2 commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
umehrot2 commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466117695 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/BootstrapCommand.java ## @@ -0,0 +1,184 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] vinothchandar commented on a change in pull request #1869: [HUDI-427] Implement CLI support for performing bootstrap

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1869: URL: https://github.com/apache/hudi/pull/1869#discussion_r466240100 ## File path: hudi-common/src/main/java/org/apache/hudi/common/bootstrap/index/HFileBootstrapIndex.java ## @@ -240,13 +240,21 @@ private

[GitHub] [hudi] vinothchandar commented on pull request #1848: [HUDI-69] Support Spark Datasource for MOR table - RDD approach

2020-08-06 Thread GitBox
vinothchandar commented on pull request #1848: URL: https://github.com/apache/hudi/pull/1848#issuecomment-669787801 @garyli1019 I am afraid this has something to do with the changes we for `InMemoryFileIndex` or sth made in the pr . ``` TestBootstrap :

[GitHub] [hudi] vinothchandar commented on a change in pull request #1702: [HUDI-426] Bootstrap datasource integration

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1702: URL: https://github.com/apache/hudi/pull/1702#discussion_r466223251 ## File path: hudi-spark/src/main/scala/org/apache/hudi/DefaultSource.scala ## @@ -56,29 +58,56 @@ class DefaultSource extends RelationProvider

[GitHub] [hudi] vinothchandar commented on a change in pull request #1924: [HUDI-999][Performance] Parallelize fetching of bootstrap source data files/partitions

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1924: URL: https://github.com/apache/hudi/pull/1924#discussion_r466217168 ## File path: hudi-client/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapUtils.java ## @@ -41,37 +48,87 @@ * Returns leaf folders

[GitHub] [hudi] vinothchandar commented on a change in pull request #1924: [HUDI-999][Performance] Parallelize fetching of bootstrap source data files/partitions

2020-08-06 Thread GitBox
vinothchandar commented on a change in pull request #1924: URL: https://github.com/apache/hudi/pull/1924#discussion_r466216468 ## File path: hudi-client/src/main/java/org/apache/hudi/table/action/bootstrap/BootstrapUtils.java ## @@ -41,37 +48,87 @@ * Returns leaf folders

[jira] [Commented] (HUDI-1126) code implementation to support structured streaming

2020-08-06 Thread linshan-ma (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172097#comment-17172097 ] linshan-ma commented on HUDI-1126: -- After a few days of thinking, trial and error, I have no idea. My

[jira] [Assigned] (HUDI-1126) code implementation to support structured streaming

2020-08-06 Thread linshan-ma (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] linshan-ma reassigned HUDI-1126: Assignee: linshan-ma > code implementation to support structured streaming >

[jira] [Updated] (HUDI-575) Support Async Compaction for spark streaming writes to hudi table

2020-08-06 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-575: Status: Closed (was: Patch Available) > Support Async Compaction for spark streaming writes

[jira] [Resolved] (HUDI-1144) Speedup spark read queries by caching metaclient in HoodieROPathFilter

2020-08-06 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan resolved HUDI-1144. -- Resolution: Fixed > Speedup spark read queries by caching metaclient in

[jira] [Resolved] (HUDI-1155) support aliyun DLA meta

2020-08-06 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei resolved HUDI-1155. - Resolution: Fixed > support aliyun DLA meta > --- > > Key: HUDI-1155 >

[jira] [Reopened] (HUDI-1155) support aliyun DLA meta

2020-08-06 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei reopened HUDI-1155: - Assignee: liwei > support aliyun DLA meta > --- > > Key: HUDI-1155 >

[jira] [Resolved] (HUDI-841) Abstract common meta sync module support multiple meta service

2020-08-06 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei resolved HUDI-841. Resolution: Fixed > Abstract common meta sync module support multiple meta service >

[jira] [Resolved] (HUDI-841) Abstract common meta sync module support multiple meta service

2020-08-06 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf resolved HUDI-841. Resolution: Fixed > Abstract common meta sync module support multiple meta service >

[jira] [Reopened] (HUDI-841) Abstract common meta sync module support multiple meta service

2020-08-06 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei reopened HUDI-841: > Abstract common meta sync module support multiple meta service >

[jira] [Closed] (HUDI-841) Abstract common meta sync module support multiple meta service

2020-08-06 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf closed HUDI-841. -- > Abstract common meta sync module support multiple meta service >

[jira] [Reopened] (HUDI-841) Abstract common meta sync module support multiple meta service

2020-08-06 Thread leesf (Jira)
[ https://issues.apache.org/jira/browse/HUDI-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] leesf reopened HUDI-841: > Abstract common meta sync module support multiple meta service >

[jira] [Updated] (HUDI-841) Abstract common meta sync module support multiple meta service

2020-08-06 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei updated HUDI-841: --- Status: Closed (was: Patch Available) > Abstract common meta sync module support multiple meta service >