[GitHub] [hudi] bhasudha commented on issue #1675: [SUPPORT] Get all changed records from an incremental query rather than the latest one

2020-06-22 Thread GitBox
bhasudha commented on issue #1675: URL: https://github.com/apache/hudi/issues/1675#issuecomment-647912047 @abhibhat98 were you able to progress on this? This is an automated message from the Apache Git Service. To respond

[GitHub] [hudi] leesf commented on a change in pull request #1727: [WIP] [Review] refactor hudi-client

2020-06-22 Thread GitBox
leesf commented on a change in pull request #1727: URL: https://github.com/apache/hudi/pull/1727#discussion_r443962715 ## File path: hudi-client/hudi-client-spark/src/main/java/org/apache/hudi/client/HoodieSparkWriteClient.java ## @@ -481,6 +585,11 @@ public void close() {

[GitHub] [hudi] leesf commented on a change in pull request #1727: [WIP] [Review] refactor hudi-client

2020-06-22 Thread GitBox
leesf commented on a change in pull request #1727: URL: https://github.com/apache/hudi/pull/1727#discussion_r443960252 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieClient.java ## @@ -18,53 +18,55 @@ package

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #317

2020-06-22 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.44 KB...] settings.xml toolchains.xml /home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging: simplelogger.properties

[GitHub] [hudi] somebol opened a new issue #1757: Slow Bulk Insert Performance [SUPPORT]

2020-06-22 Thread GitBox
somebol opened a new issue #1757: URL: https://github.com/apache/hudi/issues/1757 Hi Team, We are trying to load a very large dataset into hudi. The bulk insert job took ~16.5 hours to complete. The job was run with vanilla settings without any optimisations. How can we tune the

[jira] [Updated] (HUDI-396) Provide an documentation to describe how to use test suite

2020-06-22 Thread Trevorzhang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trevorzhang updated HUDI-396: - Status: In Progress (was: Open) > Provide an documentation to describe how to use test suite >

[jira] [Commented] (HUDI-472) Make sortBy() inside bulkInsertInternal() configurable for bulk_insert

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142506#comment-17142506 ] Vinoth Chandar commented on HUDI-472: - [~guoyihua] are you still working on this?  [~shivnarayan] do

[jira] [Updated] (HUDI-472) Make sortBy() inside bulkInsertInternal() configurable for bulk_insert

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-472: Status: In Progress (was: Open) > Make sortBy() inside bulkInsertInternal() configurable for

[jira] [Updated] (HUDI-472) Make sortBy() inside bulkInsertInternal() configurable for bulk_insert

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-472: Status: Open (was: New) > Make sortBy() inside bulkInsertInternal() configurable for bulk_insert >

[jira] [Commented] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142505#comment-17142505 ] Vinoth Chandar commented on HUDI-1013: -- Actually, the numbers above were based on an intermediate

[jira] [Updated] (HUDI-1014) Design and Implement upgrade-downgrade infrastrucutre

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-1014: - Status: Open (was: New) > Design and Implement upgrade-downgrade infrastrucutre >

[jira] [Assigned] (HUDI-1014) Design and Implement upgrade-downgrade infrastrucutre

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-1014: Assignee: Balaji Varadarajan (was: Vinoth Chandar) > Design and Implement

[jira] [Assigned] (HUDI-855) Run Auto Cleaner in parallel with ingestion

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-855: --- Assignee: Balaji Varadarajan > Run Auto Cleaner in parallel with ingestion >

[jira] [Updated] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-06-22 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1013: -- Status: In Progress (was: Open) > Bulk Insert w/o converting to RDD >

[jira] [Updated] (HUDI-855) Run Auto Cleaner in parallel with ingestion

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-855: Status: Patch Available (was: In Progress) > Run Auto Cleaner in parallel with ingestion >

[jira] [Updated] (HUDI-855) Run Auto Cleaner in parallel with ingestion

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-855: Status: Open (was: New) > Run Auto Cleaner in parallel with ingestion >

[jira] [Updated] (HUDI-855) Run Auto Cleaner in parallel with ingestion

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-855: Status: In Progress (was: Open) > Run Auto Cleaner in parallel with ingestion >

[jira] [Updated] (HUDI-635) MergeHandle's DiskBasedMap entries can be thinner

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-635: Status: New (was: Open) > MergeHandle's DiskBasedMap entries can be thinner >

[jira] [Assigned] (HUDI-1013) Bulk Insert w/o converting to RDD

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-1013: Assignee: sivabalan narayanan > Bulk Insert w/o converting to RDD >

[jira] [Updated] (HUDI-760) Remove Rolling Stat management from Hudi Writer

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-760: Status: Patch Available (was: In Progress) > Remove Rolling Stat management from Hudi Writer >

[jira] [Updated] (HUDI-760) Remove Rolling Stat management from Hudi Writer

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-760: Status: In Progress (was: Open) > Remove Rolling Stat management from Hudi Writer >

[jira] [Assigned] (HUDI-760) Remove Rolling Stat management from Hudi Writer

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar reassigned HUDI-760: --- Assignee: renyi.bao (was: sivabalan narayanan) > Remove Rolling Stat management from Hudi

[jira] [Updated] (HUDI-575) Support Async Compaction for spark streaming writes to hudi table

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-575: Status: In Progress (was: Open) > Support Async Compaction for spark streaming writes to hudi table

[jira] [Updated] (HUDI-575) Support Async Compaction for spark streaming writes to hudi table

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-575: Status: Patch Available (was: In Progress) > Support Async Compaction for spark streaming writes to

[jira] [Assigned] (HUDI-635) MergeHandle's DiskBasedMap entries can be thinner

2020-06-22 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-635: Assignee: (was: sivabalan narayanan) > MergeHandle's DiskBasedMap entries can

[GitHub] [hudi] vinothchandar closed pull request #1712: Cherry picking HUDI-988 and HUDI-990 to release-0.5.3

2020-06-22 Thread GitBox
vinothchandar closed pull request #1712: URL: https://github.com/apache/hudi/pull/1712

[GitHub] [hudi] vinothchandar commented on pull request #1712: Cherry picking HUDI-988 and HUDI-990 to release-0.5.3

2020-06-22 Thread GitBox
vinothchandar commented on pull request #1712: URL: https://github.com/apache/hudi/pull/1712#issuecomment-647825579 Is this still relevant? Closing. Please reopen if needed.

[GitHub] [hudi] vinothchandar commented on pull request #1721: Cache the explodeRecordRDDWithFileComparisons instead of commuting it…

2020-06-22 Thread GitBox
vinothchandar commented on pull request #1721: URL: https://github.com/apache/hudi/pull/1721#issuecomment-647825299 Can you please include the Jira number in the PR title?

[jira] [Closed] (HUDI-963) HoodieMergedLogRecordScanner support for HoodieHFileDataBlocks

2020-06-22 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason closed HUDI-963. --- Resolution: Duplicate To be covered as part of HUDI-960. > HoodieMergedLogRecordScanner support for

[jira] [Closed] (HUDI-962) HoodieLogFormatWriter and HoodieLogFileReader support for HoodieHFileDataBlock

2020-06-22 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason closed HUDI-962. --- Resolution: Duplicate To be covered as part of HUDI-960. > HoodieLogFormatWriter and

[jira] [Updated] (HUDI-963) HoodieMergedLogRecordScanner support for HoodieHFileDataBlocks

2020-06-22 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason updated HUDI-963: Status: Open (was: New) > HoodieMergedLogRecordScanner support for HoodieHFileDataBlocks >

[jira] [Closed] (HUDI-961) Implement HoodieHFileDataBlock to allow for logging Hfile to log blocks

2020-06-22 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason closed HUDI-961. --- Resolution: Duplicate > Implement HoodieHFileDataBlock to allow for logging Hfile to log blocks >

[jira] [Commented] (HUDI-961) Implement HoodieHFileDataBlock to allow for logging Hfile to log blocks

2020-06-22 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142481#comment-17142481 ] Prashant Wason commented on HUDI-961: - To be covered as part of HUDI-960. > Implement

[jira] [Updated] (HUDI-960) HFile Support for HUDI

2020-06-22 Thread Prashant Wason (Jira)
[ https://issues.apache.org/jira/browse/HUDI-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prashant Wason updated HUDI-960: Description: This task builds upon HUDI-684 which introduces abstraction to storage reader/writer

[jira] [Commented] (HUDI-1035) Remove unused class KeyLookupResult

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142470#comment-17142470 ] Vinoth Chandar commented on HUDI-1035: -- these are pojos used in Spark serialization.. if they are

[GitHub] [hudi] prashantwason commented on pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-22 Thread GitBox
prashantwason commented on pull request #1687: URL: https://github.com/apache/hudi/pull/1687#issuecomment-647808188 @bvaradar @nsivabalan I had reduced the scope of this PR to only the storage abstractions needed to support another base/log file format. The rest of my change will be

[GitHub] [hudi] vinothchandar commented on issue #1751: [SUPPORT] Hudi not working with Spark 3.0.0

2020-06-22 Thread GitBox
vinothchandar commented on issue #1751: URL: https://github.com/apache/hudi/issues/1751#issuecomment-647805153 let's wait for @lyogev to chime in.. I think @n3nash did explicitly test Spark 3 and confirmed it working as of 0.5.1/0.5.2

[jira] [Commented] (HUDI-340) Increase Default max events to read from kafka source

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142463#comment-17142463 ] Vinoth Chandar commented on HUDI-340: - yes.. that line is unnecessary.. [~wangxianghu].. we can

[GitHub] [hudi] vinothchandar commented on pull request #1753: [HUDI-896] Report test coverage by modules

2020-06-22 Thread GitBox
vinothchandar commented on pull request #1753: URL: https://github.com/apache/hudi/pull/1753#issuecomment-647803654 I know.. that PR cannot be affecting this.. let's wait for @ramachandranms to chime in :) I am fine with the approach you have taken

[GitHub] [hudi] xushiyan commented on pull request #1753: [HUDI-896] Report test coverage by modules

2020-06-22 Thread GitBox
xushiyan commented on pull request #1753: URL: https://github.com/apache/hudi/pull/1753#issuecomment-647802515 > is that the record size estimation commit, from gary? It seems so, but I don't think that change itself affected the coverage. it's more likely codecov itself reported

[GitHub] [hudi] vinothchandar commented on pull request #1739: [HUDI-760]Remove Rolling Stat management from Hudi Writer

2020-06-22 Thread GitBox
vinothchandar commented on pull request #1739: URL: https://github.com/apache/hudi/pull/1739#issuecomment-647800356 @n3nash can you take a quick second pass here? and we can merge?

[GitHub] [hudi] vinothchandar commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-06-22 Thread GitBox
vinothchandar commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-647799639 @Raghvendradubey thanks for the info.. you may also want to understand how much of the existing data changes every minute.. if it's 70% updates, I would suggest using MOR as it

[GitHub] [hudi] vinothchandar commented on pull request #1753: [HUDI-896] Report test coverage by modules

2020-06-22 Thread GitBox
vinothchandar commented on pull request #1753: URL: https://github.com/apache/hudi/pull/1753#issuecomment-647798688 is that the record size estimation commit, from gary?

[GitHub] [hudi] vinothchandar commented on pull request #1433: [HUDI-728]: Implement custom key generator

2020-06-22 Thread GitBox
vinothchandar commented on pull request #1433: URL: https://github.com/apache/hudi/pull/1433#issuecomment-647798545 cc @wangxianghu .. @pratyakshsharma confirmed, he will resume this work and take it across the finish line

[jira] [Commented] (HUDI-839) Implement rollbacks using marker files instead of relying on commit metadata

2020-06-22 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142450#comment-17142450 ] Vinoth Chandar commented on HUDI-839: - Will review.. thanks!  > Implement rollbacks using marker files

[GitHub] [hudi] xushiyan edited a comment on pull request #1753: [HUDI-896] Report test coverage by modules

2020-06-22 Thread GitBox
xushiyan edited a comment on pull request #1753: URL: https://github.com/apache/hudi/pull/1753#issuecomment-647792654 > > Merging #1753 into master will increase coverage by 42.55%. > > this does seem problematic? @vinothchandar I think the coverage dropped to ~18% at some

[GitHub] [hudi] xushiyan commented on pull request #1753: [HUDI-896] Report test coverage by modules

2020-06-22 Thread GitBox
xushiyan commented on pull request #1753: URL: https://github.com/apache/hudi/pull/1753#issuecomment-647792654 > > Merging #1753 into master will increase coverage by 42.55%. > > this does seem problematic? @vinothchandar I think the coverage dropped to ~18% at some point in

[GitHub] [hudi] vinothchandar edited a comment on pull request #1753: [HUDI-896] Report test coverage by modules

2020-06-22 Thread GitBox
vinothchandar edited a comment on pull request #1753: URL: https://github.com/apache/hudi/pull/1753#issuecomment-647790674 cc @ramachandranms can you please review this as well? >Merging #1753 into master will increase coverage by 42.55%. this does seem problematic?

[GitHub] [hudi] vinothchandar commented on pull request #1753: [HUDI-896] Report test coverage by modules

2020-06-22 Thread GitBox
vinothchandar commented on pull request #1753: URL: https://github.com/apache/hudi/pull/1753#issuecomment-647790674 cc @ramachandranms can you please review this as well? >Merging #1753 into master will increase coverage by 42.55%. this does seem problematic? @xushiyan I

[GitHub] [hudi] prashantwason commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-22 Thread GitBox
prashantwason commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r443849600 ## File path: hudi-client/src/main/java/org/apache/hudi/table/action/commit/CommitActionExecutor.java ## @@ -115,7 +115,11 @@ public

[GitHub] [hudi] prashantwason commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-22 Thread GitBox
prashantwason commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r443848335 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java ## @@ -132,7 +133,8 @@ private void init(String fileId, String

[GitHub] [hudi] prashantwason commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-22 Thread GitBox
prashantwason commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r443845391 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieReadHandle.java ## @@ -56,4 +61,9 @@ protected HoodieBaseFile getLatestDataFile() {

[GitHub] [hudi] prashanthpdesai edited a comment on issue #1745: Deltastreamer -Global bloom Index resulting Duplicates across partitions for Same record Key

2020-06-22 Thread GitBox
prashanthpdesai edited a comment on issue #1745: URL: https://github.com/apache/hudi/issues/1745#issuecomment-647196270 @vinothchandar: Yes Vinoth, we are using current date as our modified_dt, just pulled the latest record from recent run, which we started on 19th, have 3 partitions

[GitHub] [hudi] bvaradar commented on pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-22 Thread GitBox
bvaradar commented on pull request #1687: URL: https://github.com/apache/hudi/pull/1687#issuecomment-647588704 @prashantwason : As I was mentioning this earlier, can you split them into 2 PRs with first one containing only. 1. HoodieStorageWriter abstraction for writer and

[GitHub] [hudi] bvaradar merged pull request #1690: [HUDI-908] Add some data types to HoodieTestDataGenerator and fix some bugs

2020-06-22 Thread GitBox
bvaradar merged pull request #1690: URL: https://github.com/apache/hudi/pull/1690

[hudi] branch master updated: [HUDI-908] Add some data types to HoodieTestDataGenerator and fix some some bugs. (#1690)

2020-06-22 Thread vbalaji
This is an automated email from the ASF dual-hosted git repository. vbalaji pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 89e37d5 [HUDI-908] Add some data types to

[GitHub] [hudi] FelixKJose commented on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

2020-06-22 Thread GitBox
FelixKJose commented on issue #1552: URL: https://github.com/apache/hudi/issues/1552#issuecomment-647577732 Did someone try replacing the existing HUDI jars with 0.6.0 version or some custom jars in /usr/lib/hudi/ of EMR? I am trying to use 6.0 EMR, but the HUDI version is 0.5.0 and is

[jira] [Commented] (HUDI-839) Implement rollbacks using marker files instead of relying on commit metadata

2020-06-22 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142129#comment-17142129 ] liwei commented on HUDI-839: [~vinoth] I have added PR #1756, please also give some advice > Implement

[GitHub] [hudi] lw309637554 opened a new pull request #1756: Adding unit test for MarkerFiles,RollbackUtils, RollbackActionExecutor for markers and filelisting

2020-06-22 Thread GitBox
lw309637554 opened a new pull request #1756: URL: https://github.com/apache/hudi/pull/1756 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[GitHub] [hudi] lw309637554 closed pull request #1755: Adding unit test for MarkerFiles,RollbackUtils, RollbackActionExecutor for markers and filelisting

2020-06-22 Thread GitBox
lw309637554 closed pull request #1755: URL: https://github.com/apache/hudi/pull/1755

[jira] [Issue Comment Deleted] (HUDI-839) Implement rollbacks using marker files instead of relying on commit metadata

2020-06-22 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liwei updated HUDI-839: --- Comment: was deleted (was: [~vinoth] have add the PR#1755 ,please also give some advice.) > Implement rollbacks using

[jira] [Commented] (HUDI-839) Implement rollbacks using marker files instead of relying on commit metadata

2020-06-22 Thread liwei (Jira)
[ https://issues.apache.org/jira/browse/HUDI-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17142107#comment-17142107 ] liwei commented on HUDI-839: [~vinoth] I have added PR #1755, please also give some advice. > Implement

[GitHub] [hudi] lw309637554 opened a new pull request #1755: Adding unit test for MarkerFiles,RollbackUtils, RollbackActionExecutor for markers and filelisting

2020-06-22 Thread GitBox
lw309637554 opened a new pull request #1755: URL: https://github.com/apache/hudi/pull/1755 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[GitHub] [hudi] wangxianghu commented on pull request #1754: [HUDI-1035]Remove unused class KeyLookupResult

2020-06-22 Thread GitBox
wangxianghu commented on pull request #1754: URL: https://github.com/apache/hudi/pull/1754#issuecomment-647475307 @yanghua could you please take a look when free? :)

[jira] [Assigned] (HUDI-1038) Adding perf benchmark using jmh to Hudi

2020-06-22 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-1038: - Assignee: sivabalan narayanan > Adding perf benchmark using jmh to Hudi >

[jira] [Created] (HUDI-1038) Adding perf benchmark using jmh to Hudi

2020-06-22 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1038: - Summary: Adding perf benchmark using jmh to Hudi Key: HUDI-1038 URL: https://issues.apache.org/jira/browse/HUDI-1038 Project: Apache Hudi Issue

[jira] [Updated] (HUDI-1038) Adding perf benchmark using jmh to Hudi

2020-06-22 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1038: -- Fix Version/s: 0.6.0 > Adding perf benchmark using jmh to Hudi >

[jira] [Commented] (HUDI-1033) Remove redundant CLI tests

2020-06-22 Thread hong dongdong (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141836#comment-17141836 ] hong dongdong commented on HUDI-1033: - [~vbalaji]: Somethings that I had not explained clearly. There

[GitHub] [hudi] bvaradar commented on issue #1747: [SUPPORT] HiveSynctool syncs wrong location

2020-06-22 Thread GitBox
bvaradar commented on issue #1747: URL: https://github.com/apache/hudi/issues/1747#issuecomment-647351218 @bhasudha : Yes, agree we should make it explicit for users to configure this setting. I am wondering if we can have a config to let users explicitly tell if the dataset is

[GitHub] [hudi] bvaradar commented on issue #1679: How to disable Hive JDBC and enable metastore

2020-06-22 Thread GitBox
bvaradar commented on issue #1679: URL: https://github.com/apache/hudi/issues/1679#issuecomment-647347106 @cdmikechen : Long time :) Hudi utilities include following hive jars in shaded form ``` org.apache.hive:hive-common

[jira] [Commented] (HUDI-1033) Remove redundant CLI tests

2020-06-22 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17141754#comment-17141754 ] Balaji Varadarajan commented on HUDI-1033: -- [~hongdongdong] [~yanghua]: I strongly feel that we

[GitHub] [hudi] Raghvendradubey commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-06-22 Thread GitBox
Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-647325340 sure. I am trying to achieve near real time data( like Read Optimized View) by updating records over S3. eg - let's say I have records a1 b1 t1 a1, b2, t2