[GitHub] [hudi] sathyaprakashg commented on pull request #1664: HUDI-942 Increase default value number of delta commits for inline compaction

2020-06-09 Thread GitBox
sathyaprakashg commented on pull request #1664: URL: https://github.com/apache/hudi/pull/1664#issuecomment-641619354 Thanks @vinothchandar. @bhasudha Please refer here for the issue I am facing https://www.mail-archive.com/dev@hudi.apache.org/msg02967.html Please suggest on how to

[GitHub] [hudi] codecov-commenter edited a comment on pull request #1721: Cache the explodeRecordRDDWithFileComparisons instead of commuting it…

2020-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #1721: URL: https://github.com/apache/hudi/pull/1721#issuecomment-641622744 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1721?src=pr=h1) Report > Merging [#1721](https://codecov.io/gh/apache/hudi/pull/1721?src=pr=desc) into

[jira] [Created] (HUDI-1017) Integration test failure

2020-06-09 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1017: - Summary: Integration test failure Key: HUDI-1017 URL: https://issues.apache.org/jira/browse/HUDI-1017 Project: Apache Hudi Issue Type: Bug

[GitHub] [hudi] codecov-commenter commented on pull request #1721: Cache the explodeRecordRDDWithFileComparisons instead of commuting it…

2020-06-09 Thread GitBox
codecov-commenter commented on pull request #1721: URL: https://github.com/apache/hudi/pull/1721#issuecomment-641622744 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1721?src=pr=h1) Report > Merging [#1721](https://codecov.io/gh/apache/hudi/pull/1721?src=pr=desc) into

[jira] [Updated] (HUDI-1016) [Minor] Code optimization

2020-06-09 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Shen updated HUDI-1016: Status: Open (was: New) > [Minor] Code optimization > - > > Key:

[jira] [Resolved] (HUDI-1016) [Minor] Code optimization

2020-06-09 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Shen resolved HUDI-1016. - Resolution: Fixed > [Minor] Code optimization > - > > Key:

[GitHub] [hudi] vinothchandar commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-09 Thread GitBox
vinothchandar commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r437846366 ## File path: hudi-client/src/main/java/org/apache/hudi/io/storage/HoodieStorageWriterFactory.java ## @@ -66,4 +67,21 @@ return new

[jira] [Updated] (HUDI-1018) Handle empty checkpoint better in delta streamer

2020-06-09 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1018: - Component/s: DeltaStreamer > Handle empty checkpoint better in delta streamer >

[jira] [Updated] (HUDI-1018) Handle empty checkpoint better in delta streamer

2020-06-09 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1018: - Status: Open (was: New) > Handle empty checkpoint better in delta streamer >

[GitHub] [hudi] shenh062326 commented on pull request #1690: [HUDI-908] Add decimals to HoodieTestDataGenerator

2020-06-09 Thread GitBox
shenh062326 commented on pull request #1690: URL: https://github.com/apache/hudi/pull/1690#issuecomment-641671278 > @shenh062326 : It makes sense to cover other data-types in a single PR. Can you also add them to this PR. Also, Can you let us know what the missing data types are ?

[jira] [Updated] (HUDI-806) Implement support for bootstrapping via Spark datasource API

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-806: Priority: Blocker (was: Major) > Implement support for bootstrapping via Spark datasource

[jira] [Updated] (HUDI-971) Fix HFileBootstrapIndexReader.getIndexedPartitions() returns unclean partition name

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-971: Fix Version/s: 0.6.0 > Fix HFileBootstrapIndexReader.getIndexedPartitions() returns unclean

[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-992: Fix Version/s: 0.6.0 > For hive-style partitioned source data, partition columns synced with

[jira] [Updated] (HUDI-806) Implement support for bootstrapping via Spark datasource API

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-806: Fix Version/s: 0.6.0 > Implement support for bootstrapping via Spark datasource API >

[jira] [Updated] (HUDI-956) Test COW : Presto Realtime Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-956: Fix Version/s: 0.6.0 > Test COW : Presto Realtime Query with metadata bootstrap >

[jira] [Commented] (HUDI-781) Re-design test utilities

2020-06-09 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130049#comment-17130049 ] Nishith Agarwal commented on HUDI-781: -- [~pwason] Can you help with #2 ? Like we talked about, mocks

[jira] [Updated] (HUDI-955) Test MOR : Presto Read Optimized Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-955: Fix Version/s: 0.6.0 > Test MOR : Presto Read Optimized Query with metadata bootstrap >

[jira] [Updated] (HUDI-807) Spark DS Support for incremental queries for bootstrapped tables

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-807: Fix Version/s: 0.6.0 > Spark DS Support for incremental queries for bootstrapped tables >

[jira] [Updated] (HUDI-619) Investigate and implement mechanism to have hive/presto/sparksql queries avoid stitching and return null values for hoodie columns

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-619: Fix Version/s: 0.6.0 > Investigate and implement mechanism to have hive/presto/sparksql

[GitHub] [hudi] vinothchandar commented on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-09 Thread GitBox
vinothchandar commented on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-641708414 @umehrot2 take a look as well? This is an automated message from the Apache Git Service. To respond to the

[jira] [Assigned] (HUDI-1010) Fix the memory leak for hudi-client unit tests

2020-06-09 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal reassigned HUDI-1010: - Assignee: Nishith Agarwal > Fix the memory leak for hudi-client unit tests >

[jira] [Created] (HUDI-1018) Handle empty checkpoint better in delta streamer

2020-06-09 Thread Yanjia Gary Li (Jira)
Yanjia Gary Li created HUDI-1018: Summary: Handle empty checkpoint better in delta streamer Key: HUDI-1018 URL: https://issues.apache.org/jira/browse/HUDI-1018 Project: Apache Hudi Issue

[GitHub] [hudi] garyli1019 commented on a change in pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

2020-06-09 Thread GitBox
garyli1019 commented on a change in pull request #1719: URL: https://github.com/apache/hudi/pull/1719#discussion_r437841744 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/AvroKafkaSource.java ## @@ -57,10 +57,10 @@ public

[jira] [Updated] (HUDI-971) Fix HFileBootstrapIndexReader.getIndexedPartitions() returns unclean partition name

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-971: Priority: Blocker (was: Major) > Fix HFileBootstrapIndexReader.getIndexedPartitions()

[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-992: Priority: Blocker (was: Major) > For hive-style partitioned source data, partition columns

[jira] [Updated] (HUDI-807) Spark DS Support for incremental queries for bootstrapped tables

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-807: Priority: Blocker (was: Major) > Spark DS Support for incremental queries for bootstrapped

[jira] [Updated] (HUDI-999) Parallelize listing of Source dataset partitions

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-999: Priority: Blocker (was: Major) > Parallelize listing of Source dataset partitions >

[GitHub] [hudi] bobgalvao opened a new issue #1723: [SUPPORT] - trouble using Apache Hudi with S3.

2020-06-09 Thread GitBox
bobgalvao opened a new issue #1723: URL: https://github.com/apache/hudi/issues/1723 Hi, I'm having trouble using Apache Hudi with S3. **Steps to reproduce the behavior:** 1. Produce messages to a Kafka topic. (2000 records per window on average) 2. Start streaming

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #304

2020-06-09 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.42 KB...] settings.xml toolchains.xml /home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging: simplelogger.properties

[jira] [Resolved] (HUDI-1005) NPE in HoodieWriteClient.clean

2020-06-09 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Shen resolved HUDI-1005. - Resolution: Fixed > NPE in HoodieWriteClient.clean > --- > >

[jira] [Updated] (HUDI-1005) NPE in HoodieWriteClient.clean

2020-06-09 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Shen updated HUDI-1005: Status: Open (was: New) > NPE in HoodieWriteClient.clean > --- > >

[GitHub] [hudi] shenh062326 commented on pull request #1714: [HUDI-1005] fix NPE in HoodieWriteClient.clean

2020-06-09 Thread GitBox
shenh062326 commented on pull request #1714: URL: https://github.com/apache/hudi/pull/1714#issuecomment-641677856 > I was wondering if there was a way to just throw an exception or make it an Option.. merged.. let's punt on this for now When I try to run HoodieDeltaStreamer with

[jira] [Updated] (HUDI-999) Parallelize listing of Source dataset partitions

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-999: Fix Version/s: 0.6.0 > Parallelize listing of Source dataset partitions >

[jira] [Assigned] (HUDI-994) Identify functional tests that are convertible to unit tests with mocks

2020-06-09 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal reassigned HUDI-994: Assignee: Prashant Wason > Identify functional tests that are convertible to unit tests

[jira] [Updated] (HUDI-954) Test COW : Presto Read Optimized Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-954: Fix Version/s: 0.6.0 > Test COW : Presto Read Optimized Query with metadata bootstrap >

[jira] [Updated] (HUDI-954) Test COW : Presto Read Optimized Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-954: Priority: Blocker (was: Major) > Test COW : Presto Read Optimized Query with metadata

[jira] [Updated] (HUDI-956) Test COW : Presto Realtime Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-956: Priority: Blocker (was: Major) > Test COW : Presto Realtime Query with metadata bootstrap >

[hudi] 01/01: [HUDI-988] Fix More Unit Test Flakiness

2020-06-09 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository. sivabalan pushed a commit to branch release-0.5.3 in repository https://gitbox.apache.org/repos/asf/hudi.git commit 937566ee15c6a873258cb22f0cf78623f3c169fc Author: garyli1019 AuthorDate: Fri Jun 5 17:25:59 2020 -0700

[hudi] branch release-0.5.3 updated (e0c45f6 -> 937566e)

2020-06-09 Thread sivabalan
sivabalan pushed a change to branch release-0.5.3 in repository https://gitbox.apache.org/repos/asf/hudi.git. discard e0c45f6 [HUDI-988] Fix More Unit Test Flakiness new 937566e [HUDI-988] Fix More Unit Test Flakiness

[hudi] branch master updated: [HUDI-822] decouple Hudi related logics from HoodieInputFormat (#1592)

2020-06-09 Thread vinoth
vinoth pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 37838ce [HUDI-822] decouple Hudi related logics

[jira] [Updated] (HUDI-982) Introduce AbstractHoodieTable for hudi write client

2020-06-09 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-982: - Description: could be like this {code:java} // code placeholder public abstract class HoodieTable, I, K, O, P>

[jira] [Commented] (HUDI-781) Re-design test utilities

2020-06-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129248#comment-17129248 ] Vinoth Chandar commented on HUDI-781: - Sounds good overall.. I would suggest we get a head start in #5

[jira] [Commented] (HUDI-896) Parallelize CI testing to reduce CI wait time

2020-06-09 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129251#comment-17129251 ] Vinoth Chandar commented on HUDI-896: - [~rxu] It's unlikely codecov may actually resolve this issue in

[hudi] branch master updated (22cd824 -> 6318e94)

2020-06-09 Thread leesf
leesf pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 22cd824 HUDI-494 fix incorrect record size estimation add 6318e94 [HUDI-1016] Code optimization in

[jira] [Updated] (HUDI-981) Introduce AbstractHoodieIndex for hudi write client

2020-06-09 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-981: - Description: At a high level, HoodieIndex should be irrelevant to engines. So we should abstract it.

[hudi] branch master updated (6318e94 -> 3387b38)

2020-06-09 Thread vinoth
vinoth pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from 6318e94 [HUDI-1016] Code optimization in MergeOnReadRollbackActionExecutor(#1718) add 3387b38 [HUDI-1005] fix

[GitHub] [hudi] shenh062326 opened a new pull request #1718: [HUDI-1016] [Minor] Code optimization

2020-06-09 Thread GitBox
shenh062326 opened a new pull request #1718: URL: https://github.com/apache/hudi/pull/1718 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[GitHub] [hudi] vinothchandar commented on pull request #1707: [HUDI-988] fix more unit tests flakiness

2020-06-09 Thread GitBox
vinothchandar commented on pull request #1707: URL: https://github.com/apache/hudi/pull/1707#issuecomment-640542414 >The unit test structure needs to be refactored. Right now we initialize and clean up resources for every single test, which is inefficient. @xushiyan @yanghua This

[GitHub] [hudi] satishkotha opened a new pull request #1717: [HUDI-1012] Add unit test for snapshot reads

2020-06-09 Thread GitBox
satishkotha opened a new pull request #1717: URL: https://github.com/apache/hudi/pull/1717 ## What is the purpose of the pull request Adding a test for snapshot reads ## Brief change log For MOR tables, there are tests for incremental reads. But tests are missing

[GitHub] [hudi] nikitap95 commented on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

2020-06-09 Thread GitBox
nikitap95 commented on issue #1550: URL: https://github.com/apache/hudi/issues/1550#issuecomment-640574748 @vinothchandar Thanks for your prompt help. Will wait for the release in that case rather than using the patch. Sure, I'll get myself added to it, would be great to be a part of

[GitHub] [hudi] vinothchandar commented on a change in pull request #1683: Updating release docs for release-0.5.3

2020-06-09 Thread GitBox
vinothchandar commented on a change in pull request #1683: URL: https://github.com/apache/hudi/pull/1683#discussion_r434991598 ## File path: docs/_pages/releases.md ## @@ -3,8 +3,40 @@ title: "Releases" permalink: /releases layout: releases toc: true -last_modified_at:

[GitHub] [hudi] shenh062326 commented on pull request #1690: [HUDI-908] Add decimals to HoodieTestDataGenerator

2020-06-09 Thread GitBox
shenh062326 commented on pull request #1690: URL: https://github.com/apache/hudi/pull/1690#issuecomment-640978958 @bvaradar Should I add all data types to this pr or open another pr. My original idea was that this pr fixes the bug of parsing decimal type, and another pr is added to add

[GitHub] [hudi] vinothchandar commented on pull request #1713: Add HIVE_STYLE_PARTITIONING_OPT_KEY to Config doc

2020-06-09 Thread GitBox
vinothchandar commented on pull request #1713: URL: https://github.com/apache/hudi/pull/1713#issuecomment-640545767 @zhedoubushishi Thanks for doing this. was on my todo list as well..

[jira] [Updated] (HUDI-875) Introduce a new pom module named hudi-common-sync

2020-06-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-875: Labels: pull-request-available (was: ) > Introduce a new pom module named hudi-common-sync >

[GitHub] [hudi] codecov-commenter commented on pull request #1717: [HUDI-1012] Add unit test for snapshot reads

2020-06-09 Thread GitBox
codecov-commenter commented on pull request #1717: URL: https://github.com/apache/hudi/pull/1717#issuecomment-640854839 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1717?src=pr=h1) Report > Merging [#1717](https://codecov.io/gh/apache/hudi/pull/1717?src=pr=desc) into

[GitHub] [hudi] leesf merged pull request #1715: [HUDI-1002] Ignore case when setting incremental mode in hive query

2020-06-09 Thread GitBox
leesf merged pull request #1715: URL: https://github.com/apache/hudi/pull/1715

[jira] [Updated] (HUDI-1012) add test for snapshot reads

2020-06-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1012: - Labels: pull-request-available (was: ) > add test for snapshot reads >

[GitHub] [hudi] wangxianghu edited a comment on pull request #1665: [HUDI-910]Introduce HoodieWriteInput for hudi write client

2020-06-09 Thread GitBox
wangxianghu edited a comment on pull request #1665: URL: https://github.com/apache/hudi/pull/1665#issuecomment-641279534

[GitHub] [hudi] codecov-commenter edited a comment on pull request #1720: [HUDI-1003] Handle partitions correctly for syncing hudi non-parititioned table to hive

2020-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #1720: URL: https://github.com/apache/hudi/pull/1720#issuecomment-641194386 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1720?src=pr=h1) Report > Merging [#1720](https://codecov.io/gh/apache/hudi/pull/1720?src=pr=desc) into

[GitHub] [hudi] leesf commented on a change in pull request #1711: [HUDI-974] fix fields out of order in MOR mode when using Hive

2020-06-09 Thread GitBox
leesf commented on a change in pull request #1711: URL: https://github.com/apache/hudi/pull/1711#discussion_r436636431 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeUnmergedRecordReader.java ## @@ -82,7 +82,7 @@ public

[GitHub] [hudi] nsivabalan commented on a change in pull request #1712: Cherry picking HUDI-988 and HUDI-990 to release-0.5.3

2020-06-09 Thread GitBox
nsivabalan commented on a change in pull request #1712: URL: https://github.com/apache/hudi/pull/1712#discussion_r436940846 ## File path: hudi-cli/src/test/java/org/apache/hudi/cli/commands/AbstractShellIntegrationTest.java ## @@ -58,4 +58,13 @@ public void teardown() throws

[GitHub] [hudi] yanghua commented on pull request #1707: [HUDI-988] fix more unit tests flakiness

2020-06-09 Thread GitBox
yanghua commented on pull request #1707: URL: https://github.com/apache/hudi/pull/1707#issuecomment-640572401 > > The unit test structure needs to be refactored. Right now we initialize and clean up resources for every single test, which is inefficient. > > @xushiyan @yanghua WDYT?

[GitHub] [hudi] vinothchandar commented on pull request #1714: [HUDI-1005] fix NPE in HoodieWriteClient.clean

2020-06-09 Thread GitBox
vinothchandar commented on pull request #1714: URL: https://github.com/apache/hudi/pull/1714#issuecomment-641273663 I was wondering if there was a way to just throw an exception or make it an Option.. merged.. let's punt on this for now

[GitHub] [hudi] lw309637554 opened a new pull request #1716: [HUDI-875] Introduce a new pom module named hudi-common-sync

2020-06-09 Thread GitBox
lw309637554 opened a new pull request #1716: URL: https://github.com/apache/hudi/pull/1716 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[GitHub] [hudi] Raghvendradubey commented on issue #1694: Slow Write into Hudi Dataset(MOR)

2020-06-09 Thread GitBox
Raghvendradubey commented on issue #1694: URL: https://github.com/apache/hudi/issues/1694#issuecomment-640738415

[GitHub] [hudi] garyli1019 commented on pull request #1707: [HUDI-988] fix more unit tests flakiness

2020-06-09 Thread GitBox
garyli1019 commented on pull request #1707: URL: https://github.com/apache/hudi/pull/1707#issuecomment-640738265

[GitHub] [hudi] vinothchandar commented on pull request #1715: [HUDI-1002] Ignore case when setting incremental mode in hive query

2020-06-09 Thread GitBox
vinothchandar commented on pull request #1715: URL: https://github.com/apache/hudi/pull/1715#issuecomment-640548021 LGTM!

[GitHub] [hudi] nsivabalan commented on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

2020-06-09 Thread GitBox
nsivabalan commented on issue #1550: URL: https://github.com/apache/hudi/issues/1550#issuecomment-640609843 yes, I should have a candidate up for voting by today or tomorrow.

[GitHub] [hudi] vinothchandar edited a comment on pull request #1707: [HUDI-988] fix more unit tests flakiness

2020-06-09 Thread GitBox
vinothchandar edited a comment on pull request #1707: URL: https://github.com/apache/hudi/pull/1707#issuecomment-640542414 >The unit test structure needs to be refactored. Right now we initialize and clean up resources for every single test, which is inefficient. @xushiyan @yanghua

[GitHub] [hudi] codecov-commenter commented on pull request #1720: [HUDI-1003] Handle partitions correctly for syncing hudi non-parititioned table to hive

2020-06-09 Thread GitBox
codecov-commenter commented on pull request #1720: URL: https://github.com/apache/hudi/pull/1720#issuecomment-641194386 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1720?src=pr=h1) Report > Merging [#1720](https://codecov.io/gh/apache/hudi/pull/1720?src=pr=desc) into

[GitHub] [hudi] shenh062326 commented on pull request #1714: [HUDI-1005] fix NPE in HoodieWriteClient.clean

2020-06-09 Thread GitBox
shenh062326 commented on pull request #1714: URL: https://github.com/apache/hudi/pull/1714#issuecomment-640974416 @vinothchandar can you take a look at this?

[GitHub] [hudi] wangxianghu commented on pull request #1665: [HUDI-910]Introduce HoodieWriteInput for hudi write client

2020-06-09 Thread GitBox
wangxianghu commented on pull request #1665: URL: https://github.com/apache/hudi/pull/1665#issuecomment-641275707

[jira] [Updated] (HUDI-1016) [Minor] Code optimization

2020-06-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1016: - Labels: pull-request-available (was: ) > [Minor] Code optimization > - >

[GitHub] [hudi] codecov-commenter commented on pull request #1718: [HUDI-1016] [Minor] Code optimization

2020-06-09 Thread GitBox
codecov-commenter commented on pull request #1718: URL: https://github.com/apache/hudi/pull/1718#issuecomment-641093847 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1718?src=pr=h1) Report > Merging [#1718](https://codecov.io/gh/apache/hudi/pull/1718?src=pr=desc) into

[GitHub] [hudi] n3nash merged pull request #1638: HUDI-515 Resolve API conflict for Hive 2 & Hive 3

2020-06-09 Thread GitBox
n3nash merged pull request #1638: URL: https://github.com/apache/hudi/pull/1638

[jira] [Updated] (HUDI-1003) Handle partitions correctly when sync non-partitioned table to hive.

2020-06-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1003: - Labels: newbe pull-request-available starter (was: newbe starter) > Handle partitions correctly

[GitHub] [hudi] xushiyan commented on pull request #1707: [HUDI-988] fix more unit tests flakiness

2020-06-09 Thread GitBox
xushiyan commented on pull request #1707: URL: https://github.com/apache/hudi/pull/1707#issuecomment-640766975

[GitHub] [hudi] vinothchandar commented on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

2020-06-09 Thread GitBox
vinothchandar commented on issue #1550: URL: https://github.com/apache/hudi/issues/1550#issuecomment-640542938 @nsivabalan is driving the release.. We are planning to do a 0.5.3 this week. right siva ?

[GitHub] [hudi] vinothchandar merged pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-06-09 Thread GitBox
vinothchandar merged pull request #1602: URL: https://github.com/apache/hudi/pull/1602

[GitHub] [hudi] leesf merged pull request #1652: [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug

2020-06-09 Thread GitBox
leesf merged pull request #1652: URL: https://github.com/apache/hudi/pull/1652

[GitHub] [hudi] bvaradar commented on pull request #1690: [HUDI-908] Add decimals to HoodieTestDataGenerator

2020-06-09 Thread GitBox
bvaradar commented on pull request #1690: URL: https://github.com/apache/hudi/pull/1690#issuecomment-641301829 @shenh062326 : It makes sense to cover other data-types in a single PR. Can you also add them to this PR. Also, Can you let us know what the missing data types are ?

[GitHub] [hudi] codecov-commenter commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

2020-06-09 Thread GitBox
codecov-commenter commented on pull request #1719: URL: https://github.com/apache/hudi/pull/1719#issuecomment-641096930 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr=h1) Report > Merging [#1719](https://codecov.io/gh/apache/hudi/pull/1719?src=pr=desc) into

[GitHub] [hudi] sbernauer commented on pull request #1647: [HUDI-867]: fixed IllegalArgumentException from graphite metrics in deltaStreamer continuous mode

2020-06-09 Thread GitBox
sbernauer commented on pull request #1647: URL: https://github.com/apache/hudi/pull/1647#issuecomment-641278957 If I read https://stackoverflow.com/a/55753138 correctly, normally you register a gauge only at startup (or first metric write) and then just update the value in every loop.
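
The register-once pattern described in this comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from pull request #1647: `TinyRegistry` is a hypothetical stand-in for a metrics registry that throws `IllegalArgumentException` when a name is registered twice, and `GaugeUpdater` and the metric name `deltastreamer.duration` are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class GaugeOnceSketch {

    // Hypothetical stand-in for a metrics registry that rejects duplicate names.
    static final class TinyRegistry {
        private final Map<String, AtomicLong> gauges = new ConcurrentHashMap<>();

        void register(String name, AtomicLong value) {
            if (gauges.putIfAbsent(name, value) != null) {
                throw new IllegalArgumentException("A metric named " + name + " already exists");
            }
        }

        long read(String name) {
            return gauges.get(name).get();
        }
    }

    // Keeps one mutable holder per gauge so registration happens only on the first write.
    static final class GaugeUpdater {
        private final TinyRegistry registry;
        private final Map<String, AtomicLong> values = new ConcurrentHashMap<>();

        GaugeUpdater(TinyRegistry registry) {
            this.registry = registry;
        }

        void update(String name, long newValue) {
            // First write: create the holder and register it. Later writes: reuse the holder.
            AtomicLong holder = values.computeIfAbsent(name, n -> {
                AtomicLong created = new AtomicLong();
                registry.register(n, created);
                return created;
            });
            holder.set(newValue);
        }
    }

    public static void main(String[] args) {
        TinyRegistry registry = new TinyRegistry();
        GaugeUpdater updater = new GaugeUpdater(registry);
        // Simulates a continuous-mode loop reporting the same metric every round.
        for (long round = 1; round <= 3; round++) {
            updater.update("deltastreamer.duration", round * 100);
        }
        System.out.println(registry.read("deltastreamer.duration"));
    }
}
```

Calling `registry.register` directly on every loop iteration would throw on the second round, which is the failure mode the comment is steering away from; routing every write through the holder avoids it.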

[GitHub] [hudi] leesf commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

2020-06-09 Thread GitBox
leesf commented on pull request #1719: URL: https://github.com/apache/hudi/pull/1719#issuecomment-641222330 @garyli1019 would you please review this one? This is an automated message from the Apache Git Service. To respond

[GitHub] [hudi] vinothchandar edited a comment on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

2020-06-09 Thread GitBox
vinothchandar edited a comment on issue #1550: URL: https://github.com/apache/hudi/issues/1550#issuecomment-640542938 @nsivabalan is driving the release. We are planning to do a 0.5.3 this week, right Siva? This release will have the fix. @nikitap95 if interested, you can join the

[GitHub] [hudi] leesf commented on pull request #1652: [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug

2020-06-09 Thread GitBox
leesf commented on pull request #1652: URL: https://github.com/apache/hudi/pull/1652#issuecomment-640580726 merging this. cc @garyli1019 This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [hudi] vinothchandar merged pull request #1714: [HUDI-1005] fix NPE in HoodieWriteClient.clean

2020-06-09 Thread GitBox
vinothchandar merged pull request #1714: URL: https://github.com/apache/hudi/pull/1714 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] vinothchandar commented on a change in pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-06-09 Thread GitBox
vinothchandar commented on a change in pull request #1602: URL: https://github.com/apache/hudi/pull/1602#discussion_r436636150 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java ## @@ -54,6 +54,12 @@ public static final String

[GitHub] [hudi] nikitap95 edited a comment on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

2020-06-09 Thread GitBox
nikitap95 edited a comment on issue #1550: URL: https://github.com/apache/hudi/issues/1550#issuecomment-640574748 @vinothchandar Thanks for your prompt response. Will wait for the release in that case rather than using the patch. Sure, I'll get myself added to it, would be great to be

[GitHub] [hudi] codecov-commenter edited a comment on pull request #1717: [HUDI-1012] Add unit test for snapshot reads

2020-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #1717: URL: https://github.com/apache/hudi/pull/1717#issuecomment-640854839 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1717?src=pr=h1) Report > Merging [#1717](https://codecov.io/gh/apache/hudi/pull/1717?src=pr=desc) into

[GitHub] [hudi] nandini57 edited a comment on issue #1705: Tracking Hudi Data along transaction time and buisness time

2020-06-09 Thread GitBox
nandini57 edited a comment on issue #1705: URL: https://github.com/apache/hudi/issues/1705#issuecomment-640599130 Yes Balaji. Each record can have 4 columns: IN_Z and OUT_Z (system dimension), FROM_Z and THRU_Z (business dimension). If you see the code above, I am creating different unique keys and

[GitHub] [hudi] vinothchandar commented on a change in pull request #1711: [HUDI-974] fix fields out of order in MOR mode when using Hive

2020-06-09 Thread GitBox
vinothchandar commented on a change in pull request #1711: URL: https://github.com/apache/hudi/pull/1711#discussion_r436633121 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeUnmergedRecordReader.java ## @@ -82,7 +82,7 @@ public

[GitHub] [hudi] Litianye opened a new pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

2020-06-09 Thread GitBox
Litianye opened a new pull request #1719: URL: https://github.com/apache/hudi/pull/1719 ## What is the purpose of the pull request This pull request fixes DeltaStreamer being unable to consume data when using a Kafka source (such as JsonKafkaSource / AvroKafkaSource) with offset reset strategy: latest

[GitHub] [hudi] leesf merged pull request #1718: [HUDI-1016] [Minor] Code optimization

2020-06-09 Thread GitBox
leesf merged pull request #1718: URL: https://github.com/apache/hudi/pull/1718 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [hudi] vinothchandar merged pull request #1592: [HUDI-822] decouple Hudi related logics from HoodieInputFormat

2020-06-09 Thread GitBox
vinothchandar merged pull request #1592: URL: https://github.com/apache/hudi/pull/1592 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] vinothchandar commented on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-06-09 Thread GitBox
vinothchandar commented on pull request #1602: URL: https://github.com/apache/hudi/pull/1602#issuecomment-640555982 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [hudi] wangxianghu closed pull request #1665: [HUDI-910]Introduce HoodieWriteInput for hudi write client

2020-06-09 Thread GitBox
wangxianghu closed pull request #1665: URL: https://github.com/apache/hudi/pull/1665 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [hudi] garyli1019 commented on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-06-09 Thread GitBox
garyli1019 commented on pull request #1602: URL: https://github.com/apache/hudi/pull/1602#issuecomment-640757660 @vinothchandar CI passed with rebase. This is an automated message from the Apache Git Service. To respond to

[GitHub] [hudi] luoyajun526 opened a new pull request #1720: [HUDI-1003] Handle partitions correctly for syncing hudi non-parititioned table to hive

2020-06-09 Thread GitBox
luoyajun526 opened a new pull request #1720: URL: https://github.com/apache/hudi/pull/1720 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of
