[GitHub] [hudi] bobgalvao opened a new issue #1723: [SUPPORT] - trouble using Apache Hudi with S3.

2020-06-09 Thread GitBox
bobgalvao opened a new issue #1723: URL: https://github.com/apache/hudi/issues/1723 Hi, I'm having a trouble using Apache Hudi with S3. **Steps to reproduce the behavior:** 1. Produce messages to topic Kafka. (2000 records per window on average) 2. Start streaming

[GitHub] [hudi] garyli1019 commented on a change in pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

2020-06-09 Thread GitBox
garyli1019 commented on a change in pull request #1719: URL: https://github.com/apache/hudi/pull/1719#discussion_r437841744 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/AvroKafkaSource.java ## @@ -57,10 +57,10 @@ public

[jira] [Updated] (HUDI-1018) Handle empty checkpoint better in delta streamer

2020-06-09 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1018: - Component/s: DeltaStreamer > Handle empty checkpoint better in delta streamer >

[jira] [Updated] (HUDI-1018) Handle empty checkpoint better in delta streamer

2020-06-09 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-1018: - Status: Open (was: New) > Handle empty checkpoint better in delta streamer >

[jira] [Created] (HUDI-1018) Handle empty checkpoint better in delta streamer

2020-06-09 Thread Yanjia Gary Li (Jira)
Yanjia Gary Li created HUDI-1018: Summary: Handle empty checkpoint better in delta streamer Key: HUDI-1018 URL: https://issues.apache.org/jira/browse/HUDI-1018 Project: Apache Hudi Issue

[jira] [Assigned] (HUDI-1010) Fix the memory leak for hudi-client unit tests

2020-06-09 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal reassigned HUDI-1010: - Assignee: Nishith Agarwal > Fix the memory leak for hudi-client unit tests >

[jira] [Assigned] (HUDI-994) Identify functional tests that are convertible to unit tests with mocks

2020-06-09 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishith Agarwal reassigned HUDI-994: Assignee: Prashant Wason > Identify functional tests that are convertible to unit tests

[jira] [Updated] (HUDI-999) Parallelize listing of Source dataset partitions

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-999: Fix Version/s: 0.6.0 > Parallelize listing of Source dataset partitions >

[jira] [Updated] (HUDI-807) Spark DS Support for incremental queries for bootstrapped tables

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-807: Fix Version/s: 0.6.0 > Spark DS Support for incremental queries for bootstrapped tables >

[jira] [Updated] (HUDI-954) Test COW : Presto Read Optimized Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-954: Fix Version/s: 0.6.0 > Test COW : Presto Read Optimized Query with metadata bootstrap >

[jira] [Updated] (HUDI-806) Implement support for bootstrapping via Spark datasource API

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-806: Fix Version/s: 0.6.0 > Implement support for bootstrapping via Spark datasource API >

[jira] [Updated] (HUDI-956) Test COW : Presto Realtime Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-956: Fix Version/s: 0.6.0 > Test COW : Presto Realtime Query with metadata bootstrap >

[jira] [Commented] (HUDI-781) Re-design test utilities

2020-06-09 Thread Nishith Agarwal (Jira)
[ https://issues.apache.org/jira/browse/HUDI-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17130049#comment-17130049 ] Nishith Agarwal commented on HUDI-781: -- [~pwason] Can you help with #2 ? Like we talked about, mocks

[jira] [Updated] (HUDI-955) Test MOR : Presto Read Optimized Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-955: Fix Version/s: 0.6.0 > Test MOR : Presto Read Optimized Query with metadata bootstrap >

[jira] [Updated] (HUDI-619) Investigate and implement mechanism to have hive/presto/sparksql queries avoid stitching and return null values for hoodie columns

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-619: Fix Version/s: 0.6.0 > Investigate and implement mechanism to have hive/presto/sparksql

[jira] [Updated] (HUDI-971) Fix HFileBootstrapIndexReader.getIndexedPartitions() returns unclean partition name

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-971: Fix Version/s: 0.6.0 > Fix HFileBootstrapIndexReader.getIndexedPartitions() returns unclean

[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-992: Fix Version/s: 0.6.0 > For hive-style partitioned source data, partition columns synced with

[jira] [Updated] (HUDI-806) Implement support for bootstrapping via Spark datasource API

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-806: Priority: Blocker (was: Major) > Implement support for bootstrapping via Spark datasource

[jira] [Updated] (HUDI-999) Parallelize listing of Source dataset partitions

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-999: Priority: Blocker (was: Major) > Parallelize listing of Source dataset partitions >

[jira] [Updated] (HUDI-807) Spark DS Support for incremental queries for bootstrapped tables

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-807: Priority: Blocker (was: Major) > Spark DS Support for incremental queries for bootstrapped

[jira] [Updated] (HUDI-971) Fix HFileBootstrapIndexReader.getIndexedPartitions() returns unclean partition name

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-971: Priority: Blocker (was: Major) > Fix HFileBootstrapIndexReader.getIndexedPartitions()

[jira] [Updated] (HUDI-992) For hive-style partitioned source data, partition columns synced with Hive will always have String type

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-992: Priority: Blocker (was: Major) > For hive-style partitioned source data, partition columns

[jira] [Updated] (HUDI-956) Test COW : Presto Realtime Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-956: Priority: Blocker (was: Major) > Test COW : Presto Realtime Query with metadata bootstrap >

[jira] [Updated] (HUDI-954) Test COW : Presto Read Optimized Query with metadata bootstrap

2020-06-09 Thread Balaji Varadarajan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Balaji Varadarajan updated HUDI-954: Priority: Blocker (was: Major) > Test COW : Presto Read Optimized Query with metadata

[GitHub] [hudi] vinothchandar commented on pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-09 Thread GitBox
vinothchandar commented on pull request #1722: URL: https://github.com/apache/hudi/pull/1722#issuecomment-641708414 @umehrot2 take a look as well? This is an automated message from the Apache Git Service. To respond to the

[GitHub] [hudi] vinothchandar commented on a change in pull request #1687: [WIP] [HUDI-684] Introduced abstraction for writing and reading different types of base file formats.

2020-06-09 Thread GitBox
vinothchandar commented on a change in pull request #1687: URL: https://github.com/apache/hudi/pull/1687#discussion_r437846366 ## File path: hudi-client/src/main/java/org/apache/hudi/io/storage/HoodieStorageWriterFactory.java ## @@ -66,4 +67,21 @@ return new

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #304

2020-06-09 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.42 KB...] settings.xml toolchains.xml /home/jenkins/tools/maven/apache-maven-3.5.4/conf/logging: simplelogger.properties

[jira] [Resolved] (HUDI-1005) NPE in HoodieWriteClient.clean

2020-06-09 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Shen resolved HUDI-1005. - Resolution: Fixed > NPE in HoodieWriteClient.clean > --- > >

[jira] [Updated] (HUDI-1005) NPE in HoodieWriteClient.clean

2020-06-09 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Shen updated HUDI-1005: Status: Open (was: New) > NPE in HoodieWriteClient.clean > --- > >

[GitHub] [hudi] shenh062326 commented on pull request #1714: [HUDI-1005] fix NPE in HoodieWriteClient.clean

2020-06-09 Thread GitBox
shenh062326 commented on pull request #1714: URL: https://github.com/apache/hudi/pull/1714#issuecomment-641677856 > I was wondering if there was a way to just throw an exception or make it an Option.. merged.. let's punt on this for now When I try to run HoodieDeltaStreamer with

[GitHub] [hudi] shenh062326 commented on pull request #1690: [HUDI-908] Add decimals to HoodieTestDataGenerator

2020-06-09 Thread GitBox
shenh062326 commented on pull request #1690: URL: https://github.com/apache/hudi/pull/1690#issuecomment-641671278 > @shenh062326 : It makes sense to cover other data-types in a single PR. Can you also add them to this PR. Also, Can you let us know what the missing data types are ?

[jira] [Updated] (HUDI-1016) [Minor] Code optimization

2020-06-09 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Shen updated HUDI-1016: Status: Open (was: New) > [Minor] Code optimization > - > > Key:

[jira] [Resolved] (HUDI-1016) [Minor] Code optimization

2020-06-09 Thread Hong Shen (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hong Shen resolved HUDI-1016. - Resolution: Fixed > [Minor] Code optimization > - > > Key:

[GitHub] [hudi] codecov-commenter edited a comment on pull request #1721: Cache the explodeRecordRDDWithFileComparisons instead of commuting it…

2020-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #1721: URL: https://github.com/apache/hudi/pull/1721#issuecomment-641622744 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1721?src=pr=h1) Report > Merging [#1721](https://codecov.io/gh/apache/hudi/pull/1721?src=pr=desc) into

[GitHub] [hudi] codecov-commenter commented on pull request #1721: Cache the explodeRecordRDDWithFileComparisons instead of commuting it…

2020-06-09 Thread GitBox
codecov-commenter commented on pull request #1721: URL: https://github.com/apache/hudi/pull/1721#issuecomment-641622744 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1721?src=pr=h1) Report > Merging [#1721](https://codecov.io/gh/apache/hudi/pull/1721?src=pr=desc) into

[GitHub] [hudi] sathyaprakashg commented on pull request #1664: HUDI-942 Increase default value number of delta commits for inline compaction

2020-06-09 Thread GitBox
sathyaprakashg commented on pull request #1664: URL: https://github.com/apache/hudi/pull/1664#issuecomment-641619354 Thanks @vinothchandar. @bhasudha Please refer here for the issue i am facing https://www.mail-archive.com/dev@hudi.apache.org/msg02967.html Please suggest on how to

[jira] [Created] (HUDI-1017) Integration test failure

2020-06-09 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-1017: - Summary: Integration test failure Key: HUDI-1017 URL: https://issues.apache.org/jira/browse/HUDI-1017 Project: Apache Hudi Issue Type: Bug

[GitHub] [hudi] xushiyan commented on pull request #1095: [HUDI-210] Implement prometheus metrics reporter

2020-06-09 Thread GitBox
xushiyan commented on pull request #1095: URL: https://github.com/apache/hudi/pull/1095#issuecomment-641542940 > @xushiyan hello,how is the progress Unfortunately I have to de-prioritize this as the test improvements are more needed at the moment. I may only be able to come back to

[GitHub] [hudi] garyli1019 commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

2020-06-09 Thread GitBox
garyli1019 commented on pull request #1719: URL: https://github.com/apache/hudi/pull/1719#issuecomment-641539959 @Litianye Thanks for making this PR. Will review soon.

[jira] [Closed] (HUDI-905) Support PrunedFilteredScan for Spark Datasource

2020-06-09 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li closed HUDI-905. --- Resolution: Not A Problem TableScan already supported filter and projection pushdown. > Support

[jira] [Updated] (HUDI-610) MOR table Impala read support

2020-06-09 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li updated HUDI-610: Summary: MOR table Impala read support (was: Impala nea real time table support) > MOR table

[jira] [Assigned] (HUDI-610) Impala nea real time table support

2020-06-09 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li reassigned HUDI-610: --- Assignee: (was: Yanjia Gary Li) > Impala nea real time table support >

[jira] [Resolved] (HUDI-494) [DEBUGGING] Huge amount of tasks when writing files into HDFS

2020-06-09 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li resolved HUDI-494. - Resolution: Fixed > [DEBUGGING] Huge amount of tasks when writing files into HDFS >

[jira] [Closed] (HUDI-494) [DEBUGGING] Huge amount of tasks when writing files into HDFS

2020-06-09 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li closed HUDI-494. --- > [DEBUGGING] Huge amount of tasks when writing files into HDFS >

[jira] [Resolved] (HUDI-822) Decouple hoodie related methods with Hoodie Input Formats

2020-06-09 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li resolved HUDI-822. - Resolution: Fixed > Decouple hoodie related methods with Hoodie Input Formats >

[jira] [Closed] (HUDI-822) Decouple hoodie related methods with Hoodie Input Formats

2020-06-09 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li closed HUDI-822. --- > Decouple hoodie related methods with Hoodie Input Formats >

[GitHub] [hudi] garyli1019 closed pull request #1700: [Draft]Hudi 69 draft

2020-06-09 Thread GitBox
garyli1019 closed pull request #1700: URL: https://github.com/apache/hudi/pull/1700

[GitHub] [hudi] garyli1019 opened a new pull request #1722: [HUDI-69] Support Spark Datasource for MOR table

2020-06-09 Thread GitBox
garyli1019 opened a new pull request #1722: URL: https://github.com/apache/hudi/pull/1722 ## What is the purpose of the pull request This PR implement Spark Datasource for MOR table ## Brief change log - Implemented realtimeRelation - Implemented

[jira] [Updated] (HUDI-69) Support realtime view in Spark datasource #136

2020-06-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-69: --- Labels: pull-request-available (was: ) > Support realtime view in Spark datasource #136 >

[jira] [Commented] (HUDI-651) Incremental Query on Hive via Spark SQL does not return expected results

2020-06-09 Thread Bhavani Sudha (Jira)
[ https://issues.apache.org/jira/browse/HUDI-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129734#comment-17129734 ] Bhavani Sudha commented on HUDI-651: I have pushed it to my repo in this branch - 

[GitHub] [hudi] EdwinGuo opened a new pull request #1721: Cache the explodeRecordRDDWithFileComparisons instead of commuting it…

2020-06-09 Thread GitBox
EdwinGuo opened a new pull request #1721: URL: https://github.com/apache/hudi/pull/1721 … twice in lookUpIndex ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.*

[jira] [Commented] (HUDI-781) Re-design test utilities

2020-06-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129642#comment-17129642 ] Raymond Xu commented on HUDI-781: - [~vinoth] Make sense. I've paused #1 as it's targeting from a different

[GitHub] [hudi] vinothchandar merged pull request #1592: [HUDI-822] decouple Hudi related logics from HoodieInputFormat

2020-06-09 Thread GitBox
vinothchandar merged pull request #1592: URL: https://github.com/apache/hudi/pull/1592

[GitHub] [hudi] vinothchandar commented on a change in pull request #1720: [HUDI-1003] Handle partitions correctly for syncing hudi non-parititioned table to hive

2020-06-09 Thread GitBox
vinothchandar commented on a change in pull request #1720: URL: https://github.com/apache/hudi/pull/1720#discussion_r437539185 ## File path: hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala ## @@ -247,7 +247,13 @@ private[hudi] object HoodieSparkSqlWriter {

[GitHub] [hudi] Litianye opened a new pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

2020-06-09 Thread GitBox
Litianye opened a new pull request #1719: URL: https://github.com/apache/hudi/pull/1719 ## What is the purpose of the pull request This pull request fix deltastreamer use kafkasource (such as JsonKafkaSource / AvroKafkaSource) with offset reset strategy:latest can't consume data

[GitHub] [hudi] leesf merged pull request #1718: [HUDI-1016] [Minor] Code optimization

2020-06-09 Thread GitBox
leesf merged pull request #1718: URL: https://github.com/apache/hudi/pull/1718

[GitHub] [hudi] vinothchandar commented on pull request #1664: HUDI-942 Increase default value number of delta commits for inline compaction

2020-06-09 Thread GitBox
vinothchandar commented on pull request #1664: URL: https://github.com/apache/hudi/pull/1664#issuecomment-640550723 @bhasudha Will help you out with the integration test issue on local machine.. Must be something environmental.

[GitHub] [hudi] codecov-commenter edited a comment on pull request #1592: [HUDI-822] decouple Hudi related logics from HoodieInputFormat

2020-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #1592: URL: https://github.com/apache/hudi/pull/1592#issuecomment-632985999

[GitHub] [hudi] codecov-commenter edited a comment on pull request #1717: [HUDI-1012] Add unit test for snapshot reads

2020-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #1717: URL: https://github.com/apache/hudi/pull/1717#issuecomment-640854839 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1717?src=pr=h1) Report > Merging [#1717](https://codecov.io/gh/apache/hudi/pull/1717?src=pr=desc) into

[GitHub] [hudi] nandini57 edited a comment on issue #1705: Tracking Hudi Data along transaction time and buisness time

2020-06-09 Thread GitBox
nandini57 edited a comment on issue #1705: URL: https://github.com/apache/hudi/issues/1705#issuecomment-640599130 Yes Balaji. Each record can have 4 columns (IN_Z,OUT_Z(system dimension),FROM_Z,THRU_Z(business dimension)) .If you see the code above,i am creating different unique keys and

[GitHub] [hudi] vinothchandar commented on a change in pull request #1711: [HUDI-974] fix fields out of order in MOR mode when using Hive

2020-06-09 Thread GitBox
vinothchandar commented on a change in pull request #1711: URL: https://github.com/apache/hudi/pull/1711#discussion_r436633121 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeUnmergedRecordReader.java ## @@ -82,7 +82,7 @@ public

[GitHub] [hudi] vinothchandar edited a comment on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

2020-06-09 Thread GitBox
vinothchandar edited a comment on issue #1550: URL: https://github.com/apache/hudi/issues/1550#issuecomment-640542938 @nsivabalan is driving the release.. We are planning to do a 0.5.3 this week. right siva ? This release will have the fix.. @nikitap95 if interested, you can join the

[GitHub] [hudi] sbernauer commented on pull request #1647: [HUDI-867]: fixed IllegalArgumentException from graphite metrics in deltaStreamer continuous mode

2020-06-09 Thread GitBox
sbernauer commented on pull request #1647: URL: https://github.com/apache/hudi/pull/1647#issuecomment-641278957 If i read https://stackoverflow.com/a/55753138 correctly, normally you register an gauge only at startup (or first metric write) and than just update the value in every loop.

[GitHub] [hudi] leesf commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

2020-06-09 Thread GitBox
leesf commented on pull request #1719: URL: https://github.com/apache/hudi/pull/1719#issuecomment-641222330 @garyli1019 would you please review this one?

[GitHub] [hudi] codecov-commenter edited a comment on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #1602: URL: https://github.com/apache/hudi/pull/1602#issuecomment-632410484 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1602?src=pr=h1) Report > Merging [#1602](https://codecov.io/gh/apache/hudi/pull/1602?src=pr=desc) into

[GitHub] [hudi] codecov-commenter edited a comment on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

2020-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #1719: URL: https://github.com/apache/hudi/pull/1719#issuecomment-641096930 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr=h1) Report > Merging [#1719](https://codecov.io/gh/apache/hudi/pull/1719?src=pr=desc) into

[jira] [Commented] (HUDI-896) Parallelize CI testing to reduce CI wait time

2020-06-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17129562#comment-17129562 ] Raymond Xu commented on HUDI-896: - [~vinoth] There is a problem with the current codecov report generation.

[GitHub] [hudi] vinothchandar commented on a change in pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-06-09 Thread GitBox
vinothchandar commented on a change in pull request #1602: URL: https://github.com/apache/hudi/pull/1602#discussion_r436636150 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java ## @@ -54,6 +54,12 @@ public static final String

[GitHub] [hudi] nikitap95 edited a comment on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

2020-06-09 Thread GitBox
nikitap95 edited a comment on issue #1550: URL: https://github.com/apache/hudi/issues/1550#issuecomment-640574748 @vinothchandar Thanks for your prompt response. Will wait for the release in that case rather than using the patch. Sure, I'll get myself added to it, would be great to be

[jira] [Updated] (HUDI-1006) deltastreamer use kafkaSource with offset reset strategy: latest can't consume data

2020-06-09 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1006: - Labels: pull-request-available (was: ) > deltastreamer use kafkaSource with offset reset

[GitHub] [hudi] codecov-commenter commented on pull request #1719: [HUDI-1006]deltastreamer use kafkaSource with offset reset strategy:latest can't consume data

2020-06-09 Thread GitBox
codecov-commenter commented on pull request #1719: URL: https://github.com/apache/hudi/pull/1719#issuecomment-641096930 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1719?src=pr=h1) Report > Merging [#1719](https://codecov.io/gh/apache/hudi/pull/1719?src=pr=desc) into

[GitHub] [hudi] garyli1019 commented on pull request #1592: [HUDI-822] decouple Hudi related logics from HoodieInputFormat

2020-06-09 Thread GitBox
garyli1019 commented on pull request #1592: URL: https://github.com/apache/hudi/pull/1592#issuecomment-640758035 @vinothchandar this one passed with rebase too

[GitHub] [hudi] bvaradar commented on pull request #1690: [HUDI-908] Add decimals to HoodieTestDataGenerator

2020-06-09 Thread GitBox
bvaradar commented on pull request #1690: URL: https://github.com/apache/hudi/pull/1690#issuecomment-641301829 @shenh062326 : It makes sense to cover other data-types in a single PR. Can you also add them to this PR. Also, Can you let us know what the missing data types are ?

[GitHub] [hudi] nandini57 commented on issue #1705: Tracking Hudi Data along transaction time and buisness time

2020-06-09 Thread GitBox
nandini57 commented on issue #1705: URL: https://github.com/apache/hudi/issues/1705#issuecomment-640599130 Yes Balaji. Each record can have 4 columns (IN_Z,OUT_Z(system dimension),FROM_Z,THRU_Z(business dimension)) .If you see the code above,i am creating different unique keys and

[GitHub] [hudi] leesf merged pull request #1652: [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug

2020-06-09 Thread GitBox
leesf merged pull request #1652: URL: https://github.com/apache/hudi/pull/1652

[GitHub] [hudi] vinothchandar merged pull request #1714: [HUDI-1005] fix NPE in HoodieWriteClient.clean

2020-06-09 Thread GitBox
vinothchandar merged pull request #1714: URL: https://github.com/apache/hudi/pull/1714

[GitHub] [hudi] UZi5136225 commented on pull request #1095: [HUDI-210] Implement prometheus metrics reporter

2020-06-09 Thread GitBox
UZi5136225 commented on pull request #1095: URL: https://github.com/apache/hudi/pull/1095#issuecomment-641078938 @xushiyan hello,how is the progress

[GitHub] [hudi] nsivabalan commented on a change in pull request #1683: Updating release docs for release-0.5.3

2020-06-09 Thread GitBox
nsivabalan commented on a change in pull request #1683: URL: https://github.com/apache/hudi/pull/1683#discussion_r436711094 ## File path: docs/_pages/releases.md ## @@ -3,8 +3,40 @@ title: "Releases" permalink: /releases layout: releases toc: true -last_modified_at:

[GitHub] [hudi] codecov-commenter edited a comment on pull request #1716: [HUDI-875] Introduce a new pom module named hudi-common-sync

2020-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #1716: URL: https://github.com/apache/hudi/pull/1716#issuecomment-641229684

[GitHub] [hudi] luoyajun526 opened a new pull request #1720: [HUDI-1003] Handle partitions correctly for syncing hudi non-parititioned table to hive

2020-06-09 Thread GitBox
luoyajun526 opened a new pull request #1720: URL: https://github.com/apache/hudi/pull/1720 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of

[GitHub] [hudi] codecov-commenter edited a comment on pull request #1711: [HUDI-974] fix fields out of order in MOR mode when using Hive

2020-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #1711: URL: https://github.com/apache/hudi/pull/1711#issuecomment-640326551 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1711?src=pr=h1) Report > Merging [#1711](https://codecov.io/gh/apache/hudi/pull/1711?src=pr=desc) into

[GitHub] [hudi] n3nash commented on pull request #1638: HUDI-515 Resolve API conflict for Hive 2 & Hive 3

2020-06-09 Thread GitBox
n3nash commented on pull request #1638: URL: https://github.com/apache/hudi/pull/1638#issuecomment-640892145 LGTM

[GitHub] [hudi] wangxianghu closed pull request #1665: [HUDI-910]Introduce HoodieWriteInput for hudi write client

2020-06-09 Thread GitBox
wangxianghu closed pull request #1665: URL: https://github.com/apache/hudi/pull/1665

[GitHub] [hudi] garyli1019 commented on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-06-09 Thread GitBox
garyli1019 commented on pull request #1602: URL: https://github.com/apache/hudi/pull/1602#issuecomment-640757660 @vinothchandar CI passed with rebase.

[GitHub] [hudi] leesf commented on pull request #1652: [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug

2020-06-09 Thread GitBox
leesf commented on pull request #1652: URL: https://github.com/apache/hudi/pull/1652#issuecomment-640580726 merging this. cc @garyli1019

[GitHub] [hudi] codecov-commenter commented on pull request #1716: [HUDI-875] Introduce a new pom module named hudi-common-sync

2020-06-09 Thread GitBox
codecov-commenter commented on pull request #1716: URL: https://github.com/apache/hudi/pull/1716#issuecomment-641229684 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1716?src=pr=h1) Report > Merging [#1716](https://codecov.io/gh/apache/hudi/pull/1716?src=pr=desc) into

[GitHub] [hudi] leesf merged pull request #1711: [HUDI-974] fix fields out of order in MOR mode when using Hive

2020-06-09 Thread GitBox
leesf merged pull request #1711: URL: https://github.com/apache/hudi/pull/1711

[GitHub] [hudi] vinothchandar commented on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-06-09 Thread GitBox
vinothchandar commented on pull request #1602: URL: https://github.com/apache/hudi/pull/1602#issuecomment-640555982

[GitHub] [hudi] wangxianghu commented on pull request #1665: [HUDI-910]Introduce HoodieWriteInput for hudi write client

2020-06-09 Thread GitBox
wangxianghu commented on pull request #1665: URL: https://github.com/apache/hudi/pull/1665#issuecomment-641275707

[GitHub] [hudi] vinothchandar commented on issue #1550: Hudi 0.5.2 inability save complex type with nullable = true [SUPPORT]

2020-06-09 Thread GitBox
vinothchandar commented on issue #1550: URL: https://github.com/apache/hudi/issues/1550#issuecomment-640542938 @nsivabalan is driving the release. We are planning to do a 0.5.3 this week, right Siva?

[GitHub] [hudi] xushiyan commented on pull request #1707: [HUDI-988] fix more unit tests flakiness

2020-06-09 Thread GitBox
xushiyan commented on pull request #1707: URL: https://github.com/apache/hudi/pull/1707#issuecomment-640766975

[GitHub] [hudi] vinothchandar merged pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-06-09 Thread GitBox
vinothchandar merged pull request #1602: URL: https://github.com/apache/hudi/pull/1602

[GitHub] [hudi] nsivabalan commented on a change in pull request #1712: Cherry picking HUDI-988 and HUDI-990 to release-0.5.3

2020-06-09 Thread GitBox
nsivabalan commented on a change in pull request #1712: URL: https://github.com/apache/hudi/pull/1712#discussion_r436940846 ## File path: hudi-cli/src/test/java/org/apache/hudi/cli/commands/AbstractShellIntegrationTest.java ## @@ -58,4 +58,13 @@ public void teardown() throws

[GitHub] [hudi] shenh062326 commented on pull request #1690: [HUDI-908] Add decimals to HoodieTestDataGenerator

2020-06-09 Thread GitBox
shenh062326 commented on pull request #1690: URL: https://github.com/apache/hudi/pull/1690#issuecomment-640978958 @bvaradar Should I add all data types to this pr or open another pr. My original idea was that this pr fixes the bug of parsing decimal type, and another pr is added to add

[GitHub] [hudi] codecov-commenter commented on pull request #1720: [HUDI-1003] Handle partitions correctly for syncing hudi non-parititioned table to hive

2020-06-09 Thread GitBox
codecov-commenter commented on pull request #1720: URL: https://github.com/apache/hudi/pull/1720#issuecomment-641194386 # [Codecov](https://codecov.io/gh/apache/hudi/pull/1720?src=pr=h1) Report > Merging [#1720](https://codecov.io/gh/apache/hudi/pull/1720?src=pr=desc) into

[GitHub] [hudi] shenh062326 commented on pull request #1714: [HUDI-1005] fix NPE in HoodieWriteClient.clean

2020-06-09 Thread GitBox
shenh062326 commented on pull request #1714: URL: https://github.com/apache/hudi/pull/1714#issuecomment-640974416 @vinothchandar can you take a look at this?

[GitHub] [hudi] leesf commented on a change in pull request #1711: [HUDI-974] fix fields out of order in MOR mode when using Hive

2020-06-09 Thread GitBox
leesf commented on a change in pull request #1711: URL: https://github.com/apache/hudi/pull/1711#discussion_r436636431 ## File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/RealtimeUnmergedRecordReader.java ## @@ -82,7 +82,7 @@ public

[GitHub] [hudi] shenh062326 opened a new pull request #1718: [HUDI-1016] [Minor] Code optimization

2020-06-09 Thread GitBox
shenh062326 opened a new pull request #1718: URL: https://github.com/apache/hudi/pull/1718

[GitHub] [hudi] vinothchandar commented on a change in pull request #1683: Updating release docs for release-0.5.3

2020-06-09 Thread GitBox
vinothchandar commented on a change in pull request #1683: URL: https://github.com/apache/hudi/pull/1683#discussion_r434991598 ## File path: docs/_pages/releases.md ## @@ -3,8 +3,40 @@ title: "Releases" permalink: /releases layout: releases toc: true -last_modified_at:
