[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1645: [HUDI-707]Add unit test for StatsCommand

2020-05-21 Thread GitBox
codecov-commenter edited a comment on pull request #1645: URL: https://github.com/apache/incubator-hudi/pull/1645#issuecomment-631841340 # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=h1) Report > Merging

[jira] [Updated] (HUDI-917) Calculation of 'stats wa' need to be modified

2020-05-21 Thread yaojingyi (Jira)
[ https://issues.apache.org/jira/browse/HUDI-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yaojingyi updated HUDI-917: --- Description: 'Write Amplification Factor' = 'Total Written' / 'Total Upserted'  'Total Written' is always
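
Editor's note: the ratio quoted in the HUDI-917 description can be illustrated with a minimal sketch. The function name, the sample numbers, and the choice to return 0.0 when nothing was upserted are assumptions made for illustration; they are not taken from the Hudi CLI source, and the truncated description above does not say how 'Total Written' is derived.

```python
def write_amplification(total_written: int, total_upserted: int) -> float:
    """Return 'Total Written' / 'Total Upserted' as quoted in HUDI-917.

    Assumption: a commit with no upserts reports 0.0 instead of dividing by zero.
    """
    if total_upserted <= 0:
        return 0.0
    return total_written / total_upserted


# Hypothetical per-commit record counts, for illustration only.
commits = [
    {"written": 1000, "upserted": 100},
    {"written": 500, "upserted": 0},
]
for commit in commits:
    print(write_amplification(commit["written"], commit["upserted"]))
```

The guard for a zero denominator is a design choice of this sketch; a real report might instead skip such commits or display a placeholder.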

[jira] [Updated] (HUDI-917) Calculation of 'stats wa' need to be modified

2020-05-21 Thread yaojingyi (Jira)
[ https://issues.apache.org/jira/browse/HUDI-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yaojingyi updated HUDI-917: --- Description: 'Write Amplification Factor' = 'Total Written' / 'Total Upserted'  'Total Written' is always

[jira] [Updated] (HUDI-917) Calculation of 'stats wa' need to be modified

2020-05-21 Thread yaojingyi (Jira)
[ https://issues.apache.org/jira/browse/HUDI-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yaojingyi updated HUDI-917: --- Description: 'Write Amplification Factor' = 'Total Written' / 'Total Upserted'  'Total Written' is always

[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-21 Thread GitBox
yanghua commented on a change in pull request #1644: URL: https://github.com/apache/incubator-hudi/pull/1644#discussion_r428506563 ## File path: hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystemHFileInLining.java ## @@ -40,18 +42,18 @@ import

[GitHub] [incubator-hudi] hddong commented on pull request #1645: [HUDI-707]Add unit test for StatsCommand

2020-05-21 Thread GitBox
hddong commented on pull request #1645: URL: https://github.com/apache/incubator-hudi/pull/1645#issuecomment-631977694 @yanghua Thanks for your review; I have addressed the comments. This is an automated message from the Apache Git

[jira] [Updated] (HUDI-917) Calculation of 'stats wa' need to be modified

2020-05-21 Thread yaojingyi (Jira)
[ https://issues.apache.org/jira/browse/HUDI-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yaojingyi updated HUDI-917: --- Description: 'Write Amplification Factor' = 'Total Written' / 'Total Upserted'  'Total Written' is always

[jira] [Updated] (HUDI-917) Calculation of 'stats wa' need to be modified

2020-05-21 Thread yaojingyi (Jira)
[ https://issues.apache.org/jira/browse/HUDI-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yaojingyi updated HUDI-917: --- Attachment: image-2020-05-21-14-21-33-244.png > Calculation of 'stats wa' need to be modified >

[jira] [Updated] (HUDI-917) Calculation of 'stats wa' need to be modified

2020-05-21 Thread yaojingyi (Jira)
[ https://issues.apache.org/jira/browse/HUDI-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yaojingyi updated HUDI-917: --- Attachment: image-2020-05-21-14-22-39-624.png > Calculation of 'stats wa' need to be modified >

[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1645: [HUDI-707]Add unit test for StatsCommand

2020-05-21 Thread GitBox
codecov-commenter edited a comment on pull request #1645: URL: https://github.com/apache/incubator-hudi/pull/1645#issuecomment-631841340 # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=h1) Report > Merging

[jira] [Created] (HUDI-917) Calculation of 'stats wa' need to be modified

2020-05-21 Thread yaojingyi (Jira)
yaojingyi created HUDI-917: -- Summary: Calculation of 'stats wa' need to be modified Key: HUDI-917 URL: https://issues.apache.org/jira/browse/HUDI-917 Project: Apache Hudi (incubating) Issue Type:

[jira] [Updated] (HUDI-917) Calculation of 'stats wa' need to be modified

2020-05-21 Thread yaojingyi (Jira)
[ https://issues.apache.org/jira/browse/HUDI-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yaojingyi updated HUDI-917: --- Description: 'Write Amplification Factor' = 'Total Written' / 'Total Upserted'  'Total Written' is always

[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1645: [HUDI-707]Add unit test for StatsCommand

2020-05-21 Thread GitBox
codecov-commenter edited a comment on pull request #1645: URL: https://github.com/apache/incubator-hudi/pull/1645#issuecomment-631841340 # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1645?src=pr=h1) Report > Merging

[GitHub] [incubator-hudi] hddong commented on pull request #1645: [HUDI-707]Add unit test for StatsCommand

2020-05-21 Thread GitBox
hddong commented on pull request #1645: URL: https://github.com/apache/incubator-hudi/pull/1645#issuecomment-631917437 @yanghua : it's ready now. This is an automated message from the Apache Git Service. To respond to the

[jira] [Updated] (HUDI-917) Calculation of 'stats wa' need to be modified

2020-05-21 Thread yaojingyi (Jira)
[ https://issues.apache.org/jira/browse/HUDI-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yaojingyi updated HUDI-917: --- Attachment: (was: image-2020-05-21-14-21-33-244.png) > Calculation of 'stats wa' need to be modified >

[jira] [Updated] (HUDI-917) Calculation of 'stats wa' need to be modified

2020-05-21 Thread yaojingyi (Jira)
[ https://issues.apache.org/jira/browse/HUDI-917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yaojingyi updated HUDI-917: --- Attachment: (was: image-2020-05-21-14-10-03-871.png) > Calculation of 'stats wa' need to be modified >

[GitHub] [incubator-hudi] yanghua commented on a change in pull request #1645: [HUDI-707]Add unit test for StatsCommand

2020-05-21 Thread GitBox
yanghua commented on a change in pull request #1645: URL: https://github.com/apache/incubator-hudi/pull/1645#discussion_r428490566 ## File path: hudi-cli/src/test/java/org/apache/hudi/cli/commands/TestStatsCommand.java ## @@ -0,0 +1,176 @@ +/* + * Licensed to the Apache

[jira] [Updated] (HUDI-861) Add Github and Twitter Widget on Hudi's official website

2020-05-21 Thread hong dongdong (Jira)
[ https://issues.apache.org/jira/browse/HUDI-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hong dongdong updated HUDI-861: --- Status: Open (was: New) > Add Github and Twitter Widget on Hudi's official website >

[jira] [Updated] (HUDI-861) Add Github and Twitter Widget on Hudi's official website

2020-05-21 Thread hong dongdong (Jira)
[ https://issues.apache.org/jira/browse/HUDI-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hong dongdong updated HUDI-861: --- Status: In Progress (was: Open) > Add Github and Twitter Widget on Hudi's official website >

[jira] [Updated] (HUDI-707) Add unit test for StatsCommand

2020-05-21 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-707: -- Fix Version/s: 0.6.0 > Add unit test for StatsCommand > -- > > Key:

[GitHub] [incubator-hudi] yanghua merged pull request #1645: [HUDI-707]Add unit test for StatsCommand

2020-05-21 Thread GitBox
yanghua merged pull request #1645: URL: https://github.com/apache/incubator-hudi/pull/1645 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[jira] [Updated] (HUDI-707) Add unit test for StatsCommand

2020-05-21 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang updated HUDI-707: -- Status: Open (was: New) > Add unit test for StatsCommand > -- > >

[jira] [Closed] (HUDI-707) Add unit test for StatsCommand

2020-05-21 Thread vinoyang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang closed HUDI-707. - Resolution: Done Done via master branch: 802d16c8c9793156ef7fef0c59088040800fe025 > Add unit test for

[incubator-hudi] branch master updated: [HUDI-707] Add unit test for StatsCommand (#1645)

2020-05-21 Thread vinoyang
This is an automated email from the ASF dual-hosted git repository. vinoyang pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/master by this push: new 802d16c [HUDI-707] Add unit test for

[jira] [Updated] (HUDI-918) Hudi can't get data

2020-05-21 Thread liujinhui (Jira)
[ https://issues.apache.org/jira/browse/HUDI-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liujinhui updated HUDI-918: --- Status: Open (was: New) > Hudi can't get data > --- > > Key: HUDI-918 >

[jira] [Created] (HUDI-918) Hudi can't get data

2020-05-21 Thread liujinhui (Jira)
liujinhui created HUDI-918: -- Summary: Hudi can't get data Key: HUDI-918 URL: https://issues.apache.org/jira/browse/HUDI-918 Project: Apache Hudi (incubating) Issue Type: Bug Components:

[incubator-hudi] branch hudi_test_suite_refactor updated (566f245 -> a048bf3)

2020-05-21 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a change to branch hudi_test_suite_refactor in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git. discard 566f245 [HUDI-394] Provide a basic implementation of test suite add a048bf3

[GitHub] [incubator-hudi] bhasudha commented on issue #1653: [SUPPORT]: Hudi Deltastreamer OffsetoutofRange Exception reading from Kafka topic (12 partitions)

2020-05-21 Thread GitBox
bhasudha commented on issue #1653: URL: https://github.com/apache/incubator-hudi/issues/1653#issuecomment-632405067 @prashanthpdesai Quick questions: where do you checkpoint the offsets between mini batches, and how do you configure that for deltastreamer? Do you have the

[GitHub] [incubator-hudi] wangxianghu removed a comment on pull request #1652: [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug

2020-05-21 Thread GitBox
wangxianghu removed a comment on pull request #1652: URL: https://github.com/apache/incubator-hudi/pull/1652#issuecomment-632429037 Hi @garyli1019, @UZi5136225 means that when the configured "sourceLimit" is less than the number of Kafka partitions, kafkaOffsetGen will consume no data

[GitHub] [incubator-hudi] UZi5136225 commented on pull request #1652: [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug

2020-05-21 Thread GitBox
UZi5136225 commented on pull request #1652: URL: https://github.com/apache/incubator-hudi/pull/1652#issuecomment-632430145 The specific reason is caused by the upward transformation @garyli1019 This is an automated

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1433: [HUDI-728]: Implement custom key generator

2020-05-21 Thread GitBox
nsivabalan commented on a change in pull request #1433: URL: https://github.com/apache/incubator-hudi/pull/1433#discussion_r428953503 ## File path: hudi-spark/src/main/java/org/apache/hudi/keygen/CustomKeyGenerator.java ## @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache

[GitHub] [incubator-hudi] nsivabalan commented on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-21 Thread GitBox
nsivabalan commented on pull request #1602: URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-632389785 @garyli1019 : thanks for clarifying. that was helpful. This is an automated message from the Apache

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-21 Thread GitBox
nsivabalan commented on a change in pull request #1602: URL: https://github.com/apache/incubator-hudi/pull/1602#discussion_r428959265 ## File path: hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java ## @@ -70,7 +70,9 @@ public class

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1647: [HUDI-867]: fixed IllegalArgumentException from graphite metrics in deltaStreamer continuous mode

2020-05-21 Thread GitBox
nsivabalan commented on a change in pull request #1647: URL: https://github.com/apache/incubator-hudi/pull/1647#discussion_r428967198 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/HoodieDeltaStreamer.java ## @@ -416,10 +425,12 @@ public

[GitHub] [incubator-hudi] garyli1019 commented on a change in pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-21 Thread GitBox
garyli1019 commented on a change in pull request #1602: URL: https://github.com/apache/incubator-hudi/pull/1602#discussion_r428971840 ## File path: hudi-client/src/test/java/org/apache/hudi/common/HoodieTestDataGenerator.java ## @@ -70,7 +70,9 @@ public class

[GitHub] [incubator-hudi] prashanthpdesai edited a comment on issue #1653: [SUPPORT]: Hudi Deltastreamer OffsetoutofRange Exception reading from Kafka topic (12 partitions)

2020-05-21 Thread GitBox
prashanthpdesai edited a comment on issue #1653: URL: https://github.com/apache/incubator-hudi/issues/1653#issuecomment-632406451 @bhasudha : I presume that the offsets for each partition are stored in the commit metadata in the HDFS path; since it was the first run, we used auto.offset.reset earliest

[GitHub] [incubator-hudi] sungjuly commented on issue #661: Tracking ticket for reporting Hudi usages from the community

2020-05-21 Thread GitBox
sungjuly commented on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632430324 At Udemy ([https://www.udemy.com/]) we're using Apache Hudi(0.5.0) on AWS EMR (5.29.0) to ingest MySQL change data capture. Thank you for open sourcing a great project.

[GitHub] [incubator-hudi] bvaradar commented on issue #1646: [SUPPORT]: Unable to query Hive table through Spark SQL

2020-05-21 Thread GitBox
bvaradar commented on issue #1646: URL: https://github.com/apache/incubator-hudi/issues/1646#issuecomment-632430489 You can take a look at https://hudi.apache.org/docs/docker_demo.html#step-4-b-run-spark-sql-queries for a working demo spark sql setup. There is no need to include

[GitHub] [incubator-hudi] hddong commented on a change in pull request #1558: [HUDI-796]: added deduping logic for upserts case

2020-05-21 Thread GitBox
hddong commented on a change in pull request #1558: URL: https://github.com/apache/incubator-hudi/pull/1558#discussion_r428056297 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/SparkMain.java ## @@ -263,13 +265,26 @@ private static int

[GitHub] [incubator-hudi] nsivabalan commented on pull request #1648: [HUDI-916]: added support for multiple input formats in TimestampBasedKeyGenerator

2020-05-21 Thread GitBox
nsivabalan commented on pull request #1648: URL: https://github.com/apache/incubator-hudi/pull/1648#issuecomment-632392315 @pratyakshsharma : was this patch already reviewed as part of [#1597](https://github.com/apache/incubator-hudi/pull/1597), or do I need to review it from scratch

[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-21 Thread GitBox
codecov-commenter edited a comment on pull request #1602: URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-632410484 # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=h1) Report > Merging

[GitHub] [incubator-hudi] codecov-commenter commented on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-21 Thread GitBox
codecov-commenter commented on pull request #1602: URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-632410484 # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1602?src=pr=h1) Report > Merging

[GitHub] [incubator-hudi] garyli1019 commented on pull request #1574: [HUDI-701]Add unit test for HDFSParquetImportCommand

2020-05-21 Thread GitBox
garyli1019 commented on pull request #1574: URL: https://github.com/apache/incubator-hudi/pull/1574#issuecomment-632339112 Hi @hddong , thanks for your contribution on these tests. Some tests failed in my local build of the `hudi-cli` module. I believe it could be related to the

[GitHub] [incubator-hudi] prashanthpdesai opened a new issue #1653: [SUPPORT]: Hudi Deltastreamer OffsetoutofRange Exception reading from Kafka topic (12 partitions)

2020-05-21 Thread GitBox
prashanthpdesai opened a new issue #1653: URL: https://github.com/apache/incubator-hudi/issues/1653 - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)? Yes - Join the mailing list to engage in conversations and get faster support at

[GitHub] [incubator-hudi] prashanthpdesai commented on issue #1653: [SUPPORT]: Hudi Deltastreamer OffsetoutofRange Exception reading from Kafka topic (12 partitions)

2020-05-21 Thread GitBox
prashanthpdesai commented on issue #1653: URL: https://github.com/apache/incubator-hudi/issues/1653#issuecomment-632406451 @bhasudha : I presume that the offsets for each partition are stored in the commit metadata in the HDFS path; since it was the first run, we used auto.offset.reset earliest to

[GitHub] [incubator-hudi] afeldman1 commented on issue #933: Support for multiple level partitioning in Hudi

2020-05-21 Thread GitBox
afeldman1 commented on issue #933: URL: https://github.com/apache/incubator-hudi/issues/933#issuecomment-632436139 @vinothchandar Yes, I can do that. When you say the "`writing_data` page", are you referring to adding them to the wiki, to the DataSourceWriteOptions object, or to both? I'm

[jira] [Created] (HUDI-919) Run hudi-cli ITTest in docker.

2020-05-21 Thread hong dongdong (Jira)
hong dongdong created HUDI-919: -- Summary: Run hudi-cli ITTest in docker. Key: HUDI-919 URL: https://issues.apache.org/jira/browse/HUDI-919 Project: Apache Hudi (incubating) Issue Type:

[jira] [Commented] (HUDI-767) Support transformation when export to Hudi

2020-05-21 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113612#comment-17113612 ] sivabalan narayanan commented on HUDI-767: -- done.  > Support transformation when export to Hudi >

[GitHub] [incubator-hudi] prashanthpdesai edited a comment on issue #1653: [SUPPORT]: Hudi Deltastreamer OffsetoutofRange Exception reading from Kafka topic (12 partitions)

2020-05-21 Thread GitBox
prashanthpdesai edited a comment on issue #1653: URL: https://github.com/apache/incubator-hudi/issues/1653#issuecomment-632406451 @bhasudha : I presume that the offsets for each partition are stored in the commit metadata in the HDFS path; since it was the first run, we used auto.offset.reset earliest

[jira] [Updated] (HUDI-918) Fix kafkaOffsetGen can not read kafka data bug

2020-05-21 Thread wangxianghu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangxianghu updated HUDI-918: - Summary: Fix kafkaOffsetGen can not read kafka data bug (was: deltastreamer bug is no new data) > Fix

[GitHub] [incubator-hudi] wangxianghu edited a comment on pull request #1652: [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug

2020-05-21 Thread GitBox
wangxianghu edited a comment on pull request #1652: URL: https://github.com/apache/incubator-hudi/pull/1652#issuecomment-632429037 Hi @garyli1019, @UZi5136225 means that when the configured "sourceLimit" is less than the number of Kafka partitions, kafkaOffsetGen will consume no data

[GitHub] [incubator-hudi] bvaradar commented on issue #1649: [SUPPORT] Not more than one spark.sql is working on Hoodie Parquet format

2020-05-21 Thread GitBox
bvaradar commented on issue #1649: URL: https://github.com/apache/incubator-hudi/issues/1649#issuecomment-632433768 Does the path s3a://gat-datalake-refined-dev/reports/player/dat/2020/04/23 actually exist? Have you enabled the eventual consistency guard?

[GitHub] [incubator-hudi] wangxianghu commented on pull request #1652: [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug

2020-05-21 Thread GitBox
wangxianghu commented on pull request #1652: URL: https://github.com/apache/incubator-hudi/pull/1652#issuecomment-632429037 Hi @garyli1019, @UZi5136225 means that when the configured "sourceLimit" is less than the number of Kafka partitions, kafkaOffsetGen will consume no data.

[jira] [Updated] (HUDI-918) Fix kafkaOffsetGen can not read kafka data bug

2020-05-21 Thread liujinhui (Jira)
[ https://issues.apache.org/jira/browse/HUDI-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liujinhui updated HUDI-918: --- Description: When the sourcelimit is less than the number of Kafka partitions, Hudi cannot get the data
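
Editor's note: the thread does not show the code behind HUDI-918, so the following is only a sketch of one plausible mechanism, assuming the source limit is split evenly across partitions with integer division. Both function names and the numbers are hypothetical and are not Hudi's KafkaOffsetGen implementation.

```python
def events_per_partition_floor(source_limit: int, num_partitions: int) -> int:
    """Even split using floor division (assumed, hypothetical allocation rule)."""
    return source_limit // num_partitions


def events_per_partition_ceil(source_limit: int, num_partitions: int) -> int:
    """Even split rounded up, so a positive limit never yields a zero share."""
    return -(-source_limit // num_partitions)


# Hypothetical values: a source limit smaller than the partition count.
source_limit, num_partitions = 5, 12
print(events_per_partition_floor(source_limit, num_partitions))
print(events_per_partition_ceil(source_limit, num_partitions))
```

Under this assumption, every partition receives a share of zero whenever the limit is smaller than the partition count, which would match the reported symptom of no data being read. Rounding up is shown only as one possible guard; it can overshoot the limit in total, so a real fix would need to cap the overall count.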

[GitHub] [incubator-hudi] UZi5136225 commented on pull request #1652: [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug

2020-05-21 Thread GitBox
UZi5136225 commented on pull request #1652: URL: https://github.com/apache/incubator-hudi/pull/1652#issuecomment-632429031 Steps to reproduce: 1. Use deltastreamer to consume data from Kafka 2. Set the value of sourceLimit to be less than the number of Kafka partitions 3. INFO

[GitHub] [incubator-hudi] bvaradar closed issue #1641: [SUPPORT] Failed to merge old record into new file for key xxx from old file 123.parquet to new file 456.parquet

2020-05-21 Thread GitBox
bvaradar closed issue #1641: URL: https://github.com/apache/incubator-hudi/issues/1641 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [incubator-hudi] wangxianghu edited a comment on pull request #1652: [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug

2020-05-21 Thread GitBox
wangxianghu edited a comment on pull request #1652: URL: https://github.com/apache/incubator-hudi/pull/1652#issuecomment-632429037 Hi @garyli1019, @UZi5136225 means that when the configured "sourceLimit" is less than the number of Kafka partitions, kafkaOffsetGen will consume no data.

[GitHub] [incubator-hudi] hddong commented on pull request #1574: [HUDI-701]Add unit test for HDFSParquetImportCommand

2020-05-21 Thread GitBox
hddong commented on pull request #1574: URL: https://github.com/apache/incubator-hudi/pull/1574#issuecomment-632434767 @garyli1019 : I tried docker before; it usually uses `execStartCmd` to execute a command directly. But for hudi-cli, we need to execute commands in interactive mode. I will try it again

[GitHub] [incubator-hudi] wangxianghu commented on pull request #1652: [HUDI-918] Fix kafkaOffsetGen can not read kafka data bug

2020-05-21 Thread GitBox
wangxianghu commented on pull request #1652: URL: https://github.com/apache/incubator-hudi/pull/1652#issuecomment-632445691 Hi @UZi5136225, It may be better to give some reminders on the description of "--source-limit"

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1433: [HUDI-728]: Implement custom key generator

2020-05-21 Thread GitBox
nsivabalan commented on a change in pull request #1433: URL: https://github.com/apache/incubator-hudi/pull/1433#discussion_r428954499 ## File path: hudi-spark/src/test/java/org/apache/hudi/keygen/TestSimpleKeyGenerator.java ## @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache

[GitHub] [incubator-hudi] nsivabalan commented on a change in pull request #1616: [HUDI-786] Fixing read beyond inline length in InlineFS

2020-05-21 Thread GitBox
nsivabalan commented on a change in pull request #1616: URL: https://github.com/apache/incubator-hudi/pull/1616#discussion_r428970158 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/inline/InLineFsDataInputStream.java ## @@ -56,24 +56,29 @@ public long

Build failed in Jenkins: hudi-snapshot-deployment-0.5 #285

2020-05-21 Thread Apache Jenkins Server
See Changes: -- [...truncated 2.35 KB...] /home/jenkins/tools/maven/apache-maven-3.5.4/conf: logging settings.xml toolchains.xml

[jira] [Assigned] (HUDI-905) Support PrunedFilteredScan for Spark Datasource

2020-05-21 Thread Yanjia Gary Li (Jira)
[ https://issues.apache.org/jira/browse/HUDI-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanjia Gary Li reassigned HUDI-905: --- Assignee: Yanjia Gary Li > Support PrunedFilteredScan for Spark Datasource >

[GitHub] [incubator-hudi] garyli1019 commented on pull request #1652: [HUDI-918] HUDI small bug

2020-05-21 Thread GitBox
garyli1019 commented on pull request #1652: URL: https://github.com/apache/incubator-hudi/pull/1652#issuecomment-632238289 Hi @UZi5136225 , thanks for submitting this PR. I am not sure if I understand the bug you are referring to. Would you explain a little bit more?

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-21 Thread GitBox
xushiyan commented on a change in pull request #1644: URL: https://github.com/apache/incubator-hudi/pull/1644#discussion_r428757663 ## File path: hudi-common/src/test/java/org/apache/hudi/common/util/collection/TestRocksDBDAO.java ## @@ -45,7 +45,7 @@ /** * Tests RocksDB

[incubator-hudi] branch hudi_test_suite_refactor updated (894ab75 -> 566f245)

2020-05-21 Thread nagarwal
This is an automated email from the ASF dual-hosted git repository. nagarwal pushed a change to branch hudi_test_suite_refactor in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git. discard 894ab75 [HUDI-394] Provide a basic implementation of test suite add 566f245

[GitHub] [incubator-hudi] prashanthpdesai commented on issue #661: Tracking ticket for reporting Hudi usages from the community

2020-05-21 Thread GitBox
prashanthpdesai commented on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632311231 @vinothchandar : Sure, yes, we are still in pre-prod. This is an automated message from the

[GitHub] [incubator-hudi] maduxi edited a comment on issue #661: Tracking ticket for reporting Hudi usages from the community

2020-05-21 Thread GitBox
maduxi edited a comment on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632323358 We are using it at an online casino based in Malta. We are using it in production, but only for a small part of our dataset. It's a large table that has frequent updates,

[GitHub] [incubator-hudi] maduxi commented on issue #661: Tracking ticket for reporting Hudi usages from the community

2020-05-21 Thread GitBox
maduxi commented on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632323358 We are using it at an online casino based in Malta. We are using it in production, but only for a small part of our dataset. It's a large table that has frequent updates, and

[GitHub] [incubator-hudi] broussea1901 edited a comment on issue #661: Tracking ticket for reporting Hudi usages from the community

2020-05-21 Thread GitBox
broussea1901 edited a comment on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632176944 We're currently deploying HUDI 0.5.0 in an EU bank. Not in prod yet. HUDI will be used to provide ACID capabilities for data ingestion batches and streams needing such

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-21 Thread GitBox
xushiyan commented on a change in pull request #1644: URL: https://github.com/apache/incubator-hudi/pull/1644#discussion_r428760599 ## File path: hudi-common/src/test/java/org/apache/hudi/common/fs/inline/TestInLineFileSystemHFileInLining.java ## @@ -40,18 +42,18 @@ import

[GitHub] [incubator-hudi] garyli1019 commented on issue #661: Tracking ticket for reporting Hudi usages from the community

2020-05-21 Thread GitBox
garyli1019 commented on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632255322 We have been using HUDI to manage a data lake with 500+ TB of manufacturing data for almost a year now. In the IoT world, late arrivals and updates are a very common scenario and

[GitHub] [incubator-hudi] garyli1019 edited a comment on issue #661: Tracking ticket for reporting Hudi usages from the community

2020-05-21 Thread GitBox
garyli1019 edited a comment on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632255322 We have been using HUDI to manage a data lake with 500+ TB of manufacturing data for almost a year now. In the IoT world, late arrivals and updates are a very common

[GitHub] [incubator-hudi] vinothchandar commented on issue #661: Tracking ticket for reporting Hudi usages from the community

2020-05-21 Thread GitBox
vinothchandar commented on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632305767 @garyli1019 do you mind sharing your company name/logo, and is it okay to list this on powered_by? @prashanthpdesai Let's take the small file issue offline. Interested

[GitHub] [incubator-hudi] prashanthpdesai commented on issue #661: Tracking ticket for reporting Hudi usages from the community

2020-05-21 Thread GitBox
prashanthpdesai commented on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632263515 We are trying to use HUDI Deltastreamer to read from a compacted Kafka topic in a production environment and pull the messages incrementally; tried initially with MOR with

[GitHub] [incubator-hudi] lamber-ken commented on pull request #1651: [MINOR] add impala release and spark partition discovery

2020-05-21 Thread GitBox
lamber-ken commented on pull request #1651: URL: https://github.com/apache/incubator-hudi/pull/1651#issuecomment-632314208 @garyli1019 it would be nice to add more context about the pr next time : ) This is an automated

[GitHub] [incubator-hudi] prashanthpdesai edited a comment on issue #661: Tracking ticket for reporting Hudi usages from the community

2020-05-21 Thread GitBox
prashanthpdesai edited a comment on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632263515 We are trying to use HUDI Deltastreamer to read from a compacted Kafka topic in a production environment and pull the messages incrementally and persist the data in

[GitHub] [incubator-hudi] garyli1019 commented on issue #661: Tracking ticket for reporting Hudi usages from the community

2020-05-21 Thread GitBox
garyli1019 commented on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632312446 @vinothchandar Will do once I clear some internal process This is an automated message from the Apache Git

[GitHub] [incubator-hudi] garyli1019 commented on pull request #1602: [HUDI-494] fix incorrect record size estimation

2020-05-21 Thread GitBox
garyli1019 commented on pull request #1602: URL: https://github.com/apache/incubator-hudi/pull/1602#issuecomment-632241626 ![image](https://user-images.githubusercontent.com/23007841/82587156-85ee5e80-9b4d-11ea-839f-798633fdd1a4.png) Hi @vinothchandar , unfortunately this bug happened

[GitHub] [incubator-hudi] broussea1901 commented on issue #661: Tracking ticket for reporting Hudi usages from the community

2020-05-21 Thread GitBox
broussea1901 commented on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632176944 We're currently deploying HUDI 0.5.0 in an EU bank. Not in prod yet. HUDI will be used to provide ACID capabilities for data ingestion batches and streams needing such a feature

[GitHub] [incubator-hudi] codecov-commenter edited a comment on pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-21 Thread GitBox
codecov-commenter edited a comment on pull request #1644: URL: https://github.com/apache/incubator-hudi/pull/1644#issuecomment-631673778 # [Codecov](https://codecov.io/gh/apache/incubator-hudi/pull/1644?src=pr=h1) Report > Merging

[GitHub] [incubator-hudi] prashanthpdesai edited a comment on issue #661: Tracking ticket for reporting Hudi usages from the community

2020-05-21 Thread GitBox
prashanthpdesai edited a comment on issue #661: URL: https://github.com/apache/incubator-hudi/issues/661#issuecomment-632263515 We are trying to use HUDI Deltastreamer to read from a compacted Kafka topic in a production environment and pull the messages incrementally and persist the data in

[GitHub] [incubator-hudi] xushiyan commented on a change in pull request #1644: [HUDI-811] Restructure test packages in hudi-common

2020-05-21 Thread GitBox
xushiyan commented on a change in pull request #1644: URL: https://github.com/apache/incubator-hudi/pull/1644#discussion_r428757663 ## File path: hudi-common/src/test/java/org/apache/hudi/common/util/collection/TestRocksDBDAO.java ## @@ -45,7 +45,7 @@ /** * Tests RocksDB

[GitHub] [incubator-hudi] lamber-ken commented on pull request #1651: [MINOR] add impala release and spark partition discovery

2020-05-21 Thread GitBox
lamber-ken commented on pull request #1651: URL: https://github.com/apache/incubator-hudi/pull/1651#issuecomment-632313692  @garyli1019 This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [incubator-hudi] UZi5136225 opened a new pull request #1652: [HUDI-918] HUDI small bug

2020-05-21 Thread GitBox
UZi5136225 opened a new pull request #1652: URL: https://github.com/apache/incubator-hudi/pull/1652 ## What is the purpose of the pull request sourceLimit should not be less than the number of kafka partitions ## Committer checklist - [ ] Has a corresponding JIRA

[jira] [Updated] (HUDI-918) deltastreamer bug is no new data

2020-05-21 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-918: Labels: pull-request-available (was: ) > deltastreamer bug is no new data >

[jira] [Updated] (HUDI-918) deltastreamer bug is no new data

2020-05-21 Thread liujinhui (Jira)
[ https://issues.apache.org/jira/browse/HUDI-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liujinhui updated HUDI-918: --- Summary: deltastreamer bug is no new data (was: Hudi can't get data) > deltastreamer bug is no new data >