[GitHub] [hudi] rshanmugam1 commented on issue #2609: [SUPPORT] Presto hudi query slow when compared to parquet

2021-02-27 Thread GitBox
rshanmugam1 commented on issue #2609: URL: https://github.com/apache/hudi/issues/2609#issuecomment-787403720 @lw309637554 thanks for your response. **_1. About the first attempt: parquet is 23 secs, but hudi is 40 secs. I see metadata init costs some time in the log._** Yes, 2 major

[GitHub] [hudi] codecov-io edited a comment on pull request #2608: [HUDI-1478] Introduce HoodieBloomIndex to hudi-java-client

2021-02-27 Thread GitBox
codecov-io edited a comment on pull request #2608: URL: https://github.com/apache/hudi/pull/2608#issuecomment-787040525 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2608?src=pr=h1) Report > Merging [#2608](https://codecov.io/gh/apache/hudi/pull/2608?src=pr=desc) (aa215e2) into

[GitHub] [hudi] codecov-io edited a comment on pull request #2608: [HUDI-1478] Introduce HoodieBloomIndex to hudi-java-client

2021-02-27 Thread GitBox
codecov-io edited a comment on pull request #2608: URL: https://github.com/apache/hudi/pull/2608#issuecomment-787040525 This is an automated message from the Apache Git Service. To respond to the message, please log on to

[jira] [Resolved] (HUDI-1347) Hbase index partition changes cause data duplication problems

2021-02-27 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan resolved HUDI-1347. --- Fix Version/s: 0.8.0 Resolution: Fixed > Hbase index partition changes cause

[jira] [Updated] (HUDI-1347) Hbase index partition changes cause data duplication problems

2021-02-27 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1347: -- Status: In Progress (was: Open) > Hbase index partition changes cause data duplication

[jira] [Updated] (HUDI-1347) Hbase index partition changes cause data duplication problems

2021-02-27 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1347: -- Status: Open (was: New) > Hbase index partition changes cause data duplication

[jira] [Commented] (HUDI-1539) Bug in HoodieCombineRealtimeRecordReader returns wrong results

2021-02-27 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292291#comment-17292291 ] sivabalan narayanan commented on HUDI-1539: --- [~satishkotha]: can you close this Jira if merged

[jira] [Assigned] (HUDI-1539) Bug in HoodieCombineRealtimeRecordReader returns wrong results

2021-02-27 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-1539: - Assignee: satish > Bug in HoodieCombineRealtimeRecordReader returns wrong

[jira] [Commented] (HUDI-651) Incremental Query on Hive via Spark SQL does not return expected results

2021-02-27 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292288#comment-17292288 ] sivabalan narayanan commented on HUDI-651: -- [~bhavanisudha]: incremental queries on MOR is already

[GitHub] [hudi] vburenin commented on a change in pull request #2598: [WIP] Added custom kafka meta fields and custom kafka avro decoder.

2021-02-27 Thread GitBox
vburenin commented on a change in pull request #2598: URL: https://github.com/apache/hudi/pull/2598#discussion_r584225622 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/decoders/KafkaAvroSchemaDeserializer.java ## @@ -0,0 +1,175 @@ +/* + * Licensed to

[GitHub] [hudi] lw309637554 edited a comment on issue #2609: [SUPPORT] Presto hudi query slow when compared to parquet

2021-02-27 Thread GitBox
lw309637554 edited a comment on issue #2609: URL: https://github.com/apache/hudi/issues/2609#issuecomment-787225505 @rshanmugam1 1. About the first attempt: parquet is 23 secs, but hudi is 40 secs. I see metadata init costs some time in the log. 2. About the second attempt: parquet is very

[GitHub] [hudi] lw309637554 commented on issue #2609: [SUPPORT] Presto hudi query slow when compared to parquet

2021-02-27 Thread GitBox
lw309637554 commented on issue #2609: URL: https://github.com/apache/hudi/issues/2609#issuecomment-787225505 1. About the first attempt: parquet is 23 secs, but hudi is 40 secs. I see metadata init costs some time in the log. 2. About the second attempt: parquet is very fast, maybe presto support

[GitHub] [hudi] nsivabalan commented on a change in pull request #2598: [WIP] Added custom kafka meta fields and custom kafka avro decoder.

2021-02-27 Thread GitBox
nsivabalan commented on a change in pull request #2598: URL: https://github.com/apache/hudi/pull/2598#discussion_r584213651 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/decoders/KafkaAvroSchemaDeserializer.java ## @@ -0,0 +1,175 @@ +/* + * Licensed to

[GitHub] [hudi] nsivabalan commented on a change in pull request #2598: [WIP] Added custom kafka meta fields and custom kafka avro decoder.

2021-02-27 Thread GitBox
nsivabalan commented on a change in pull request #2598: URL: https://github.com/apache/hudi/pull/2598#discussion_r584212538 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaRegistryProvider.java ## @@ -58,30 +66,67 @@ private static String

[GitHub] [hudi] nsivabalan commented on issue #2563: [Feature Request] Full Schema Evolution

2021-02-27 Thread GitBox
nsivabalan commented on issue #2563: URL: https://github.com/apache/hudi/issues/2563#issuecomment-787195135 Adding a new column should work w/ hudi. https://gist.github.com/nsivabalan/dd604527bd5ad62a08272a34425f5fad Can you revisit if you are creating the new column w/ null value

[jira] [Commented] (HUDI-1640) Implement Spark Datasource option to read hudi configs from properties file

2021-02-27 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292244#comment-17292244 ] sivabalan narayanan commented on HUDI-1640: --- Yeah, this definitely seems helpful. Let me summarize

[GitHub] [hudi] rshanmugam1 opened a new issue #2609: [SUPPORT] Presto hudi query slow when compared to parquet

2021-02-27 Thread GitBox
rshanmugam1 opened a new issue #2609: URL: https://github.com/apache/hudi/issues/2609 **Describe the problem you faced** A Presto query on a hudi table takes ~2x longer than the same simple query on parquet. Data is stored in S3. Hudi metadata store is enabled.

[GitHub] [hudi] codecov-io edited a comment on pull request #2608: [HUDI-1478] Introduce HoodieBloomIndex to hudi-java-client

2021-02-27 Thread GitBox
codecov-io edited a comment on pull request #2608: URL: https://github.com/apache/hudi/pull/2608#issuecomment-787040525 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2608?src=pr=h1) Report > Merging [#2608](https://codecov.io/gh/apache/hudi/pull/2608?src=pr=desc) (10a8177) into

[GitHub] [hudi] yanghua commented on a change in pull request #2596: [HUDI-1636] Support Builder Pattern To Build Table Properties For Hoo…

2021-02-27 Thread GitBox
yanghua commented on a change in pull request #2596: URL: https://github.com/apache/hudi/pull/2596#discussion_r584110778 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java ## @@ -258,4 +260,142 @@ public String getArchivelogFolder() {

[GitHub] [hudi] codecov-io commented on pull request #2608: [HUDI-1478] Introduce HoodieBloomIndex to hudi-java-client

2021-02-27 Thread GitBox
codecov-io commented on pull request #2608: URL: https://github.com/apache/hudi/pull/2608#issuecomment-787040525 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2608?src=pr=h1) Report > Merging [#2608](https://codecov.io/gh/apache/hudi/pull/2608?src=pr=desc) (302400e) into

[GitHub] [hudi] prashantwason commented on a change in pull request #2595: [HUDI-1634] Re-bootstrap metadata table when un-synced instants have been archived.

2021-02-27 Thread GitBox
prashantwason commented on a change in pull request #2595: URL: https://github.com/apache/hudi/pull/2595#discussion_r584093038 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ## @@ -235,6 +235,29 @@

[GitHub] [hudi] n3nash commented on a change in pull request #2595: [HUDI-1634] Re-bootstrap metadata table when un-synced instants have been archived.

2021-02-27 Thread GitBox
n3nash commented on a change in pull request #2595: URL: https://github.com/apache/hudi/pull/2595#discussion_r584092113 ## File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadataWriter.java ## @@ -235,6 +235,29 @@ protected

[jira] [Updated] (HUDI-1478) Introduce HoodieBloomIndex to hudi-java-client

2021-02-27 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-1478: - Labels: pull-request-available (was: ) > Introduce HoodieBloomIndex to hudi-java-client >

[GitHub] [hudi] shenh062326 opened a new pull request #2608: [HUDI-1478] Introduce HoodieBloomIndex to hudi-java-client

2021-02-27 Thread GitBox
shenh062326 opened a new pull request #2608: URL: https://github.com/apache/hudi/pull/2608 ## *Tips* - *Thank you very much for contributing to Apache Hudi.* - *Please review https://hudi.apache.org/contributing.html before opening a pull request.* ## What is the purpose of