[GitHub] [hudi] pengzhiwei2018 commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

2021-01-28 Thread GitBox
pengzhiwei2018 commented on a change in pull request #2497: URL: https://github.com/apache/hudi/pull/2497#discussion_r566617430 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala ## @@ -285,7 +289,14 @@ class HoodieMergeOnR

[GitHub] [hudi] jiangjiguang commented on a change in pull request #2505: [MINOR] Quickstart.generateUpdates method add check

2021-01-28 Thread GitBox
jiangjiguang commented on a change in pull request #2505: URL: https://github.com/apache/hudi/pull/2505#discussion_r566617008 ## File path: hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/TestQuickstartUtils.java ## @@ -0,0 +1,36 @@ +/* + * Licensed to the Apach

[GitHub] [hudi] vinothchandar commented on issue #2149: Help with Reading Kafka topic written using Debezium Connector - Deltastreamer

2021-01-28 Thread GitBox
vinothchandar commented on issue #2149: URL: https://github.com/apache/hudi/issues/2149#issuecomment-769607086 @toninis this is kind of weird, given the snippet that has the constructor. the class seems to be there in the build. do you have a branch where you have the code stashed? We c

[jira] [Updated] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2021-01-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinoth Chandar updated HUDI-864: Labels: user-support-issues (was: ) > parquet schema conflict: optional binary (UTF8) is not a grou

[jira] [Commented] (HUDI-864) parquet schema conflict: optional binary (UTF8) is not a group

2021-01-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274169#comment-17274169 ] Vinoth Chandar commented on HUDI-864: - [~shivnarayan] there are issues like these, that

[GitHub] [hudi] vinothchandar edited a comment on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2021-01-28 Thread GitBox
vinothchandar edited a comment on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-769600129 > I feel that the independent timeline service may be helpful in identifying hudi tables in a cluster. @cdmikechen This is actually very interesting to me too. Can we

[GitHub] [hudi] jiangjiguang commented on a change in pull request #2505: [MINOR] Quickstart.generateUpdates method add check

2021-01-28 Thread GitBox
jiangjiguang commented on a change in pull request #2505: URL: https://github.com/apache/hudi/pull/2505#discussion_r566599679 ## File path: hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java ## @@ -176,14 +175,17 @@ public HoodieRecord generate

[GitHub] [hudi] vinothchandar commented on issue #2100: [SUPPORT] 0.6.0 - using keytab authentication gives issues

2021-01-28 Thread GitBox
vinothchandar commented on issue #2100: URL: https://github.com/apache/hudi/issues/2100#issuecomment-769600129 > I feel that the independent timeline service may be helpful in identifying hudi tables in a cluster. @cdmikechen This is actually very interesting to me too. Can we start a D

[GitHub] [hudi] codecov-io edited a comment on pull request #2502: [HUDI-1555] Remove isEmpty to improve clustering execution performance

2021-01-28 Thread GitBox
codecov-io edited a comment on pull request #2502: URL: https://github.com/apache/hudi/pull/2502#issuecomment-769063418 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2502?src=pr&el=h1) Report > Merging [#2502](https://codecov.io/gh/apache/hudi/pull/2502?src=pr&el=desc) (f903c85) in

[GitHub] [hudi] codecov-io edited a comment on pull request #2496: [HUDI-1554] Introduced buffering for streams in HUDI.

2021-01-28 Thread GitBox
codecov-io edited a comment on pull request #2496: URL: https://github.com/apache/hudi/pull/2496#issuecomment-768170324 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [hudi] wangxianghu commented on pull request #2419: [HUDI-1421] Improvement of failure recovery for HoodieFlinkStreamer.

2021-01-28 Thread GitBox
wangxianghu commented on pull request #2419: URL: https://github.com/apache/hudi/pull/2419#issuecomment-769568185 @loukey-lj sorry for the delay, please fix the conflicts first

[GitHub] [hudi] wangxianghu commented on a change in pull request #2505: [MINOR] Quickstart.generateUpdates method add check

2021-01-28 Thread GitBox
wangxianghu commented on a change in pull request #2505: URL: https://github.com/apache/hudi/pull/2505#discussion_r566572544 ## File path: hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java ## @@ -176,14 +175,17 @@ public HoodieRecord generateU

[GitHub] [hudi] vinothchandar commented on issue #1829: [SUPPORT] S3 slow file listing causes Hudi read performance.

2021-01-28 Thread GitBox
vinothchandar commented on issue #1829: URL: https://github.com/apache/hudi/issues/1829#issuecomment-769566404 0.7.0 is out!

[GitHub] [hudi] codecov-io edited a comment on pull request #2505: [MINOR] Quickstart.generateUpdates method add check

2021-01-28 Thread GitBox
codecov-io edited a comment on pull request #2505: URL: https://github.com/apache/hudi/pull/2505#issuecomment-769557231 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2505?src=pr&el=h1) Report > Merging [#2505](https://codecov.io/gh/apache/hudi/pull/2505?src=pr&el=desc) (a1ab5a8) in

[GitHub] [hudi] vinothchandar commented on a change in pull request #2494: [HUDI-1552] Improve performance of key lookups from base file in Metadata Table.

2021-01-28 Thread GitBox
vinothchandar commented on a change in pull request #2494: URL: https://github.com/apache/hudi/pull/2494#discussion_r566564182 ## File path: hudi-common/src/main/java/org/apache/hudi/metadata/HoodieBackedTableMetadata.java ## @@ -188,41 +196,51 @@ private synchronized void ope

[GitHub] [hudi] codecov-io commented on pull request #2505: [MINOR] Quickstart.generateUpdates method add check

2021-01-28 Thread GitBox
codecov-io commented on pull request #2505: URL: https://github.com/apache/hudi/pull/2505#issuecomment-769557231 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2505?src=pr&el=h1) Report > Merging [#2505](https://codecov.io/gh/apache/hudi/pull/2505?src=pr&el=desc) (a1ab5a8) into [ma

[GitHub] [hudi] garyli1019 commented on a change in pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

2021-01-28 Thread GitBox
garyli1019 commented on a change in pull request #2497: URL: https://github.com/apache/hudi/pull/2497#discussion_r566552361 ## File path: hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieMergeOnReadRDD.scala ## @@ -285,7 +289,14 @@ class HoodieMergeOnReadR

[jira] [Commented] (HUDI-1523) Avoid excessive mkdir calls when creating new files

2021-01-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274130#comment-17274130 ] Vinoth Chandar commented on HUDI-1523: -- I don't think we need a config here. if shoul

[jira] [Commented] (HUDI-1111) Highlight Hudi guarantees in documentation section of website

2021-01-28 Thread Vinoth Chandar (Jira)
[ https://issues.apache.org/jira/browse/HUDI-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274127#comment-17274127 ] Vinoth Chandar commented on HUDI-: -- yes . please > Highlight Hudi guarantees in

[GitHub] [hudi] vinothchandar commented on pull request #1975: [HUDI-1194] Refactor HoodieHiveClient based on the way to call Hive API

2021-01-28 Thread GitBox
vinothchandar commented on pull request #1975: URL: https://github.com/apache/hudi/pull/1975#issuecomment-769525257 @lw309637554 could you review this as well?

[GitHub] [hudi] vinothchandar commented on pull request #2452: [HUDI-1531] Introduce HoodiePartitionCleaner to delete specific partition

2021-01-28 Thread GitBox
vinothchandar commented on pull request #2452: URL: https://github.com/apache/hudi/pull/2452#issuecomment-769518711 @n3nash can you also please review this?

[GitHub] [hudi] vinothchandar commented on pull request #2431: [HUDI-1526] Translate the api partitionBy to hoodie.datasource.write.partitionpath.field

2021-01-28 Thread GitBox
vinothchandar commented on pull request #2431: URL: https://github.com/apache/hudi/pull/2431#issuecomment-769518155 @nsivabalan @zhedoubushishi also to review.

[hudi] branch master updated (bc0325f -> 23f2ef3)

2021-01-28 Thread vinoth
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git. from bc0325f [HUDI-1522] Add a new pipeline for Flink writer (#2430) add 23f2ef3 [HUDI-623] Remove UpgradePayloadFrom

[GitHub] [hudi] vinothchandar merged pull request #2455: [HUDI-623] Remove UpgradePayloadFromUberToApache

2021-01-28 Thread GitBox
vinothchandar merged pull request #2455: URL: https://github.com/apache/hudi/pull/2455

[GitHub] [hudi] vinothchandar commented on pull request #2455: [HUDI-623] Remove UpgradePayloadFromUberToApache

2021-01-28 Thread GitBox
vinothchandar commented on pull request #2455: URL: https://github.com/apache/hudi/pull/2455#issuecomment-769517737 It's been a while. So okay to drop this now.

[GitHub] [hudi] vinothchandar commented on pull request #2475: [HUDI-1527] automatically infer the data directory, users only need to specify the table directory

2021-01-28 Thread GitBox
vinothchandar commented on pull request #2475: URL: https://github.com/apache/hudi/pull/2475#issuecomment-769517183 @zhedoubushishi @umehrot2 could you please take a first pass

[GitHub] [hudi] vinothchandar commented on a change in pull request #2476: [HUDI-1538] Try to init class trying different signatures instead of checking its name

2021-01-28 Thread GitBox
vinothchandar commented on a change in pull request #2476: URL: https://github.com/apache/hudi/pull/2476#discussion_r566524170 ## File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java ## @@ -96,19 +94,21 @@ private static final Logger LOG = LogM

[GitHub] [hudi] vinothchandar commented on pull request #2497: [HUDI-1550] Incorrect query result for MOR table when merge base data…

2021-01-28 Thread GitBox
vinothchandar commented on pull request #2497: URL: https://github.com/apache/hudi/pull/2497#issuecomment-769513904 cc @nsivabalan to also triage

[GitHub] [hudi] vinothchandar commented on pull request #2500: [HUDI-1496] Fixing detection of GCS FileSystem

2021-01-28 Thread GitBox
vinothchandar commented on pull request #2500: URL: https://github.com/apache/hudi/pull/2500#issuecomment-769513621 cc @vburenin could you please review this as well?

[jira] [Updated] (HUDI-57) [UMBRELLA] Support ORC Storage

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-57?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-57: Labels: pull-request-available (was: pull-request-available user-support-issues) > [UMBRELLA

[jira] [Updated] (HUDI-89) Clean up placement, naming, defaults of HoodieWriteConfig

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-89?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-89: Labels: (was: user-support-issues) > Clean up placement, naming, defaults of HoodieWriteConf

[jira] [Updated] (HUDI-274) Consolidate all scripts under top level scripts directory

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-274: - Labels: starter (was: starter user-support-issues) > Consolidate all scripts under top lev

[jira] [Updated] (HUDI-259) Hadoop 3 support for Hudi writing

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-259: - Status: Open (was: New) > Hadoop 3 support for Hudi writing >

[jira] [Updated] (HUDI-318) Update Migration Guide to Include Delta Streamer

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-318: - Labels: doc (was: doc user-support-issues) > Update Migration Guide to Include Delta Strea

[jira] [Updated] (HUDI-395) hudi does not support scheme s3n when wrtiing to S3

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-395: - Labels: (was: user-support-issues) > hudi does not support scheme s3n when wrtiing to S3

[jira] [Updated] (HUDI-849) Turn on incremental Syncing by default for DeltaStreamer and spark streaming cases

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-849: - Labels: (was: user-support-issues) > Turn on incremental Syncing by default for DeltaStre

[jira] [Updated] (HUDI-984) Support Hive 1.x out of box

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-984: - Labels: (was: user-support-issues) > Support Hive 1.x out of box > --

[jira] [Updated] (HUDI-893) Add spark datasource V2 reader support for Hudi tables

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-893: - Labels: (was: user-support-issues) > Add spark datasource V2 reader support for Hudi tabl

[jira] [Updated] (HUDI-1088) hive version 1.1.0 integrated with hudi,select * from hudi_table error in HUE

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1088: -- Labels: (was: user-support-issues) > hive version 1.1.0 integrated with hudi,select *

[jira] [Updated] (HUDI-1127) Handling late arriving Deletes

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1127: -- Labels: (was: user-support-issues) > Handling late arriving Deletes >

[jira] [Updated] (HUDI-1269) Make whether the failure of connect hive affects hudi ingest process configurable

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1269: -- Labels: pull-request-available user-support-issues (was: pull-request-available) > Mak

[jira] [Updated] (HUDI-1269) Make whether the failure of connect hive affects hudi ingest process configurable

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1269: -- Labels: pull-request-available (was: pull-request-available user-support-issues) > Mak

[jira] [Updated] (HUDI-1271) Add utility scripts to perform Restores

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1271: -- Labels: (was: user-support-issues) > Add utility scripts to perform Restores > ---

[jira] [Updated] (HUDI-1272) Add utility scripts to manage Savepoints

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1272: -- Labels: (was: user-support-issues) > Add utility scripts to manage Savepoints > --

[jira] [Updated] (HUDI-1292) [Umbrella] RFC-15 : File Listing and Query Planning Optimizations

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1292: -- Labels: pull-request-available (was: pull-request-available user-support-issues) > [Um

[jira] [Updated] (HUDI-1341) hudi cli command such as rollback 、bootstrap support spark sql implement

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1341: -- Labels: (was: user-support-issues) > hudi cli command such as rollback 、bootstrap supp

[jira] [Updated] (HUDI-1280) Add tool to capture earliest or latest offsets in kafka topics

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1280: -- Labels: (was: user-support-issues) > Add tool to capture earliest or latest offsets in

[jira] [Updated] (HUDI-1296) Implement Spark DataSource using range metadata for file/partition pruning

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1296: -- Labels: (was: user-support-issues) > Implement Spark DataSource using range metadata f

[jira] [Updated] (HUDI-1342) hudi-dla-sync support modify table properties

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1342: -- Labels: (was: user-support-issues) > hudi-dla-sync support modify table properties > -

[jira] [Updated] (HUDI-1371) Implement Spark datasource by fetching file listing from metadata table

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1371: -- Labels: (was: user-support-issues) > Implement Spark datasource by fetching file listi

[jira] [Updated] (HUDI-1355) Allowing multipleSourceOrdering fields for doing the preCombine on payload

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1355: -- Labels: patch starter (was: patch starter user-support-issues) > Allowing multipleSourc

[jira] [Updated] (HUDI-1413) Need binary release of Hudi to distribute tools like hudi-cli.sh and hudi-sync

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1413: -- Labels: (was: user-support-issues) > Need binary release of Hudi to distribute tools l

[jira] [Updated] (HUDI-55) Investigate support for bucketed tables ala Hive #74

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-55?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-55: Labels: (was: user-support-issues) > Investigate support for bucketed tables ala Hive #74 >

[jira] [Updated] (HUDI-74) Improve compaction support in HoodieDeltaStreamer & CLI

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-74: Labels: (was: user-support-issues) > Improve compaction support in HoodieDeltaStreamer & CLI

[jira] [Updated] (HUDI-151) Fix Realtime queries on Hive on Spark engine

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-151: - Status: In Progress (was: Open) > Fix Realtime queries on Hive on Spark engine > -

[jira] [Resolved] (HUDI-151) Fix Realtime queries on Hive on Spark engine

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan resolved HUDI-151. -- Fix Version/s: 0.5.2 Resolution: Fixed [~nishith29]: please reopen if the issue st

[jira] [Updated] (HUDI-280) Integrate Hudi to bigtop

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-280: - Labels: (was: user-support-issues) > Integrate Hudi to bigtop >

[jira] [Updated] (HUDI-310) DynamoDB/Kinesis Change Capture using Delta Streamer

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-310: - Labels: (was: user-support-issues) > DynamoDB/Kinesis Change Capture using Delta Streamer

[jira] [Commented] (HUDI-396) Provide an documentation to describe how to use test suite

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274075#comment-17274075 ] sivabalan narayanan commented on HUDI-396: -- [~yanghua]: we already have a readme.

[jira] [Updated] (HUDI-360) Add github stale action workflow for issue management

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-360: - Labels: (was: user-support-issues) > Add github stale action workflow for issue managemen

[jira] [Updated] (HUDI-619) Investigate and implement mechanism to have hive/presto/sparksql queries avoid stitching and return null values for hoodie columns

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-619: - Labels: (was: user-support-issues) > Investigate and implement mechanism to have hive/pre

[jira] [Updated] (HUDI-648) Implement error log/table for Datasource/DeltaStreamer/WriteClient/Compaction writes

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-648: - Labels: (was: user-support-issues) > Implement error log/table for Datasource/DeltaStream

[jira] [Updated] (HUDI-767) Support transformation when export to Hudi

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-767: - Labels: (was: user-support-issues) > Support transformation when export to Hudi > ---

[jira] [Resolved] (HUDI-824) Register hudi-spark package with spark packages repo for easier usage of Hudi

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan resolved HUDI-824. -- Fix Version/s: 0.5.2 Resolution: Fixed > Register hudi-spark package with spark pa

[jira] [Updated] (HUDI-824) Register hudi-spark package with spark packages repo for easier usage of Hudi

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-824: - Status: In Progress (was: Open) > Register hudi-spark package with spark packages repo for

[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-829: - Labels: (was: user-support-issues) > Efficiently reading hudi tables through spark-shell

[jira] [Updated] (HUDI-829) Efficiently reading hudi tables through spark-shell

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-829: - Labels: user-support-issues (was: ) > Efficiently reading hudi tables through spark-shell

[jira] [Updated] (HUDI-865) Improve Hive Syncing by directly translating avro schema to Hive types

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-865: - Labels: pull-request-available starter (was: pull-request-available starter user-support-i

[jira] [Updated] (HUDI-873) kafka connector support hudi sink

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-873: - Labels: (was: user-support-issues) > kafka connector support hudi sink > ---

[jira] [Updated] (HUDI-914) support different target data clusters

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-914: - Labels: (was: user-support-issues) > support different target data clusters > ---

[jira] [Commented] (HUDI-1024) Document S3 related guide and tips

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274071#comment-17274071 ] sivabalan narayanan commented on HUDI-1024: --- [~uditme]: Can you ask one of aws f

[jira] [Updated] (HUDI-1020) Making timeline server as an external long running service and extending it to be able to plugin business metadata

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1020: -- Labels: (was: user-support-issues) > Making timeline server as an external long runnin

[GitHub] [hudi] nsivabalan commented on issue #2367: [SUPPORT] Seek error when querying MOR Tables in GCP

2021-01-28 Thread GitBox
nsivabalan commented on issue #2367: URL: https://github.com/apache/hudi/issues/2367#issuecomment-769488067 @stackfun : have you encountered the issue reported here: https://issues.apache.org/jira/browse/HUDI-1063 if not, would you mind responding to it if you know the fix/workaround. t

[jira] [Updated] (HUDI-1066) Provide way to provide all versions of a given set of records in incremental/snapshot queries

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1066: -- Labels: (was: user-support-issues) > Provide way to provide all versions of a given se

[jira] [Commented] (HUDI-1081) Document AWS Hudi integration

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274069#comment-17274069 ] sivabalan narayanan commented on HUDI-1081: --- [~uditme]: Do you think you can tak

[jira] [Commented] (HUDI-1111) Highlight Hudi guarantees in documentation section of website

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17274068#comment-17274068 ] sivabalan narayanan commented on HUDI-: --- [~vinoth]: Can I take a stab at thi

[jira] [Updated] (HUDI-1114) Explore Spark Structure Streaming for Hudi Dataset

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1114: -- Labels: (was: user-support-issues) > Explore Spark Structure Streaming for Hudi Datase

[jira] [Updated] (HUDI-1116) Support time travel using timestamp type

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1116: -- Labels: (was: user-support-issues) > Support time travel using timestamp type > --

[jira] [Resolved] (HUDI-1195) Ensure uniformity in schema usage in HoodieAvroUtils.avroToBytes and HoodieAvroUtils.bytesToAvro

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan resolved HUDI-1195. --- Resolution: Duplicate https://issues.apache.org/jira/browse/HUDI-1128 and https://iss

[jira] [Updated] (HUDI-1195) Ensure uniformity in schema usage in HoodieAvroUtils.avroToBytes and HoodieAvroUtils.bytesToAvro

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1195: -- Status: In Progress (was: Open) > Ensure uniformity in schema usage in HoodieAvroUtils.

[jira] [Updated] (HUDI-1195) Ensure uniformity in schema usage in HoodieAvroUtils.avroToBytes and HoodieAvroUtils.bytesToAvro

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1195: -- Status: Open (was: New) > Ensure uniformity in schema usage in HoodieAvroUtils.avroToBy

[jira] [Updated] (HUDI-1201) HoodieDeltaStreamer: Allow user overrides to read from earliest kafka offset when commit files do not have checkpoint

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1201: -- Labels: (was: user-support-issues) > HoodieDeltaStreamer: Allow user overrides to read

[jira] [Updated] (HUDI-1212) GDPR: Support deletions of records on all versions of Hudi dataset

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1212: -- Labels: (was: user-support-issues) > GDPR: Support deletions of records on all versio

[GitHub] [hudi] codecov-io edited a comment on pull request #2485: [HUDI-1109] Support Spark Structured Streaming read from Hudi table

2021-01-28 Thread GitBox
codecov-io edited a comment on pull request #2485: URL: https://github.com/apache/hudi/pull/2485#issuecomment-766519181 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=h1) Report > Merging [#2485](https://codecov.io/gh/apache/hudi/pull/2485?src=pr&el=desc) (8d2ff66) in

[jira] [Updated] (HUDI-1267) Additional Metadata Details for Hudi Transactions

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1267: -- Labels: features (was: features user-support-issues) > Additional Metadata Details for

[jira] [Comment Edited] (HUDI-1278) Need a generic payload class which can skip late arriving data based on specific fields

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17270965#comment-17270965 ] sivabalan narayanan edited comment on HUDI-1278 at 1/28/21, 11:59 PM: --

[jira] [Updated] (HUDI-1297) [Umbrella] Revamp Spark Datasource support using Spark 3 APIs

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1297: -- Labels: (was: user-support-issues) > [Umbrella] Revamp Spark Datasource support using

[jira] [Issue Comment Deleted] (HUDI-1297) [Umbrella] Revamp Spark Datasource support using Spark 3 APIs

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1297: -- Comment: was deleted (was: oops. noticed this is an umbrella ticket. ) > [Umbrella] Rev

[jira] [Updated] (HUDI-1290) Implement Debezium avro source for Delta Streamer

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1290: -- Labels: (was: user-support-issues) > Implement Debezium avro source for Delta Streamer

[jira] [Issue Comment Deleted] (HUDI-1297) [Umbrella] Revamp Spark Datasource support using Spark 3 APIs

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1297: -- Comment: was deleted (was: [~vinoth]: are we good to close this ticket? ) > [Umbrella]

[jira] [Updated] (HUDI-1362) Make deltastreamer support full overwrite

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1362: -- Labels: (was: user-support-issues) > Make deltastreamer support full overwrite > -

[jira] [Resolved] (HUDI-1546) Fix hive sync tool path in website documentation

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan resolved HUDI-1546. --- Resolution: Duplicate https://issues.apache.org/jira/browse/HUDI-1379 > Fix hive sync

[jira] [Updated] (HUDI-1546) Fix hive sync tool path in website documentation

2021-01-28 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-1546: -- Status: In Progress (was: Open) > Fix hive sync tool path in website documentation > --

[GitHub] [hudi] nsivabalan commented on issue #2323: [SUPPORT] GLOBAL_BLOOM index significantly slowing down processing time

2021-01-28 Thread GitBox
nsivabalan commented on issue #2323: URL: https://github.com/apache/hudi/issues/2323#issuecomment-769433405 Got it. Hudi is looking to add record-level indexing in the next release, and global lookup should become a lot faster with that. Hopefully it helps you. Can we close this ticket if you do

[GitHub] [hudi] nsivabalan commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

2021-01-28 Thread GitBox
nsivabalan commented on issue #2498: URL: https://github.com/apache/hudi/issues/2498#issuecomment-769423274 Can you try the config "hoodie.datasource.write.table.type" and set it to MERGE_ON_READ?

[GitHub] [hudi] codecov-io commented on pull request #2506: [HUDI-1557] Make Flink write pipeline write task scalable

2021-01-28 Thread GitBox
codecov-io commented on pull request #2506: URL: https://github.com/apache/hudi/pull/2506#issuecomment-769390677 # [Codecov](https://codecov.io/gh/apache/hudi/pull/2506?src=pr&el=h1) Report > Merging [#2506](https://codecov.io/gh/apache/hudi/pull/2506?src=pr&el=desc) (ed3f2f8) into [ma

[GitHub] [hudi] satishkotha commented on a change in pull request #2502: [HUDI-1555] Remove isEmpty to improve clustering execution performance

2021-01-28 Thread GitBox
satishkotha commented on a change in pull request #2502: URL: https://github.com/apache/hudi/pull/2502#discussion_r566378842 ## File path: hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/functional/TestStructuredStreaming.scala ## @@ -243,17 +243,24 @@ class Te

[GitHub] [hudi] prashantwason commented on a change in pull request #2496: [HUDI-1554] Introduced buffering for streams in HUDI.

2021-01-28 Thread GitBox
prashantwason commented on a change in pull request #2496: URL: https://github.com/apache/hudi/pull/2496#discussion_r566356415 ## File path: hudi-common/src/main/java/org/apache/hudi/common/fs/FSUtils.java ## @@ -415,17 +420,18 @@ public static boolean isLogFile(Path logPath) {
