[GitHub] [hudi] hudi-bot commented on pull request #7664: [HUDI-5551] support seconds unit on event_time

2023-01-17 Thread GitBox
hudi-bot commented on PR #7664: URL: https://github.com/apache/hudi/pull/7664#issuecomment-1385019069 ## CI report: * 2f4ee14477c6868151f3d14eb1f3535d3eafb11d Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7680: [HUDI-5548] spark sql update hudi's table properties

2023-01-17 Thread GitBox
hudi-bot commented on PR #7680: URL: https://github.com/apache/hudi/pull/7680#issuecomment-1385019241 ## CI report: * 7f5f3ef01829ff5ffb79543d2281bfc08e575c3e Azure:

[GitHub] [hudi] weimingdiit opened a new pull request, #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-17 Thread GitBox
weimingdiit opened a new pull request, #7684: URL: https://github.com/apache/hudi/pull/7684 ### Change Logs The exception message could be clearer when determining the schema from the data files in bootstrap. ### Impact nothing ### Risk level (write none, low medium or

[jira] [Updated] (HUDI-5567) Modified to make bootstrapping exception message clearer

2023-01-17 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5567: - Labels: pull-request-available (was: ) > Modified to make bootstrapping exception message

[GitHub] [hudi] hudi-bot commented on pull request #7680: [HUDI-5548] spark sql update hudi's table properties

2023-01-17 Thread GitBox
hudi-bot commented on PR #7680: URL: https://github.com/apache/hudi/pull/7680#issuecomment-1385026730 ## CI report: * 7f5f3ef01829ff5ffb79543d2281bfc08e575c3e Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-17 Thread GitBox
hudi-bot commented on PR #7684: URL: https://github.com/apache/hudi/pull/7684#issuecomment-1385026811 ## CI report: * 31fe16b17e99594573abc1ad273ee2d007c56bc9 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #7664: [HUDI-5551] support seconds unit on event_time

2023-01-17 Thread GitBox
hudi-bot commented on PR #7664: URL: https://github.com/apache/hudi/pull/7664#issuecomment-1385026602 ## CI report: * 7fa0b38ff13bce16a12b35a9f009b414854c9fe6 Azure:

[jira] [Updated] (HUDI-5246) Improve validation for partition path

2023-01-17 Thread Hemanth Gowda (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hemanth Gowda updated HUDI-5246: Status: Open (was: In Progress) > Improve validation for partition path >

[jira] [Updated] (HUDI-5568) incorrect use of fileSystemView

2023-01-17 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5568: - Labels: pull-request-available (was: ) > incorrect use of fileSystemView >

[GitHub] [hudi] hudi-bot commented on pull request #7684: [HUDI-5567] Modified to make bootstrapping exception message clearer

2023-01-17 Thread GitBox
hudi-bot commented on PR #7684: URL: https://github.com/apache/hudi/pull/7684#issuecomment-1385115523 ## CI report: * 31fe16b17e99594573abc1ad273ee2d007c56bc9 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7619: [MINOR] Optimizing schema validation in Metadata table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7619: URL: https://github.com/apache/hudi/pull/7619#issuecomment-1385115006 ## CI report: * dd59c7370a986b881a4f8e980915484f0c9021c3 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7679: [HUDI-5563] Check table exist before drop table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7679: URL: https://github.com/apache/hudi/pull/7679#issuecomment-1385127333 ## CI report: * e4aabbcc465e71d9184ad1ecb3a53690e98fc291 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7685: [HUDI-5568]

2023-01-17 Thread GitBox
hudi-bot commented on PR #7685: URL: https://github.com/apache/hudi/pull/7685#issuecomment-1385127424 ## CI report: * 5b6f0d1e629ec97859bf54f673597ee9c19399f1 UNKNOWN

[jira] [Updated] (HUDI-5565) Application restart may cause data lose when task parallelism is changed

2023-01-17 Thread lei w (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lei w updated HUDI-5565: Description: [HUDI-2084|https://github.com/apache/hudi/pull/3168] Resend the uncommitted write metadata when start

[GitHub] [hudi] loukey-lj commented on pull request #7685: [HUDI-5568] incorrect use of fileSystemView

2023-01-17 Thread GitBox
loukey-lj commented on PR #7685: URL: https://github.com/apache/hudi/pull/7685#issuecomment-1385197027 hi @danny0405 could you please take a look -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [hudi] hudi-bot commented on pull request #7677: [HUDI-5559] Support CDC for flink bounded source

2023-01-17 Thread GitBox
hudi-bot commented on PR #7677: URL: https://github.com/apache/hudi/pull/7677#issuecomment-1385274511 ## CI report: * c81f60f80a945dd2377e2fff4bc6207cc63ef576 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7582: [HUDI-5488] Make sure Disrupt queue start first, then insert records

2023-01-17 Thread GitBox
hudi-bot commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1385273943 ## CI report: * a94ec9cf09ce55b684fa059ce1ede73bead0e991 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7669: [HUDI-5553] Prevent partition(s) from being dropped if there are pending…

2023-01-17 Thread GitBox
hudi-bot commented on PR #7669: URL: https://github.com/apache/hudi/pull/7669#issuecomment-1385274376 ## CI report: * dae2ca6c5ab37f7865789823dae7ec3033c7b452 Azure:

[GitHub] [hudi] zhuanshenbsj1 commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-01-17 Thread GitBox
zhuanshenbsj1 commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1385053453 > # Issue > Issue at hand: Clustering will be performed for inputGroups with only 1 fileSlice, which may cause unnecessary file re-writes and write amplifications should there be

[GitHub] [hudi] loukey-lj opened a new pull request, #7685: [HUDI 5568]

2023-01-17 Thread GitBox
loukey-lj opened a new pull request, #7685: URL: https://github.com/apache/hudi/pull/7685 ### Change Logs writeClient.getHoodieTable().getFileSystemView() always returns the local fileSystemView; should use writeClient.getHoodieTable().getHoodieView() to determine the

[GitHub] [hudi] TengHuo commented on a diff in pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
TengHuo commented on code in PR #7626: URL: https://github.com/apache/hudi/pull/7626#discussion_r1072001434 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/BucketHandles.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [hudi] boneanxs commented on pull request #7582: [HUDI-5488] Make sure Disrupt queue start first, then insert records

2023-01-17 Thread GitBox
boneanxs commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1385170676 @hudi-bot run azure

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1385280793 ## CI report: * 9c6308712dc95b2062fd0dfe64163e723aa46561 Azure:

[GitHub] [hudi] zhuanshenbsj1 commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-01-17 Thread GitBox
zhuanshenbsj1 commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1385054159 @hudi-bot run azure

[GitHub] [hudi] hudi-bot commented on pull request #7159: [HUDI-5173]Skip if there is only one file in clusteringGroup

2023-01-17 Thread GitBox
hudi-bot commented on PR #7159: URL: https://github.com/apache/hudi/pull/7159#issuecomment-1385125561 ## CI report: * 15ecd91180d32c7fa1905c11408f4bc23347e682 UNKNOWN * 2fe0d6a4dd0fe655a6c0b7f9c7bd3889e91a84f2 Azure:

[GitHub] [hudi] TengHuo commented on a diff in pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
TengHuo commented on code in PR #7626: URL: https://github.com/apache/hudi/pull/7626#discussion_r1072031352 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/BucketHandles.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [hudi] BalaMahesh opened a new pull request, #7687: Update to handle deletes in postgres debezium

2023-01-17 Thread GitBox
BalaMahesh opened a new pull request, #7687: URL: https://github.com/apache/hudi/pull/7687 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any

[GitHub] [hudi] BalaMahesh commented on issue #7595: [SUPPORT] Hudi Clean and Delta commits taking ~50 mins to finish frequently

2023-01-17 Thread GitBox
BalaMahesh commented on issue #7595: URL: https://github.com/apache/hudi/issues/7595#issuecomment-1385272668 > I guess we run into some performance issue when using BloomFilter index for mor table with metadata table disabled, thanks for the feedback, let me record this issue first for

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1385289920 ## CI report: * 9c6308712dc95b2062fd0dfe64163e723aa46561 Azure:

[GitHub] [hudi] hangc0276 opened a new issue, #7686: [SUPPORT] Is there any way to delete records by specify one field value without selecting all the records out

2023-01-17 Thread GitBox
hangc0276 opened a new issue, #7686: URL: https://github.com/apache/hudi/issues/7686 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at

[GitHub] [hudi] hudi-bot commented on pull request #7685: [HUDI-5568] incorrect use of fileSystemView

2023-01-17 Thread GitBox
hudi-bot commented on PR #7685: URL: https://github.com/apache/hudi/pull/7685#issuecomment-1385139873 ## CI report: * 5b6f0d1e629ec97859bf54f673597ee9c19399f1 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7679: [HUDI-5563] Check table exist before drop table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7679: URL: https://github.com/apache/hudi/pull/7679#issuecomment-1385139725 ## CI report: * e4aabbcc465e71d9184ad1ecb3a53690e98fc291 Azure:

[GitHub] [hudi] trushev commented on a diff in pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-17 Thread GitBox
trushev commented on code in PR #7626: URL: https://github.com/apache/hudi/pull/7626#discussion_r1072027683 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/BucketHandles.java: ## @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [hudi] hudi-bot commented on pull request #5926: [HUDI-3475] Initialize hudi table management module

2023-01-17 Thread GitBox
hudi-bot commented on PR #5926: URL: https://github.com/apache/hudi/pull/5926#issuecomment-1385270838 ## CI report: * 9c6308712dc95b2062fd0dfe64163e723aa46561 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7687: Update to handle deletes in postgres debezium

2023-01-17 Thread GitBox
hudi-bot commented on PR #7687: URL: https://github.com/apache/hudi/pull/7687#issuecomment-1385284533 ## CI report: * 78d341045ff40465c1d44f377b42e5d91f7c5fc7 UNKNOWN

[GitHub] [hudi] hudi-bot commented on pull request #7687: Update to handle deletes in postgres debezium

2023-01-17 Thread GitBox
hudi-bot commented on PR #7687: URL: https://github.com/apache/hudi/pull/7687#issuecomment-1385292589 ## CI report: * 78d341045ff40465c1d44f377b42e5d91f7c5fc7 Azure:

[GitHub] [hudi] yihua commented on issue #7430: [BUG] MOR Table Hard Deletes Create issue with Athena Querying RT Tables

2023-01-17 Thread GitBox
yihua commented on issue #7430: URL: https://github.com/apache/hudi/issues/7430#issuecomment-1386112654 Hi @soumilshah1995 would you mind creating an AWS support issue for this? That will accelerate the resolution from AWS Athena.

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7640: [HUDI-5514] Add in support for a keyless workflow

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7640: URL: https://github.com/apache/hudi/pull/7640#discussion_r1072995895 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/KeylessKeyGenerator.java: ## @@ -0,0 +1,239 @@ +/* + * Licensed to the Apache Software

[jira] [Closed] (HUDI-4148) Preparations and client for hudi table manager service

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu closed HUDI-4148. Reviewers: Raymond Xu Resolution: Fixed > Preparations and client for hudi table manager service >

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5569: -- Fix Version/s: 0.13.0 > Files written by first commit/delta commit if it failed is

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1386215962 ## CI report: * a409755934848d189e0d731e4ee68a22190e5b0d Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7687: Update to handle deletes in postgres debezium

2023-01-17 Thread GitBox
hudi-bot commented on PR #7687: URL: https://github.com/apache/hudi/pull/7687#issuecomment-1385988494 ## CI report: * 78d341045ff40465c1d44f377b42e5d91f7c5fc7 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7576: [HUDI-4991] Allow kafka-like configs to set truststore and keystore for the SchemaProvider

2023-01-17 Thread GitBox
hudi-bot commented on PR #7576: URL: https://github.com/apache/hudi/pull/7576#issuecomment-1386090331 ## CI report: * f7b2c025ed416ea8607b2e6dcc116415f114f87b Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1386201200 ## CI report: * a409755934848d189e0d731e4ee68a22190e5b0d Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1386209381 ## CI report: * a409755934848d189e0d731e4ee68a22190e5b0d Azure:

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7423: [HUDI-5384][Stacked on 7528] Adding optimization rule to appropriately push down filters into the `HoodieFileIndex`

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7423: URL: https://github.com/apache/hudi/pull/7423#discussion_r1072877999 ## hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/analysis/HoodiePruneFileSourcePartitions.scala: ## @@ -0,0 +1,126 @@ +/* + * Licensed

[GitHub] [hudi] hudi-bot commented on pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
hudi-bot commented on PR #7632: URL: https://github.com/apache/hudi/pull/7632#issuecomment-1386230596 ## CI report: * 8dc8184d6fbafc72835bf52f85075e2a8288061e Azure:

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #7582: [HUDI-5488] Make sure Disrupt queue start first, then insert records

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #7582: URL: https://github.com/apache/hudi/pull/7582#discussion_r1072998596 ## hudi-common/src/main/java/org/apache/hudi/common/util/queue/DisruptorMessageQueue.java: ## @@ -60,6 +61,10 @@ public long size() { @Override public void

[GitHub] [hudi] hudi-bot commented on pull request #7582: [HUDI-5488] Make sure Disrupt queue start first, then insert records

2023-01-17 Thread GitBox
hudi-bot commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1385988025 ## CI report: * a94ec9cf09ce55b684fa059ce1ede73bead0e991 Azure:

[jira] [Created] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-5569: - Summary: Files written by first commit/delta commit if it failed is detected as valid data files Key: HUDI-5569 URL: https://issues.apache.org/jira/browse/HUDI-5569

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5569: -- Sprint: 0.13.0 Final Sprint 2 > Files written by first commit/delta commit if it failed

[GitHub] [hudi] soumilshah1995 commented on issue #7430: [BUG] MOR Table Hard Deletes Create issue with Athena Querying RT Tables

2023-01-17 Thread GitBox
soumilshah1995 commented on issue #7430: URL: https://github.com/apache/hudi/issues/7430#issuecomment-1386115733 Sure i will tell my company sysops to create support ticket :D

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-17 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5485: - Labels: pull-request-available (was: ) > Improve performance of savepoint with MDT >

[jira] [Updated] (HUDI-5323) Decouple virtual key with writing bloom filters to parquet files

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5323: Status: Patch Available (was: In Progress) > Decouple virtual key with writing bloom filters to parquet

[jira] [Updated] (HUDI-5319) NPE in Bloom Filter Index

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5319: Status: Patch Available (was: In Progress) > NPE in Bloom Filter Index > - > >

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5485: Status: Patch Available (was: In Progress) > Improve performance of savepoint with MDT >

[GitHub] [hudi] yihua opened a new pull request, #7690: [HUDI-5485] Add File System View API for batch listing and improve savepoint performance with metadata table

2023-01-17 Thread GitBox
yihua opened a new pull request, #7690: URL: https://github.com/apache/hudi/pull/7690 ### Change Logs _Describe context and summary for this change. Highlight if any code was copied._ ### Impact _Describe any public API or user-facing feature change or any performance

[GitHub] [hudi] yihua commented on issue #7430: [BUG] MOR Table Hard Deletes Create issue with Athena Querying RT Tables

2023-01-17 Thread GitBox
yihua commented on issue #7430: URL: https://github.com/apache/hudi/issues/7430#issuecomment-1386123258 > Sure i will tell my company sysops to create support ticket :D Appreciate that! Let us know the AWS support ticket number once it's filed. cc @umehrot2

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5569: -- Description: We have a method in HoodieFileGroup which detects whether a file group is

[jira] [Assigned] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-5569: - Assignee: Jonathan Vexler > Files written by first commit/delta commit if it

[GitHub] [hudi] nsivabalan commented on a diff in pull request #7632: [HUDI-3775] Allow for offline compaction of MOR tables via spark streaming

2023-01-17 Thread GitBox
nsivabalan commented on code in PR #7632: URL: https://github.com/apache/hudi/pull/7632#discussion_r1072765472 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DataSourceOptions.scala: ## @@ -455,6 +455,15 @@ object DataSourceWriteOptions { +

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6782: [HUDI-4911][HUDI-3301] Fixing `HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #6782: URL: https://github.com/apache/hudi/pull/6782#discussion_r1072896769 ## hudi-common/src/test/java/org/apache/hudi/common/functional/TestHoodieLogFormat.java: ## @@ -671,11 +658,188 @@ public void

[GitHub] [hudi] With-winds opened a new issue, #7689: [SUPPORT] PriorityBasedFileSystemView: Got error running preferred function. Trying secondary

2023-01-17 Thread GitBox
With-winds opened a new issue, #7689: URL: https://github.com/apache/hudi/issues/7689 **Describe the problem you faced** When trying to write to existing COW table using HoodieDeltaStreamer, an error occurred in the Java Spark application. **To Reproduce** **Expected

[GitHub] [hudi] hudi-bot commented on pull request #7660: [MINOR] unify naming for record merger

2023-01-17 Thread GitBox
hudi-bot commented on PR #7660: URL: https://github.com/apache/hudi/pull/7660#issuecomment-1386356546 ## CI report: * 08642ac9be198fdf55f02260253f81a0b457bcad Azure:

[jira] [Updated] (HUDI-5433) Fix the way we deduce the pending instants for MDT writes

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5433: -- Story Points: 0 (was: 1) > Fix the way we deduce the pending instants for MDT writes >

[jira] [Updated] (HUDI-5408) Partially failed commits in MDT have to be rolled back in all cases

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5408: -- Story Points: 0 (was: 1) > Partially failed commits in MDT have to be rolled back in

[jira] [Updated] (HUDI-5407) Rollbacks in MDT is not effective

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5407: -- Story Points: 0 (was: 1) > Rollbacks in MDT is not effective >

[jira] [Updated] (HUDI-4911) Make sure LogRecordReader doesn't flush the cache before each lookup

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-4911: -- Story Points: 1 (was: 4) > Make sure LogRecordReader doesn't flush the cache before

[GitHub] [hudi] hudi-bot commented on pull request #7612: [HUDI-5336] Fixing log file pattern match to ignore extraneous files

2023-01-17 Thread GitBox
hudi-bot commented on PR #7612: URL: https://github.com/apache/hudi/pull/7612#issuecomment-1386372802 ## CI report: * 66370e1d4085619050625bf32e08dc9c8cef8f76 Azure:

[GitHub] [hudi] alexeykudinkin commented on a diff in pull request #6782: [HUDI-4911][HUDI-3301] Fixing `HoodieMetadataLogRecordReader` to avoid flushing cache for every lookup

2023-01-17 Thread GitBox
alexeykudinkin commented on code in PR #6782: URL: https://github.com/apache/hudi/pull/6782#discussion_r1073024689 ## hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieMergedLogRecordScanner.java: ## @@ -108,30 +116,94 @@ protected

[GitHub] [hudi] hudi-bot commented on pull request #7612: [HUDI-5336] Fixing log file pattern match to ignore extraneous files

2023-01-17 Thread GitBox
hudi-bot commented on PR #7612: URL: https://github.com/apache/hudi/pull/7612#issuecomment-1386378924 ## CI report: * 66370e1d4085619050625bf32e08dc9c8cef8f76 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6815: [HUDI-4937] Fix `HoodieTable` injecting non-reusable `HoodieBackedTableMetadata` aggressively flushing MT readers

2023-01-17 Thread GitBox
hudi-bot commented on PR #6815: URL: https://github.com/apache/hudi/pull/6815#issuecomment-1386377763 ## CI report: * 0025243644c03672360497938474031048a254cf Azure:

[GitHub] [hudi] LinMingQiang opened a new issue, #7691: [SUPPORT] Flink's schema conflicts with spark's schema.

2023-01-17 Thread GitBox
LinMingQiang opened a new issue, #7691: URL: https://github.com/apache/hudi/issues/7691 **_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at

[jira] [Updated] (HUDI-5464) Fix instantiation of a new partition in MDT re-using the same instant time as a regular commit

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5464: -- Reporter: sivabalan narayanan (was: Alexey Kudinkin) > Fix instantiation of a new partition in

[jira] [Updated] (HUDI-4937) Fix HoodieTable injecting HoodieBackedTableMetadata not reusing underlying MT readers

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-4937: -- Story Points: 2 (was: 4) > Fix HoodieTable injecting HoodieBackedTableMetadata not reusing

[jira] [Assigned] (HUDI-5464) Fix instantiation of a new partition in MDT re-using the same instant time as a regular commit

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan reassigned HUDI-5464: - Assignee: Raymond Xu (was: Alexey Kudinkin) > Fix instantiation of a new

[jira] [Closed] (HUDI-4586) Address S3 timeouts in Bloom Index with metadata table

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin closed HUDI-4586. - Resolution: Fixed > Address S3 timeouts in Bloom Index with metadata table >

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5485: Story Points: 1 (was: 0.5) > Improve performance of savepoint with MDT >

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-17 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5485: Reviewers: sivabalan narayanan > Improve performance of savepoint with MDT >

[jira] [Updated] (HUDI-5552) Too slow while using trino-hudi connector while querying partitioned tables.

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5552: - Reviewers: Alexey Kudinkin > Too slow while using trino-hudi connector while querying partitioned tables.

[jira] [Updated] (HUDI-5552) Too slow while using trino-hudi connector while querying partitioned tables.

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5552: -- Reviewers: Alexey Kudinkin (was: Alexey Kudinkin) > Too slow while using trino-hudi connector

[jira] [Updated] (HUDI-5384) Make sure predicates are appropriately pushed down to HoodieFileIndex when lazy listing

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5384: -- Story Points: 1 (was: 2) > Make sure predicates are appropriately pushed down to

[jira] [Updated] (HUDI-5417) Support to read avro from non-legacy map/list in parquet log

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5417: -- Status: In Progress (was: Open) > Support to read avro from non-legacy map/list in parquet log

[jira] [Updated] (HUDI-5417) Support to read avro from non-legacy map/list in parquet log

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5417: -- Story Points: 1 > Support to read avro from non-legacy map/list in parquet log >

[jira] [Updated] (HUDI-5417) Support to read avro from non-legacy map/list in parquet log

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5417: -- Status: Patch Available (was: In Progress) > Support to read avro from non-legacy map/list in

[GitHub] [hudi] hudi-bot commented on pull request #6815: [HUDI-4937] Fix `HoodieTable` injecting non-reusable `HoodieBackedTableMetadata` aggressively flushing MT readers

2023-01-17 Thread GitBox
hudi-bot commented on PR #6815: URL: https://github.com/apache/hudi/pull/6815#issuecomment-1386433086 ## CI report: * 13fb78850890b96b86b66d7df060feb11950ec0c UNKNOWN * 3d90e88fda205fd2cbf95c402a19b5bba2ebfa18 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #6384: [HUDI-4613] Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2023-01-17 Thread GitBox
hudi-bot commented on PR #6384: URL: https://github.com/apache/hudi/pull/6384#issuecomment-1386432790 ## CI report: * d18a40d00cb6ff6c2ff2768b289c1435e3ceaa28 Azure:

[GitHub] [hudi] danny0405 commented on issue #7691: [SUPPORT] Flink's schema conflicts with spark's schema.

2023-01-17 Thread GitBox
danny0405 commented on issue #7691: URL: https://github.com/apache/hudi/issues/7691#issuecomment-1386486937 Okay, this seems to be a bug: Flink uses the constant namespace 'record' when generating the Avro schema. Does that cause the incompatibility? Can you file a JIRA to address and fix

[GitHub] [hudi] hudi-bot commented on pull request #7612: [HUDI-5336] Fixing log file pattern match to ignore extraneous files

2023-01-17 Thread GitBox
hudi-bot commented on PR #7612: URL: https://github.com/apache/hudi/pull/7612#issuecomment-1386491701 ## CI report: * 1dc0a0732953fa0b470054c828981e226803e8aa Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7642: [HUDI-5534][Stacked on 6782] Optimizing Bloom Index lookup when using Bloom Filters from Metadata Table

2023-01-17 Thread GitBox
hudi-bot commented on PR #7642: URL: https://github.com/apache/hudi/pull/7642#issuecomment-1386491820 ## CI report: * 01697615c3d88afaa15a59cad6d0c5548b295253 Azure:

[GitHub] [hudi] nsivabalan opened a new pull request, #7692: [HUDI-XXXX] enabling scan V2 for log record reader

2023-01-17 Thread GitBox
nsivabalan opened a new pull request, #7692: URL: https://github.com/apache/hudi/pull/7692 ### Change Logs testing ScanV2 with log record reader. ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write

[jira] [Created] (HUDI-5574) Support auto record key generation with Spark SQL

2023-01-17 Thread Lokesh Jain (Jira)
Lokesh Jain created HUDI-5574: - Summary: Support auto record key generation with Spark SQL Key: HUDI-5574 URL: https://issues.apache.org/jira/browse/HUDI-5574 Project: Apache Hudi Issue Type:

[jira] [Updated] (HUDI-5570) Write tests for failed compaction retried w/ MDT able to serve just the required data

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5570: -- Story Points: 2 > Write tests for failed compaction retried w/ MDT able to serve just

[jira] [Updated] (HUDI-5570) Write tests for failed compaction retried w/ MDT able to serve just the required data

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5570: -- Epic Link: HUDI-1292 > Write tests for failed compaction retried w/ MDT able to serve

[jira] [Updated] (HUDI-5570) Write tests for failed compaction retried w/ MDT able to serve just the required data

2023-01-17 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5570: -- Sprint: 0.13.0 Final Sprint 3 > Write tests for failed compaction retried w/ MDT able

[jira] [Updated] (HUDI-5555) Set class loader for parquet data block

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5555: - Reviewers: Alexey Kudinkin > Set class loader for parquet data block >

[jira] [Updated] (HUDI-5475) not able to generate utilities-slim bundle dependency tree

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5475: - Reviewers: Raymond Xu > not able to generate utilities-slim bundle dependency tree >

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5569: - Story Points: 2 > Files written by first commit/delta commit if it failed is detected as valid > data

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-01-17 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5569: - Reviewers: sivabalan narayanan > Files written by first commit/delta commit if it failed is detected as

[jira] [Assigned] (HUDI-5417) support to read avro from non-legacy map/list in parquet log

2023-01-17 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5417: - Assignee: Frank Wong > support to read avro from non-legacy map/list in parquet log >
