[jira] [Created] (HUDI-5523) Support force rollback to a history instant

2023-01-09 Thread Danny Chen (Jira)
Danny Chen created HUDI-5523: Summary: Support force rollback to a history instant Key: HUDI-5523 URL: https://issues.apache.org/jira/browse/HUDI-5523 Project: Apache Hudi Issue Type: New Feature

[GitHub] [hudi] hudi-bot commented on pull request #7633: Fix Deletes issued without any prior commits

2023-01-09 Thread GitBox
hudi-bot commented on PR #7633: URL: https://github.com/apache/hudi/pull/7633#issuecomment-1376862712 ## CI report: * d10474141c5493f9ec6712da8c8f4fff595e94cf UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] hudi-bot commented on pull request #7631: [MINOR] Remove useless RollbackTimeline

2023-01-09 Thread GitBox
hudi-bot commented on PR #7631: URL: https://github.com/apache/hudi/pull/7631#issuecomment-1376862673 ## CI report: * 4e4887acea4e6197e6822eafd0aabf48a9378d78 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1420

[GitHub] [hudi] hudi-bot commented on pull request #7615: [HUDI-5510] Reload active timeline when getInstantsToArchive

2023-01-09 Thread GitBox
hudi-bot commented on PR #7615: URL: https://github.com/apache/hudi/pull/7615#issuecomment-1376862564 ## CI report: * c93470f2891c96f71c859677ad132c24f2eb373e Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=142

[GitHub] [hudi] hudi-bot commented on pull request #7372: [HUDI-5326] Fix clustering group building in SparkSizeBasedClusteringPlanStrategy

2023-01-09 Thread GitBox
hudi-bot commented on PR #7372: URL: https://github.com/apache/hudi/pull/7372#issuecomment-1376862113 ## CI report: * 46b644c42fca50efc48bde48a4a4e32c05aa50c9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1420

[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table

2023-01-09 Thread GitBox
hudi-bot commented on PR #7365: URL: https://github.com/apache/hudi/pull/7365#issuecomment-1376862030 ## CI report: * b8793478965fff04d0df199741ce28909d6695e7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1418

[GitHub] [hudi] hudi-bot commented on pull request #7615: [HUDI-5510] Reload active timeline when getInstantsToArchive

2023-01-09 Thread GitBox
hudi-bot commented on PR #7615: URL: https://github.com/apache/hudi/pull/7615#issuecomment-1376856826 ## CI report: * 6d63669cac8191009b6fc7df2e9ff768463f00d1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1414

[GitHub] [hudi] hudi-bot commented on pull request #7372: [HUDI-5326] Fix clustering group building in SparkSizeBasedClusteringPlanStrategy

2023-01-09 Thread GitBox
hudi-bot commented on PR #7372: URL: https://github.com/apache/hudi/pull/7372#issuecomment-1376856211 ## CI report: * 46b644c42fca50efc48bde48a4a4e32c05aa50c9 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1420

[GitHub] [hudi] yihua commented on issue #7487: [SUPPORT] S3 Buckets reached quota limit when reading from hudi tables

2023-01-09 Thread GitBox
yihua commented on issue #7487: URL: https://github.com/apache/hudi/issues/7487#issuecomment-1376842456 503 errors mean that the throttling limit of S3 requests is hit, causing backlog or timeout, making the jobs fail easily. -- This is an automated message from the Apache Git Service. To

[GitHub] [hudi] yihua commented on issue #7487: [SUPPORT] S3 Buckets reached quota limit when reading from hudi tables

2023-01-09 Thread GitBox
yihua commented on issue #7487: URL: https://github.com/apache/hudi/issues/7487#issuecomment-1376840871 Hi @AdarshKadameriTR to fully understand where these S3 requests / API calls come from, you should enable S3 request logs by setting `log4j.logger.com.amazonaws.request=DEBUG` in log4j pr

[GitHub] [hudi] pratyakshsharma commented on pull request #6926: [HUDI-3676] Enhance tests for trigger clean every Nth commit

2023-01-09 Thread GitBox
pratyakshsharma commented on PR #6926: URL: https://github.com/apache/hudi/pull/6926#issuecomment-1376839387 this is good for another pass @yihua

[GitHub] [hudi] yihua commented on a diff in pull request #7607: [HUDI-5499] Fixing Spark SQL configs not being properly propagated for CTAS and other commands

2023-01-09 Thread GitBox
yihua commented on code in PR #7607: URL: https://github.com/apache/hudi/pull/7607#discussion_r1065408280 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala: ## @@ -81,10 +80,8 @@ trait ProvidesHoodieConfig extends Logg

[GitHub] [hudi] liaotian1005 opened a new pull request, #7633: Deletes issued without any prior commits

2023-01-09 Thread GitBox
liaotian1005 opened a new pull request, #7633: URL: https://github.com/apache/hudi/pull/7633 create table hudi_cow_nonpcf_tbl ( uuid int, name string, price double ) us

[GitHub] [hudi] yihua commented on issue #7545: [SUPPORT]How to sync data from Kafka to Hudi when use Flink SQL canal-json format

2023-01-09 Thread GitBox
yihua commented on issue #7545: URL: https://github.com/apache/hudi/issues/7545#issuecomment-1376827572 @With-winds I'm curious, did you find out a solution? If so, would you mind sharing the knowledge so others can benefit from it too?

[jira] [Updated] (HUDI-5522) Improve docs for disaster recovery

2023-01-09 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5522: Fix Version/s: 0.13.0 > Improve docs for disaster recovery > -- > >

[GitHub] [hudi] yihua commented on issue #7589: Keep only clustered file(all) after cleaning

2023-01-09 Thread GitBox
yihua commented on issue #7589: URL: https://github.com/apache/hudi/issues/7589#issuecomment-1376825649 Hi @maheshguptags you can also create savepoints using [Spark SQL procedures](https://hudi.apache.org/docs/0.11.1/procedures#create_savepoints): `call create_savepoints(table => 'hudi_tri

[jira] [Updated] (HUDI-5522) Improve docs for disaster recovery

2023-01-09 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5522: Description: Related issue: [https://github.com/apache/hudi/issues/7589] [https://hudi.apache.org/docs/disa

[jira] [Created] (HUDI-5522) Improve docs for disaster recovery

2023-01-09 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-5522: --- Summary: Improve docs for disaster recovery Key: HUDI-5522 URL: https://issues.apache.org/jira/browse/HUDI-5522 Project: Apache Hudi Issue Type: Improvement

[GitHub] [hudi] JerryYue-M commented on pull request #6134: WIP:refactor hoodie stream source based flip-27 and support watermark

2023-01-09 Thread GitBox
JerryYue-M commented on PR #6134: URL: https://github.com/apache/hudi/pull/6134#issuecomment-1376800563 > what's the status of this pr right now? This is ongoing; it will be rebased onto master, and some unit tests will be added later.

[GitHub] [hudi] hudi-bot commented on pull request #7631: [MINOR] Remove useless RollbackTimeline

2023-01-09 Thread GitBox
hudi-bot commented on PR #7631: URL: https://github.com/apache/hudi/pull/7631#issuecomment-137671 ## CI report: * 4e4887acea4e6197e6822eafd0aabf48a9378d78 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1420

[GitHub] [hudi] hudi-bot commented on pull request #6133: [HUDI-1575] Early Conflict Detection For Multi-writer

2023-01-09 Thread GitBox
hudi-bot commented on PR #6133: URL: https://github.com/apache/hudi/pull/6133#issuecomment-1376790832 ## CI report: * dbe3db845908d261baa5a1aa71d19e0db55816de UNKNOWN * 678cce4a9748cb54a90a559384a0cb0443082535 UNKNOWN * 6fc5bf1ce7921bf25acc3659565457264d8b9dc2 UNKNOWN * 0b

[GitHub] [hudi] danny0405 commented on a diff in pull request #7627: [HUDI-5517] HoodieTimeline support filter instants by state transition time

2023-01-09 Thread GitBox
danny0405 commented on code in PR #7627: URL: https://github.com/apache/hudi/pull/7627#discussion_r1065348765 ## hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstant.java: ## @@ -46,14 +55,35 @@ public class HoodieInstant implements Serializable, Compar

[GitHub] [hudi] zhangyue19921010 commented on pull request #6133: [HUDI-1575] Early Conflict Detection For Multi-writer

2023-01-09 Thread GitBox
zhangyue19921010 commented on PR #6133: URL: https://github.com/apache/hudi/pull/6133#issuecomment-1376756472 @hudi-bot run azure

[GitHub] [hudi] racc commented on issue #6808: [SUPPORT] Cannot sync to spark embedded derby hive meta store (the default one)

2023-01-09 Thread GitBox
racc commented on issue #6808: URL: https://github.com/apache/hudi/issues/6808#issuecomment-1376748977 ```java public class MetaStoreTxnDbUtilPrep extends MetaStoreInitListener { public MetaStoreTxnDbUtilPrep(Configuration config) { super(config); } @Ove

[GitHub] [hudi] hudi-bot commented on pull request #6133: [HUDI-1575] Early Conflict Detection For Multi-writer

2023-01-09 Thread GitBox
hudi-bot commented on PR #6133: URL: https://github.com/apache/hudi/pull/6133#issuecomment-1376737665 ## CI report: * dbe3db845908d261baa5a1aa71d19e0db55816de UNKNOWN * 678cce4a9748cb54a90a559384a0cb0443082535 UNKNOWN * 6fc5bf1ce7921bf25acc3659565457264d8b9dc2 UNKNOWN * 0b

[GitHub] [hudi] hudi-bot commented on pull request #7605: [HUDI-5349] Clean up partially failed restore

2023-01-09 Thread GitBox
hudi-bot commented on PR #7605: URL: https://github.com/apache/hudi/pull/7605#issuecomment-1376735097 ## CI report: * 5c77de2450baf4d3dcb153ee53a57e006926a612 UNKNOWN * 6eacbe50e932dfef39439f3f025cdb946326c180 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] kepplertreet commented on issue #7628: [SUPPORT] Hudi Metadata Column Stats Fail

2023-01-09 Thread GitBox
kepplertreet commented on issue #7628: URL: https://github.com/apache/hudi/issues/7628#issuecomment-1376709063 Hi @alexeykudinkin We are using an integer id column as the `hoodie.datasource.write.recordkey.field` Listing a few sample values ``` 1263633528 1263633530

[GitHub] [hudi] davidshtian commented on issue #7591: [SUPPORT] Kinesis Data Analytics Flink1.13 to HUDI

2023-01-09 Thread GitBox
davidshtian commented on issue #7591: URL: https://github.com/apache/hudi/issues/7591#issuecomment-1376707180 From the logs _java.util.concurrent.CompletionException: org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: zeppelin-flin

[GitHub] [hudi] boneanxs commented on pull request #7582: [HUDI-5488]Make sure Disrupt queue start first, then insert records

2023-01-09 Thread GitBox
boneanxs commented on PR #7582: URL: https://github.com/apache/hudi/pull/7582#issuecomment-1376703720 Gentle ping @alexeykudinkin

[GitHub] [hudi] davidshtian commented on issue #7591: [SUPPORT] Kinesis Data Analytics Flink1.13 to HUDI

2023-01-09 Thread GitBox
davidshtian commented on issue #7591: URL: https://github.com/apache/hudi/issues/7591#issuecomment-1376687802 @soumilshah1995 I tried again, it worked as below, for your reference. Thanks~ **Step 1 – Kinesis Stream** https://user-images.githubusercontent.com/14228056/211456528-eb52

[GitHub] [hudi] hudi-bot commented on pull request #7615: [HUDI-5510] Reload active timeline when getInstantsToArchive

2023-01-09 Thread GitBox
hudi-bot commented on PR #7615: URL: https://github.com/apache/hudi/pull/7615#issuecomment-1376686175 ## CI report: * 6d63669cac8191009b6fc7df2e9ff768463f00d1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1414

[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table

2023-01-09 Thread GitBox
hudi-bot commented on PR #7365: URL: https://github.com/apache/hudi/pull/7365#issuecomment-1376685879 ## CI report: * b8793478965fff04d0df199741ce28909d6695e7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1418

[GitHub] [hudi] hudi-bot commented on pull request #7615: [HUDI-5510] Reload active timeline when getInstantsToArchive

2023-01-09 Thread GitBox
hudi-bot commented on PR #7615: URL: https://github.com/apache/hudi/pull/7615#issuecomment-1376682774 ## CI report: * 6d63669cac8191009b6fc7df2e9ff768463f00d1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1414

[GitHub] [hudi] hudi-bot commented on pull request #7365: [HUDI-5317] Fix insert overwrite table for partitioned table

2023-01-09 Thread GitBox
hudi-bot commented on PR #7365: URL: https://github.com/apache/hudi/pull/7365#issuecomment-1376682421 ## CI report: * b8793478965fff04d0df199741ce28909d6695e7 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=1418

[GitHub] [hudi] hudi-bot commented on pull request #7607: [HUDI-5499] Fixing Spark SQL configs not being properly propagated for CTAS and other commands

2023-01-09 Thread GitBox
hudi-bot commented on PR #7607: URL: https://github.com/apache/hudi/pull/7607#issuecomment-1376679068 ## CI report: * 32033e4a4ed91005a237aa88afa2c6adcb51169f UNKNOWN * 05cbda8ddcca0944c7965bd7c32448e29f97 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2

[GitHub] [hudi] trushev commented on a diff in pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-09 Thread GitBox
trushev commented on code in PR #7626: URL: https://github.com/apache/hudi/pull/7626#discussion_r1065293780 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteFunction.java: ## @@ -449,6 +450,7 @@ private boolean flushBucket(DataBucket bucket) {

[GitHub] [hudi] trushev commented on a diff in pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-09 Thread GitBox
trushev commented on code in PR #7626: URL: https://github.com/apache/hudi/pull/7626#discussion_r1065294170 ## hudi-client/hudi-flink-client/src/main/java/org/apache/hudi/client/HoodieFlinkWriteClient.java: ## @@ -94,7 +94,7 @@ * FileID to write handle mapping in order to re

[jira] [Assigned] (HUDI-5430) Fix multi-writer handling w/ rollback blocks in MOR table (log record reader)

2023-01-09 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5430: - Assignee: sivabalan narayanan (was: Alexey Kudinkin) > Fix multi-writer handling w/ roll

[jira] [Assigned] (HUDI-5464) Fix instantiation of a new partition in MDT re-using the same instant time as a regular commit

2023-01-09 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin reassigned HUDI-5464: - Assignee: sivabalan narayanan (was: Alexey Kudinkin) > Fix instantiation of a new partit

[jira] [Updated] (HUDI-5521) Make sure MT Bloom Index partition is covered by tests for Bloom Index

2023-01-09 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5521: -- Sprint: 0.13.0 Final Sprint 2 > Make sure MT Bloom Index partition is covered by tests for Bloom

[jira] [Updated] (HUDI-5423) Flaky test: ColumnStatsTestCase(MERGE_ON_READ,true,true)

2023-01-09 Thread Alexey Kudinkin (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexey Kudinkin updated HUDI-5423: -- Sprint: 0.13.0 Final Sprint (was: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2) > Flaky test: Co

[GitHub] [hudi] trushev commented on pull request #7626: [HUDI-5516] Reduce memory footprint on workload with thousand active partitions

2023-01-09 Thread GitBox
trushev commented on PR #7626: URL: https://github.com/apache/hudi/pull/7626#issuecomment-1376652385 > Nice catch, @trushev , curious why the closed handle is also taking huge resource, we may need to figure it out first. > > But I still think the change is valid. Thank you for

[jira] [Created] (HUDI-5521) Make sure MT Bloom Index partition is covered by tests for Bloom Index

2023-01-09 Thread Alexey Kudinkin (Jira)
Alexey Kudinkin created HUDI-5521: - Summary: Make sure MT Bloom Index partition is covered by tests for Bloom Index Key: HUDI-5521 URL: https://issues.apache.org/jira/browse/HUDI-5521 Project: Apache

[jira] [Updated] (HUDI-5463) Apply rollback commits from data table as rollbacks in MDT instead of Delta commit

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5463: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Apply rollback commits f

[jira] [Updated] (HUDI-5276) Hudi getAllQueryPartitionPaths use regular match caused Invalid input path add

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5276: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Hudi getAllQueryPartitio

[jira] [Updated] (HUDI-5503) Optimize flink table factory option check

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5503: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Optimize flink table fac

[jira] [Updated] (HUDI-3601) Support multi-arch builds in docker setup

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3601: - Sprint: 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 2022/12/12, 0.

[jira] [Updated] (HUDI-5498) Update docs for reading Hudi tables on Databricks runtime

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5498: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Update docs for reading

[jira] [Updated] (HUDI-3249) Performance Improvements

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3249: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 20

[jira] [Updated] (HUDI-4937) Fix HoodieTable injecting HoodieBackedTableMetadata not reusing underlying MT readers

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4937: - Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.

[jira] [Updated] (HUDI-5499) Make sure CTAS always uses Bulk Insert

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5499: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Make sure CTAS always us

[jira] [Updated] (HUDI-5465) Fix compaction and rollback handling in MDT for multi-writer scenarios in DT

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5465: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Fix compaction and rollb

[jira] [Updated] (HUDI-5430) Fix multi-writer handling w/ rollback blocks in MOR table (log record reader)

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5430: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Fix multi-writer handlin

[jira] [Updated] (HUDI-5433) Fix the way we deduce the pending instants for MDT writes

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5433: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Fix the way we deduce th

[jira] [Updated] (HUDI-4613) Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4613: - Sprint: 2022/09/05, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/09/05, 2022/12/12,

[jira] [Updated] (HUDI-3775) Allow for offline compaction of MOR tables via spark streaming

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3775: - Sprint: 2022/09/05, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/09/05, 0.13.0 Final Sprint) >

[jira] [Updated] (HUDI-5023) Add new Executor avoiding Queueing in the write-path

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5023: - Sprint: 2022/11/15, 2022/11/29, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/11/15, 2022/11/29,

[jira] [Updated] (HUDI-5520) Fail MDT when list of log files grow > 1000

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5520: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Fail MDT when list of lo

[jira] [Updated] (HUDI-5349) Clean up partially failed restore if any

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5349: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Clean up partially faile

[jira] [Updated] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-83: --- Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Map Timestamp type in spark to

[jira] [Updated] (HUDI-5407) Rollbacks in MDT is not effective

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5407: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Rollbacks in MDT is not

[jira] [Updated] (HUDI-5401) Hivemetastore URI set in hudi conf not respected.

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5401: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Hivemetastore URI set in

[jira] [Updated] (HUDI-5475) not able to generate utilities-slim bundle dependency tree

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5475: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > not able to generate uti

[jira] [Updated] (HUDI-5432) Fix adding back a log block w/ same commit time as previously rolled back one

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5432: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Fix adding back a log bl

[jira] [Updated] (HUDI-5464) Fix instantiation of a new partition in MDT re-using the same instant time as a regular commit

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5464: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Fix instantiation of a n

[jira] [Updated] (HUDI-5408) Partially failed commits in MDT have to be rolled back in all cases

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5408: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Partially failed commits

[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5485: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Improve performance of s

[jira] [Updated] (HUDI-3967) Automatic savepoint in Hudi

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3967: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 20

[jira] [Updated] (HUDI-5319) NPE in Bloom Filter Index

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5319: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/12/12, 0.13.0 Final Sprint) >

[jira] [Updated] (HUDI-3529) Improve dependency management and bundling

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3529: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 20

[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5442: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Fix HiveHoodieTableFileI

[jira] [Updated] (HUDI-5075) Add support to rollback residual clustering after disabling clustering

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5075: - Sprint: 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Spri

[jira] [Updated] (HUDI-1574) Trim existing unit tests to finish in much shorter amount of time

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1574: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 20

[jira] [Updated] (HUDI-5352) Jackson fails to serialize LocalDate when updating Delta Commit metadata

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5352: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/12/12, 0.13.0 Final Sprint) >

[jira] [Updated] (HUDI-5434) Fix archival in MDT to not rely on rollbacks/clean in DT

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5434: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Fix archival in MDT to n

[jira] [Updated] (HUDI-4586) Address S3 timeouts in Bloom Index with metadata table

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4586: - Sprint: 2022/08/08, 2022/08/22, 2022/09/05, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/08/08,

[jira] [Updated] (HUDI-5443) Fix exception when querying MOR table after applying NestedSchemaPruning optimization

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5443: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Fix exception when query

[jira] [Updated] (HUDI-5392) Fix Bootstrap files reader to configure arrays to be read in the new format

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5392: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/12/12, 0.13.0 Final Sprint) >

[jira] [Updated] (HUDI-3673) Add a common hudi-hbase-shaded for shaded hbase dependencies

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3673: - Sprint: 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/11/29, 2022/12/12,

[jira] [Updated] (HUDI-5384) Make sure predicates are appropriately pushed down to HoodieFileIndex when lazy listing

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5384: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/12/12, 0.13.0 Final Sprint) >

[jira] [Updated] (HUDI-4911) Make sure LogRecordReader doesn't flush the cache before each lookup

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4911: - Sprint: 2022/11/15, 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/11/15,

[jira] [Updated] (HUDI-3636) Clustering fails due to marker creation failure

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3636: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/29, 2022/12/12, 0.

[jira] [Updated] (HUDI-4991) Make sure DeltaStreamer passes SSL key/truststore configs connecting to Schema Registry

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4991: - Sprint: 2022/10/04, 2022/10/18, 2022/11/01, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was:

[jira] [Updated] (HUDI-5321) Fix Bulk Insert ColumnSortPartitioners

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5321: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/12/12, 0.13.0 Final Sprint) >

[jira] [Updated] (HUDI-5160) Spark df saveAsTable failed with CTAS

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5160: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Spark df saveAsTable fai

[jira] [Updated] (HUDI-2608) Support JSON schema in schema registry provider

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2608: - Sprint: 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/11/29, 2022/12/12,

[jira] [Updated] (HUDI-5423) Flaky test: ColumnStatsTestCase(MERGE_ON_READ,true,true)

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5423: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Flaky test: ColumnStatsT

[jira] [Updated] (HUDI-5169) Re-attempt failed rollback (regular commits, clustering) and get it to completion

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5169: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Re-attempt failed rollba

[jira] [Updated] (HUDI-3517) Unicode in partition path causes it to be resolved wrongly

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3517: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 0.13.0 Final Sprint) > Unicode in partition pat

[jira] [Updated] (HUDI-5238) Hudi throwing "PipeBroken" exception during Merging on GCS

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5238: - Sprint: 2022/11/15, 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/11/15,

[jira] [Updated] (HUDI-5323) Decouple virtual key with writing bloom filters to parquet files

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5323: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2 (was: 2022/12/12, 0.13.0 Final Sprint) >

[jira] [Updated] (HUDI-5498) Update docs for reading Hudi tables on Databricks runtime

2023-01-09 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5498: Story Points: 0.5 (was: 1) > Update docs for reading Hudi tables on Databricks runtime > --

[jira] [Updated] (HUDI-3775) Allow for offline compaction of MOR tables via spark streaming

2023-01-09 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-3775: -- Story Points: 1 (was: 2) > Allow for offline compaction of MOR tables via spark streami

[jira] [Updated] (HUDI-5349) Clean up partially failed restore if any

2023-01-09 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5349: -- Story Points: 1 > Clean up partially failed restore if any > ---

[jira] [Updated] (HUDI-5520) Fail MDT when list of log files grow > 1000

2023-01-09 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5520: -- Priority: Blocker (was: Critical) > Fail MDT when list of log files grow > 1000 > -

[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2023-01-09 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5442: Story Points: 1 (was: 2) > Fix HiveHoodieTableFileIndex to use lazy listing > -

[jira] [Updated] (HUDI-5408) Partially failed commits in MDT have to be rolled back in all cases

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5408: - Status: In Progress (was: Open) > Partially failed commits in MDT have to be rolled back in all cases > -

[jira] [Updated] (HUDI-5408) Partially failed commits in MDT have to be rolled back in all cases

2023-01-09 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5408: - Status: Patch Available (was: In Progress) > Partially failed commits in MDT have to be rolled back in al

[jira] [Updated] (HUDI-5408) Partially failed commits in MDT have to be rolled back in all cases

2023-01-09 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5408: -- Priority: Blocker (was: Critical) > Partially failed commits in MDT have to be rolled b

[jira] [Updated] (HUDI-5408) Partially failed commits in MDT have to be rolled back in all cases

2023-01-09 Thread sivabalan narayanan (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sivabalan narayanan updated HUDI-5408: -- Story Points: 1 (was: 2) > Partially failed commits in MDT have to be rolled back in al
