[GitHub] [hudi] nbeeee commented on issue #7902: [SUPPORT].UnresolvedUnionException: Not in union exception occurred when writing data through spark

2023-02-13 Thread via GitHub
nb commented on issue #7902: URL: https://github.com/apache/hudi/issues/7902#issuecomment-1429275756 > sql: SELECT trim(compid) company_id ,trim(busno) business_id ,trim(wareid) ware_id ,if(sumqty = '', null, cast(sumqty as decimal(20,4))) as sumqty

[jira] [Updated] (HUDI-5754) Add detailed description of GCS Incr, Proto Kafka, and Pulsar Sources in Deltastreamer page

2023-02-13 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-5754: -- Status: Patch Available (was: In Progress) > Add detailed description of GCS Incr, Proto Kafka, and

[GitHub] [hudi] codope commented on a diff in pull request #7929: [HUDI-5754] Add new sources to deltastreamer docs

2023-02-13 Thread via GitHub
codope commented on code in PR #7929: URL: https://github.com/apache/hudi/pull/7929#discussion_r1105412713 ## website/docs/hoodie_deltastreamer.md: ## @@ -340,6 +388,26 @@ to trigger/processing of new or changed data as soon as it is available on S3. Insert code sample from

[GitHub] [hudi] hudi-bot commented on pull request #7938: [HUDI-5785] Enhance Spark Datasource tests

2023-02-13 Thread via GitHub
hudi-bot commented on PR #7938: URL: https://github.com/apache/hudi/pull/7938#issuecomment-1429271565 ## CI report: * 4e7da703304c7783e9771e931e39854adf6458d6 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit

2023-02-13 Thread via GitHub
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1429271305 ## CI report: * 0f35441097e274abe020127c5bd2a5f3d46e0b99 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit

2023-02-13 Thread via GitHub
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1429261788 ## CI report: * 0f35441097e274abe020127c5bd2a5f3d46e0b99 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7938: [HUDI-5785] Enhance Spark Datasource tests

2023-02-13 Thread via GitHub
hudi-bot commented on PR #7938: URL: https://github.com/apache/hudi/pull/7938#issuecomment-1429261925 ## CI report: * 4e7da703304c7783e9771e931e39854adf6458d6 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] nsivabalan commented on a diff in pull request #7939: [MINOR] Updating Index page to include bucket and consistent hashing index

2023-02-13 Thread via GitHub
nsivabalan commented on code in PR #7939: URL: https://github.com/apache/hudi/pull/7939#discussion_r1105401313 ## website/docs/indexing.md: ## @@ -27,6 +27,13 @@ Currently, Hudi supports the following indexing options. - **HBase Index:** Manages the index mapping in an

[GitHub] [hudi] nsivabalan opened a new pull request, #7939: [MINOR] Updating Index page to include bucket and consistent hashing index

2023-02-13 Thread via GitHub
nsivabalan opened a new pull request, #7939: URL: https://github.com/apache/hudi/pull/7939 ### Change Logs Updating Index page to include bucket and consistent hashing index ### Impact Updating Index page to include bucket and consistent hashing index ### Risk

[GitHub] [hudi] hudi-bot commented on pull request #7937: [MINOR] Fixing RFC 48 title

2023-02-13 Thread via GitHub
hudi-bot commented on PR #7937: URL: https://github.com/apache/hudi/pull/7937#issuecomment-1429255402 ## CI report: * c330326ec68358875a2336bc133e33a36526925b Azure:

[GitHub] [hudi] Zouxxyy commented on a diff in pull request #7865: [HUDI-5710] Load all partitions in advance when using KEEP_LATEST_FILE_VERSIONS clean policy and MDT enable

2023-02-13 Thread via GitHub
Zouxxyy commented on code in PR #7865: URL: https://github.com/apache/hudi/pull/7865#discussion_r1105393423 ## hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java: ## @@ -101,6 +102,16 @@ public CleanPlanner(HoodieEngineContext

[GitHub] [hudi] stream2000 commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit

2023-02-13 Thread via GitHub
stream2000 commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1429231915 > > Making Timer a daemon thread can solve the issue, but currently Timer is implemented in java.util, so maybe we need to implement our own Timer to make it a daemon > >

[jira] [Updated] (HUDI-5785) Troubleshoot why TestMetadataTableWithSparkDataSource does not catch read issue

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5785: - Labels: pull-request-available (was: ) > Troubleshoot why TestMetadataTableWithSparkDataSource

[GitHub] [hudi] yihua opened a new pull request, #7938: [HUDI-5785] Enhance Spark Datasource tests

2023-02-13 Thread via GitHub
yihua opened a new pull request, #7938: URL: https://github.com/apache/hudi/pull/7938 ### Change Logs Previously, we found that Spark Datasource read of metadata table was broken and the issue is fixed by #7924. However, the `TestMetadataTableWithSparkDataSource` guarding the exact

[GitHub] [hudi] xiarixiaoyao commented on pull request #7915: [HUDI-5759] Supports add column on mor table with log

2023-02-13 Thread via GitHub
xiarixiaoyao commented on PR #7915: URL: https://github.com/apache/hudi/pull/7915#issuecomment-1429220130 @qidian99 could you please help me reproduce this problem? Thanks. I tested the latest master branch on spark3.2 and spark3.3, and everything is OK.

[jira] [Updated] (HUDI-5788) Simplify APIs of constructing Hudi Spark configs

2023-02-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5788: Description: Currently, there are many APIs of constructing Hudi Spark configs in a similar way (search

[jira] [Updated] (HUDI-5788) Simplify APIs of constructing Hudi Spark configs

2023-02-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5788: Description: Currently there are many APIs > Simplify APIs of constructing Hudi Spark configs >

[jira] [Created] (HUDI-5788) Simplify APIs of constructing Hudi Spark configs

2023-02-13 Thread Ethan Guo (Jira)
Ethan Guo created HUDI-5788: --- Summary: Simplify APIs of constructing Hudi Spark configs Key: HUDI-5788 URL: https://issues.apache.org/jira/browse/HUDI-5788 Project: Apache Hudi Issue Type:

[jira] [Updated] (HUDI-5788) Simplify APIs of constructing Hudi Spark configs

2023-02-13 Thread Ethan Guo (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Guo updated HUDI-5788: Fix Version/s: 0.14.0 > Simplify APIs of constructing Hudi Spark configs >

[GitHub] [hudi] hudi-bot commented on pull request #7931: [HUDI-5773] Support archive command for spark sql

2023-02-13 Thread via GitHub
hudi-bot commented on PR #7931: URL: https://github.com/apache/hudi/pull/7931#issuecomment-1429207126 ## CI report: * 978d8b7b51f80bbeb22891c53d85b2ca5e166efd Azure:

[GitHub] [hudi] KnightChess commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit

2023-02-13 Thread via GitHub
KnightChess commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1429203571 > Making Timer a daemon thread can solve the issue, but currently Timer is implemented in java.util, so maybe we need to implement our own Timer to make it a daemon @stream2000

[GitHub] [hudi] hudi-bot commented on pull request #7931: [HUDI-5773] Support archive command for spark sql

2023-02-13 Thread via GitHub
hudi-bot commented on PR #7931: URL: https://github.com/apache/hudi/pull/7931#issuecomment-1429199154 ## CI report: * 978d8b7b51f80bbeb22891c53d85b2ca5e166efd Azure:

[GitHub] [hudi] stream2000 commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit

2023-02-13 Thread via GitHub
stream2000 commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1429195108 > How about making Timer a daemon thread? Making Timer a daemon thread can solve the issue, but currently Timer is implemented in java.util, so maybe we need to implement our own
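
For context on the daemon-thread suggestion in this thread: `java.util.Timer` has a constructor that takes an `isDaemon` flag, so a timer whose thread does not keep the JVM alive can be created without a custom implementation. The sketch below only demonstrates that JDK behaviour. The class name `DaemonTimerSketch` and the thread name `sketch-timer` are made up for illustration. Whether this helps PR #7918 depends on where the Timer in question is created, which the excerpt does not show.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration class; not part of Hudi.
public class DaemonTimerSketch {

    // Runs one task on a daemon-backed Timer and reports whether the thread
    // that executed the task was a daemon thread.
    static boolean taskRunsOnDaemonThread() {
        // The second argument (isDaemon = true) makes the timer thread a daemon,
        // so it will not prevent the JVM from exiting.
        Timer timer = new Timer("sketch-timer", true);
        AtomicBoolean ranOnDaemon = new AtomicBoolean(false);
        CountDownLatch done = new CountDownLatch(1);

        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                ranOnDaemon.set(Thread.currentThread().isDaemon());
                done.countDown();
            }
        }, 0L);

        boolean finished;
        try {
            // Wait for the task so the result is deterministic.
            finished = done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            finished = false;
        } finally {
            // Cancelling is still good practice, even for a daemon timer.
            timer.cancel();
        }
        return finished && ranOnDaemon.get();
    }

    public static void main(String[] args) {
        System.out.println("task ran on daemon thread: " + taskRunsOnDaemonThread());
    }
}
```

If the Timer is created inside a third-party library that uses the default non-daemon constructor, this flag is not reachable from calling code, and the concern raised in the comment would still apply.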

[GitHub] [hudi] hudi-bot commented on pull request #7868: [HUDI-1593] Add support for "show restores" and "show restore" commands in hudi-cli

2023-02-13 Thread via GitHub
hudi-bot commented on PR #7868: URL: https://github.com/apache/hudi/pull/7868#issuecomment-1429193581 ## CI report: * 5b6f539ecdc4ba84b7b509b43bf4c3836c575dca Azure:

[jira] [Updated] (HUDI-5754) Add detailed description of GCS Incr, Proto Kafka, and Pulsar Sources in Deltastreamer page

2023-02-13 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-5754: - Labels: pull-request-available (was: ) > Add detailed description of GCS Incr, Proto Kafka, and

[GitHub] [hudi] pramodbiligiri commented on a diff in pull request #7929: [HUDI-5754] Add new sources to deltastreamer docs

2023-02-13 Thread via GitHub
pramodbiligiri commented on code in PR #7929: URL: https://github.com/apache/hudi/pull/7929#discussion_r1105341974 ## website/docs/hoodie_deltastreamer.md: ## @@ -340,6 +388,26 @@ to trigger/processing of new or changed data as soon as it is available on S3. Insert code

[hudi] branch master updated (d395f058183 -> 71a62627cdd)

2023-02-13 Thread forwardxu
This is an automated email from the ASF dual-hosted git repository. forwardxu pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git from d395f058183 [HUDI-3580] [RFC-48] Create RFC for LogCompaction support to Hudi (#5041) add 71a62627cdd

[GitHub] [hudi] XuQianJin-Stars merged pull request #7928: [HUDI-5772] Align Flink clustering configuration with HoodieClusteringConfig

2023-02-13 Thread via GitHub
XuQianJin-Stars merged PR #7928: URL: https://github.com/apache/hudi/pull/7928 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail:

[GitHub] [hudi] hudi-bot commented on pull request #7868: [HUDI-1593] Add support for "show restores" and "show restore" commands in hudi-cli

2023-02-13 Thread via GitHub
hudi-bot commented on PR #7868: URL: https://github.com/apache/hudi/pull/7868#issuecomment-1429156834 ## CI report: * 5b6f539ecdc4ba84b7b509b43bf4c3836c575dca Azure:

[GitHub] [hudi] pramodbiligiri commented on a diff in pull request #7868: [HUDI-1593] Add support for "show restores" and "show restore" commands in hudi-cli

2023-02-13 Thread via GitHub
pramodbiligiri commented on code in PR #7868: URL: https://github.com/apache/hudi/pull/7868#discussion_r1105318602 ## hudi-cli/src/main/java/org/apache/hudi/cli/commands/RestoresCommand.java: ## @@ -0,0 +1,172 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [hudi] codope commented on pull request #7930: Website documention for GCS Ingestion

2023-02-13 Thread via GitHub
codope commented on PR #7930: URL: https://github.com/apache/hudi/pull/7930#issuecomment-1429153146 @pramodbiligiri Thanks for the docs. I am adding more sources to the deltastreamer docs and have taken your setup steps in https://github.com/apache/hudi/pull/7929 Closing it in favor of

[GitHub] [hudi] codope closed pull request #7930: Website documention for GCS Ingestion

2023-02-13 Thread via GitHub
codope closed pull request #7930: Website documention for GCS Ingestion URL: https://github.com/apache/hudi/pull/7930

[GitHub] [hudi] hudi-bot commented on pull request #7937: [MINOR] Fixing RFC 48 title

2023-02-13 Thread via GitHub
hudi-bot commented on PR #7937: URL: https://github.com/apache/hudi/pull/7937#issuecomment-1429152928 ## CI report: * c330326ec68358875a2336bc133e33a36526925b Azure:

[jira] [Updated] (HUDI-5754) Add detailed description of GCS Incr, Proto Kafka, and Pulsar Sources in Deltastreamer page

2023-02-13 Thread Sagar Sumit (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sagar Sumit updated HUDI-5754: -- Status: In Progress (was: Open) > Add detailed description of GCS Incr, Proto Kafka, and Pulsar

[GitHub] [hudi] hudi-bot commented on pull request #7937: [MINOR] Fixing RFC 48 title

2023-02-13 Thread via GitHub
hudi-bot commented on PR #7937: URL: https://github.com/apache/hudi/pull/7937#issuecomment-1429147695 ## CI report: * c330326ec68358875a2336bc133e33a36526925b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the

[GitHub] [hudi] nsivabalan opened a new pull request, #7937: [MINOR] Fixing RFC 48 title

2023-02-13 Thread via GitHub
nsivabalan opened a new pull request, #7937: URL: https://github.com/apache/hudi/pull/7937 ### Change Logs Fixing RFC 48 title ### Impact _Describe any public API or user-facing feature change or any performance impact._ ### Risk level (write none, low medium or

[GitHub] [hudi] pramodbiligiri commented on a diff in pull request #7864: [HUDI-5688] Small workaround that can prevent NPE of EmptyRelation.schema

2023-02-13 Thread via GitHub
pramodbiligiri commented on code in PR #7864: URL: https://github.com/apache/hudi/pull/7864#discussion_r1105298405 ## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala: ## @@ -241,7 +241,12 @@ object DefaultSource { } if

[GitHub] [hudi] KnightChess commented on issue #7835: [SUPPORT] Hudi bootstrapping with METADATA_ONLY option is re-writing the complete dataset instead of just creating HUDI metadata skeleton files se

2023-02-13 Thread via GitHub
KnightChess commented on issue #7835: URL: https://github.com/apache/hudi/issues/7835#issuecomment-1429116011 What about setting the operation type to bootstrap? It looks like this parameter is missing: `DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.BOOTSTRAP_OPERATION_OPT_VAL`
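
As a rough sketch of the suggestion above, the option pair can be written as a plain key/value entry. This assumes the two constants resolve to the strings `hoodie.datasource.write.operation` and `bootstrap`, which should be checked against the Hudi version in use. The class and method names are hypothetical, and the sketch deliberately avoids any Spark or Hudi dependency.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of Hudi.
public class BootstrapOptionsSketch {

    // Builds the single write option the comment says is missing.
    // Assumption: OPERATION_OPT_KEY -> "hoodie.datasource.write.operation"
    //             BOOTSTRAP_OPERATION_OPT_VAL -> "bootstrap"
    static Map<String, String> bootstrapWriteOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.datasource.write.operation", "bootstrap");
        return opts;
    }

    public static void main(String[] args) {
        // In a real job these entries would be passed to the Spark writer
        // alongside the other bootstrap settings from the issue.
        bootstrapWriteOptions().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```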

[jira] [Updated] (HUDI-5665) Re-use table configs for subsequent writes

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5665: - Sprint: Sprint 2023-02-14 > Re-use table configs for subsequent writes >

[jira] [Updated] (HUDI-5744) 0.13.0 release note part 2

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5744: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > 0.13.0 release note part 2 >

[jira] [Updated] (HUDI-3601) Support multi-arch builds in docker setup

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3601: - Sprint: 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29, 2022/12/12,

[jira] [Updated] (HUDI-5753) Add feature docs for Record Payload

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5753: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > Add feature docs for Record

[jira] [Updated] (HUDI-5552) Too slow while using trino-hudi connector while querying partitioned tables.

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5552: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0

[jira] [Updated] (HUDI-5656) Metadata Bootstrap flow resulting in NPE

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5656: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0 Final Sprint 3, Sprint

[jira] [Updated] (HUDI-5745) 0.13.0 release note part 3

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5745: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > 0.13.0 release note part 3 >

[jira] [Updated] (HUDI-5769) Partitions created by Async indexer could be deleted by regular writers

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5769: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > Partitions created by Async

[jira] [Updated] (HUDI-5510) The latest written commit is not used when getInstantsToArchive

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5510: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0 Final Sprint 3, Sprint

[jira] [Updated] (HUDI-5642) Enable schema reconciliation by default

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5642: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0 Final Sprint 3, Sprint

[jira] [Updated] (HUDI-5323) Decouple virtual key with writing bloom filters to parquet files

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5323: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31,

[jira] [Updated] (HUDI-5464) Fix instantiation of a new partition in MDT re-using the same instant time as a regular commit

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5464: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint

[jira] [Updated] (HUDI-5758) MOR table w/ delete block in 0.12.2 not readable in 0.13 and also not compactable

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5758: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > MOR table w/ delete block in

[jira] [Updated] (HUDI-5649) Unify all the loggers to slf4j

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5649: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0 Final Sprint 3, Sprint

[jira] [Updated] (HUDI-5751) Add feature docs for Metaserver

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5751: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > Add feature docs for Metaserver

[jira] [Updated] (HUDI-5750) 0.13.0 release note part 8

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5750: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > 0.13.0 release note part 8 >

[jira] [Updated] (HUDI-5746) 0.13.0 release note part 4

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5746: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > 0.13.0 release note part 4 >

[jira] [Updated] (HUDI-5754) Add detailed description of GCS Incr, Proto Kafka, and Pulsar Sources in Deltastreamer page

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5754: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > Add detailed description of GCS

[jira] [Updated] (HUDI-5743) 0.13.0 release note part 1

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5743: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > 0.13.0 release note part 1 >

[jira] [Updated] (HUDI-5752) Add feature docs for Change Data Capture

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5752: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > Add feature docs for Change Data

[jira] [Updated] (HUDI-5641) Streamline Advanced Schema Evolution flow

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5641: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0 Final Sprint 3, Sprint

[jira] [Updated] (HUDI-5520) Fail MDT when list of log files grows unboundedly

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5520: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0

[jira] [Updated] (HUDI-5475) not able to generate utilities-slim bundle dependency tree

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5475: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0

[jira] [Updated] (HUDI-5767) Add known regression of Hive Sync performance to release notes

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5767: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > Add known regression of Hive

[jira] [Updated] (HUDI-5685) Fix performance gap in Bulk Insert row-writing path with enabled de-duplication

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5685: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > Fix performance gap in Bulk

[jira] [Updated] (HUDI-5755) Add detailed description of OCC early conflict detection to concurrency control docs

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5755: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > Add detailed description of OCC

[jira] [Updated] (HUDI-5756) Add Consistent Hashing Index to Indexing docs

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5756: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > Add Consistent Hashing Index to

[jira] [Updated] (HUDI-5757) Add Log Compaction to Write Operation docs

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5757: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > Add Log Compaction to Write

[jira] [Updated] (HUDI-5677) [DOCS] Update AWS libs version

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5677: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > [DOCS] Update AWS libs version >

[jira] [Updated] (HUDI-5747) 0.13.0 release note part 5

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5747: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > 0.13.0 release note part 5 >

[jira] [Updated] (HUDI-5748) 0.13.0 release note part 6

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5748: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > 0.13.0 release note part 6 >

[jira] [Updated] (HUDI-3636) Clustering fails due to marker creation failure

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3636: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/29, 2022/12/12,

[jira] [Updated] (HUDI-3775) Allow for offline compaction of MOR tables via spark streaming

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3775: - Sprint: 2022/09/05, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31,

[jira] [Updated] (HUDI-1574) Trim existing unit tests to finish in much shorter amount of time

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-1574: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29,

[jira] [Updated] (HUDI-5442) Fix HiveHoodieTableFileIndex to use lazy listing

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5442: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint

[jira] [Updated] (HUDI-5569) Files written by first commit/delta commit if it failed is detected as valid data files

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5569: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0

[jira] [Updated] (HUDI-2681) Make hoodie record_key and preCombine_key optional

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-2681: - Sprint: 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0

[jira] [Updated] (HUDI-5616) Docs update for specifying org.apache.spark.HoodieSparkKryoRegistrar

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5616: - Sprint: 0.13.0 Final Sprint 3, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0 Final Sprint 3, Sprint

[jira] [Updated] (HUDI-3529) Improve dependency management and bundling

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3529: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29,

[jira] [Updated] (HUDI-5602) Troubleshoot METADATA_ONLY bootstrapped table not being able to read back partition path

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5602: - Sprint: Sprint 2023-01-31, Sprint 2023-02-14 (was: Sprint 2023-01-31) > Troubleshoot METADATA_ONLY

[jira] [Updated] (HUDI-5352) Jackson fails to serialize LocalDate when updating Delta Commit metadata

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5352: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31,

[jira] [Updated] (HUDI-3967) Automatic savepoint in Hudi

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3967: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29,

[jira] [Updated] (HUDI-3249) Performance Improvements

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-3249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-3249: - Sprint: 2022/08/22, 2022/09/05, 2022/09/19, 2022/10/04, 2022/10/18, 2022/11/01, 2022/11/15, 2022/11/29,

[jira] [Updated] (HUDI-5423) Flaky test: ColumnStatsTestCase(MERGE_ON_READ,true,true)

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5423: - Sprint: 0.13.0 Final Sprint, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0 Final Sprint, Sprint

[jira] [Updated] (HUDI-4613) Avoid the use of regex expressions when call hoodieFileGroup#addLogFile function

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-4613: - Sprint: 2022/09/05, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint

[jira] [Updated] (HUDI-5321) Fix Bulk Insert ColumnSortPartitioners

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5321: - Sprint: 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final Sprint 3, Sprint 2023-01-31,

[jira] [Updated] (HUDI-5238) Hudi throwing "PipeBroken" exception during Merging on GCS

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5238: - Sprint: 2022/11/15, 2022/11/29, 2022/12/12, 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, 0.13.0 Final

[jira] [Updated] (HUDI-5498) Update docs for reading Hudi tables on Databricks runtime

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-5498: - Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0

[jira] [Updated] (HUDI-83) Map Timestamp type in spark to corresponding Timestamp type in Hive during Hive sync

2023-02-13 Thread Raymond Xu (Jira)
[ https://issues.apache.org/jira/browse/HUDI-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raymond Xu updated HUDI-83: --- Sprint: 0.13.0 Final Sprint, 0.13.0 Final Sprint 2, Sprint 2023-01-31, Sprint 2023-02-14 (was: 0.13.0 Final

[GitHub] [hudi] KnightChess commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit

2023-02-13 Thread via GitHub
KnightChess commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1429101102 How about make Timer as daemon thread? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [hudi] hudi-bot commented on pull request #7918: [MINOR] Fix spark sql run clean do not exit

2023-02-13 Thread via GitHub
hudi-bot commented on PR #7918: URL: https://github.com/apache/hudi/pull/7918#issuecomment-1429099675 ## CI report: * 0f35441097e274abe020127c5bd2a5f3d46e0b99 Azure:

[GitHub] [hudi] hudi-bot commented on pull request #7792: [MINOR] Improving logging and tracking of failures for metadata validator

2023-02-13 Thread via GitHub
hudi-bot commented on PR #7792: URL: https://github.com/apache/hudi/pull/7792#issuecomment-1429099455 ## CI report: * 6795e0cbe9f6ebc75f54d2a7e6a6594f8ed3622a UNKNOWN * ff77639bedacc62b7026784674f1c353d41c791c Azure:

[GitHub] [hudi] KnightChess commented on pull request #7808: [MINOR] use ExecutorFactory in BootstrapHandler

2023-02-13 Thread via GitHub
KnightChess commented on PR #7808: URL: https://github.com/apache/hudi/pull/7808#issuecomment-1429095976 @nsivabalan cc :)

[jira] [Assigned] (HUDI-5787) HoodieHiveCatalog should not delete data for dropping external table

2023-02-13 Thread Nicholas Jiang (Jira)
[ https://issues.apache.org/jira/browse/HUDI-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Jiang reassigned HUDI-5787: Assignee: Nicholas Jiang > HoodieHiveCatalog should not delete data for dropping external

[jira] [Created] (HUDI-5787) HoodieHiveCatalog should not delete data for dropping external table

2023-02-13 Thread Nicholas Jiang (Jira)
Nicholas Jiang created HUDI-5787: Summary: HoodieHiveCatalog should not delete data for dropping external table Key: HUDI-5787 URL: https://issues.apache.org/jira/browse/HUDI-5787 Project: Apache

[GitHub] [hudi] koochiswathiTR commented on issue #7909: Failed to create Marker file

2023-02-13 Thread via GitHub
koochiswathiTR commented on issue #7909: URL: https://github.com/apache/hudi/issues/7909#issuecomment-1429074291 @xushiyan xushiyan This issue is repeating. We see this issue more often.

[jira] [Created] (HUDI-5786) Add a new config to specifies the cache level for the rdd spark write to hudi

2023-02-13 Thread ShiHang Gao (Jira)
ShiHang Gao created HUDI-5786: - Summary: Add a new config to specifies the cache level for the rdd spark write to hudi Key: HUDI-5786 URL: https://issues.apache.org/jira/browse/HUDI-5786 Project: Apache

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #7931: [HUDI-5773] Support archive command for spark sql

2023-02-13 Thread via GitHub
xiarixiaoyao commented on code in PR #7931: URL: https://github.com/apache/hudi/pull/7931#discussion_r1105243406 ## hudi-spark-datasource/hudi-spark/src/main/java/org/apache/hudi/cli/ArchiveExecutorUtils.java: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Software

[GitHub] [hudi] Leoyzen opened a new issue, #7936: [SUPPORT]Flink HiveCatalog should respect 'managed_table' options to avoid deleting data unexpectable.

2023-02-13 Thread via GitHub
Leoyzen opened a new issue, #7936: URL: https://github.com/apache/hudi/issues/7936 **Describe the problem you faced** Currently it is unacceptable when using drop table statement which the table managed by hive catalog will unexpectedly deleting data. There is also an

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #7915: [HUDI-5759] Supports add column on mor table with log

2023-02-13 Thread via GitHub
xiarixiaoyao commented on code in PR #7915: URL: https://github.com/apache/hudi/pull/7915#discussion_r1105237778 ## hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestUpdateTable.scala: ## @@ -204,4 +204,48 @@ class TestUpdateTable extends
