Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]
hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803296619

## CI report:

* 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
* e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
* d3e58fee4a8cfb636419d9eeb50fa2480b56e2c1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20758)
* a1ab15d9c36c9afd9626b63b30897f08efadbf69 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20765)

## Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]
hudi-bot commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803289243

## CI report:

* 4c36398beb08a1eb91959ce151ebf8fd159bb8d6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20763)
Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]
hudi-bot commented on PR #10024:
URL: https://github.com/apache/hudi/pull/10024#issuecomment-1803289144

## CI report:

* 17566af6a57a43de3b2a5d5ab7bcd8bd28336307 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20752)
* 7c0ff25207cc19c5496dc4e7e688c3a8527663d2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20766)
Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]
hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803288697

## CI report:

* 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
* e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
* d3e58fee4a8cfb636419d9eeb50fa2480b56e2c1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20758)
* a1ab15d9c36c9afd9626b63b30897f08efadbf69 UNKNOWN
Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]
hudi-bot commented on PR #10024:
URL: https://github.com/apache/hudi/pull/10024#issuecomment-1803281654

## CI report:

* 17566af6a57a43de3b2a5d5ab7bcd8bd28336307 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20752)
* 7c0ff25207cc19c5496dc4e7e688c3a8527663d2 UNKNOWN
Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]
hudi-bot commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803281756

## CI report:

* 91a7b8640540e841522e33c001fc8c0f4abddb4a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20761)
* 4c36398beb08a1eb91959ce151ebf8fd159bb8d6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20763)
[jira] [Closed] (HUDI-6992) IncrementalInputSplits incorrectly set the latestCommit attr
[ https://issues.apache.org/jira/browse/HUDI-6992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-6992.
Resolution: Fixed

Fixed via master branch: 44ca2bbcfd1512a55155e1033a9c9aca132efae6

> IncrementalInputSplits incorrectly set the latestCommit attr
>
> Key: HUDI-6992
> URL: https://issues.apache.org/jira/browse/HUDI-6992
> Project: Apache Hudi
> Issue Type: Improvement
> Components: reader-core
> Reporter: zhuanshenbsj1
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.0.0

-- This message was sent by Atlassian Jira (v8.20.10#820010)
(hudi) branch master updated: [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attribute (#9923)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 44ca2bbcfd1  [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attribute (#9923)

44ca2bbcfd1 is described below

commit 44ca2bbcfd1512a55155e1033a9c9aca132efae6
Author: zhuanshenbsj1 <34104400+zhuanshenb...@users.noreply.github.com>
AuthorDate: Thu Nov 9 15:04:50 2023 +0800

    [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attribute (#9923)
---
 .../org/apache/hudi/common/model/FileSlice.java    | 13 --
 .../table/timeline/CompletionTimeQueryView.java    |  7 +--
 .../hudi/common/table/timeline/HoodieTimeline.java | 14 ++
 .../apache/hudi/common/model/TestFileSlice.java    | 50 ++
 .../apache/hudi/source/IncrementalInputSplits.java |  7 ++-
 .../hudi/source/TestIncrementalInputSplits.java    | 47 
 .../source/TestStreamReadMonitoringFunction.java   | 12 +++---
 7 files changed, 134 insertions(+), 16 deletions(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/model/FileSlice.java b/hudi-common/src/main/java/org/apache/hudi/common/model/FileSlice.java
index 3f0fcf94156..d071385ea75 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/model/FileSlice.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/model/FileSlice.java
@@ -18,6 +18,7 @@
 package org.apache.hudi.common.model;

+import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.util.Option;

 import java.io.Serializable;
@@ -123,9 +124,15 @@ public class FileSlice implements Serializable {
   }

   /**
-   * Returns true if there is no data file and no log files. Happens as part of pending compaction
-   *
-   * @return
+   * Returns the latest instant time of the file slice.
+   */
+  public String getLatestInstantTime() {
+    Option<String> latestDeltaCommitTime = getLatestLogFile().map(HoodieLogFile::getDeltaCommitTime);
+    return latestDeltaCommitTime.isPresent() ? HoodieTimeline.maxInstant(latestDeltaCommitTime.get(), getBaseInstantTime()) : getBaseInstantTime();
+  }
+
+  /**
+   * Returns true if there is no data file and no log files. Happens as part of pending compaction.
    */
   public boolean isEmpty() {
     return (baseFile == null) && (logFiles.isEmpty());

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java
index 081cae8cb15..e53f185bffd 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java
@@ -32,7 +32,6 @@ import java.util.concurrent.ConcurrentHashMap;
 import static org.apache.hudi.common.table.timeline.HoodieArchivedTimeline.COMPLETION_TIME_ARCHIVED_META_FIELD;
 import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
 import static org.apache.hudi.common.table.timeline.HoodieTimeline.LESSER_THAN;
-import static org.apache.hudi.common.table.timeline.HoodieTimeline.compareTimestamps;

 /**
  * Query view for instant completion time.
@@ -81,7 +80,7 @@ public class CompletionTimeQueryView implements AutoCloseable, Serializable {
   public CompletionTimeQueryView(HoodieTableMetaClient metaClient, String cursorInstant) {
     this.metaClient = metaClient;
     this.startToCompletionInstantTimeMap = new ConcurrentHashMap<>();
-    this.cursorInstant = minInstant(cursorInstant, metaClient.getActiveTimeline().firstInstant().map(HoodieInstant::getTimestamp).orElse(""));
+    this.cursorInstant = HoodieTimeline.minInstant(cursorInstant, metaClient.getActiveTimeline().firstInstant().map(HoodieInstant::getTimestamp).orElse(""));
     // Note: use getWriteTimeline() to keep sync with the fs view visibleCommitsAndCompactionTimeline, see AbstractTableFileSystemView.refreshTimeline.
     this.firstNonSavepointCommit = metaClient.getActiveTimeline().getWriteTimeline().getFirstNonSavepointCommit().map(HoodieInstant::getTimestamp).orElse("");
     load();
@@ -207,10 +206,6 @@ public class CompletionTimeQueryView implements AutoCloseable, Serializable {
     this.startToCompletionInstantTimeMap.putIfAbsent(instantTime, completionTime);
   }

-  private static String minInstant(String instant1, String instant2) {
-    return compareTimestamps(instant1, LESSER_THAN, instant2) ? instant1 : instant2;
-  }
-
   public String getCursorInstant() {
     return cursorInstant;
   }

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java
index 82ec439bd25..53c7d25a00c 100644
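The new `FileSlice#getLatestInstantTime` above returns the newer of the base file's instant time and the latest log file's delta commit time. A minimal standalone Java sketch of that comparison, with hypothetical simplified types (the real method reads the delta commit time from `HoodieLogFile` and delegates to `HoodieTimeline.maxInstant`); it assumes Hudi instant times are fixed-width timestamp strings, so lexicographic comparison matches chronological order:

```java
import java.util.Optional;

class LatestInstantTimeSketch {

  // Instant times are fixed-width timestamp strings, so string comparison
  // matches chronological order (the assumption behind maxInstant).
  static String maxInstant(String a, String b) {
    return a.compareTo(b) >= 0 ? a : b;
  }

  // The slice's latest instant is the newer of the base instant and the
  // latest delta commit; with no log files it falls back to the base instant.
  static String latestInstantTime(Optional<String> latestDeltaCommitTime, String baseInstantTime) {
    return latestDeltaCommitTime
        .map(delta -> maxInstant(delta, baseInstantTime))
        .orElse(baseInstantTime);
  }

  public static void main(String[] args) {
    // Log files newer than the base file: the delta commit wins.
    System.out.println(latestInstantTime(Optional.of("20231109150450"), "20231108120000"));
    // No log files: fall back to the base instant.
    System.out.println(latestInstantTime(Optional.empty(), "20231108120000"));
  }
}
```

This is the essence of the bug fix: before the patch, `IncrementalInputSplits` tracked only the base instant, so a slice whose newest data arrived via log files could report a stale latest commit.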
Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]
danny0405 merged PR #9923: URL: https://github.com/apache/hudi/pull/9923
Re: [I] [SUPPORT] Some .commit files generated by compaction with 0 size [hudi]
ad1happy2go commented on issue #10017:
URL: https://github.com/apache/hudi/issues/10017#issuecomment-1803263076

@zdl1 Can you also share sample code/data for what you are trying, if possible?
Re: [PR] [HUDI-7057] Support CopyToTableProcedure with partial column copy [hudi]
xuzifu666 commented on PR #10025:
URL: https://github.com/apache/hudi/pull/10025#issuecomment-1803255723

CI seems fine now. @danny0405 PTAL
Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]
boneanxs commented on code in PR #10024:
URL: https://github.com/apache/hudi/pull/10024#discussion_r1387540620

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala:

@@ -169,9 +170,10 @@ class HoodieCatalogTable(val spark: SparkSession, var table: CatalogTable) exten
   lazy val partitionSchema: StructType = StructType(tableSchema.filter(f => partitionFields.contains(f.name)))

   /**
-   * All the partition paths
+   * All the partition paths, excludes lazily deleted partitions.
    */
   def getPartitionPaths: Seq[String] = getAllPartitionPaths(spark, table)
+    .filter(!TimelineUtils.getDroppedPartitions(metaClient.getActiveTimeline).contains(_))

Review Comment:
   Oh! Makes sense, thanks for pointing this out, let me fix it.
Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]
hudi-bot commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803229404

## CI report:

* 91a7b8640540e841522e33c001fc8c0f4abddb4a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20761)
* Unknown: [CANCELED](TBD)
* 4c36398beb08a1eb91959ce151ebf8fd159bb8d6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20763)
Re: [PR] [HUDI-7057] Support CopyToTableProcedure with partial column copy [hudi]
hudi-bot commented on PR #10025:
URL: https://github.com/apache/hudi/pull/10025#issuecomment-1803229350

## CI report:

* 80a394967d09baac64231af055c40996dbb2a7fd Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20756)
Re: [PR] [HUDI-7053] Fix the filter pushdown logic for file group reader [hudi]
hudi-bot commented on PR #10018:
URL: https://github.com/apache/hudi/pull/10018#issuecomment-1803229313

## CI report:

* e420cdf46604699ae6587b9aa13e6e9c0d139d38 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20755)
Re: [PR] [MINOR] Change some default configs for 1.0.0-beta [hudi]
hudi-bot commented on PR #9998:
URL: https://github.com/apache/hudi/pull/9998#issuecomment-1803229209

## CI report:

* bb197bc8649a9b8dfcd41d9ed3e4af8e0afbeb9b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20762)
Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]
hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803228898

## CI report:

* 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
* e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
* d3e58fee4a8cfb636419d9eeb50fa2480b56e2c1 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20758)
Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]
danny0405 commented on code in PR #10024:
URL: https://github.com/apache/hudi/pull/10024#discussion_r1387533789

## hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala:

@@ -169,9 +170,10 @@ class HoodieCatalogTable(val spark: SparkSession, var table: CatalogTable) exten
   lazy val partitionSchema: StructType = StructType(tableSchema.filter(f => partitionFields.contains(f.name)))

   /**
-   * All the partition paths
+   * All the partition paths, excludes lazily deleted partitions.
    */
   def getPartitionPaths: Seq[String] = getAllPartitionPaths(spark, table)
+    .filter(!TimelineUtils.getDroppedPartitions(metaClient.getActiveTimeline).contains(_))

Review Comment:
   Will this trigger the calculation for each partition?
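The concern in this review is that the `TimelineUtils.getDroppedPartitions` call sits inside the filter predicate, so in Scala the placeholder lambda re-evaluates it once per partition path. A minimal Java sketch of that behavior versus hoisting the set out of the filter, using a hypothetical counting stand-in for the timeline lookup (not the real Hudi API):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class DroppedPartitionFilterSketch {
  // Counts how often the (notionally expensive) lookup runs.
  static int lookups = 0;

  // Hypothetical stand-in for TimelineUtils.getDroppedPartitions(timeline).
  static Set<String> getDroppedPartitions() {
    lookups++;
    return new HashSet<>(Arrays.asList("dt=2023-11-01"));
  }

  // Lookup inside the predicate: recomputed for every partition path.
  static List<String> filterNaive(List<String> partitions) {
    return partitions.stream()
        .filter(p -> !getDroppedPartitions().contains(p))
        .collect(Collectors.toList());
  }

  // Lookup hoisted into a local: computed once, reused for every path.
  static List<String> filterHoisted(List<String> partitions) {
    Set<String> dropped = getDroppedPartitions();
    return partitions.stream()
        .filter(p -> !dropped.contains(p))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> parts = Arrays.asList("dt=2023-11-01", "dt=2023-11-02", "dt=2023-11-03");
    lookups = 0;
    filterNaive(parts);
    System.out.println("naive lookups: " + lookups);   // one lookup per partition
    lookups = 0;
    filterHoisted(parts);
    System.out.println("hoisted lookups: " + lookups); // a single lookup
  }
}
```

Both variants return the same filtered paths; the difference is only how many times the timeline is scanned, which is what the follow-up fix addresses.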
Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]
hudi-bot commented on PR #10024:
URL: https://github.com/apache/hudi/pull/10024#issuecomment-1803194775

## CI report:

* 17566af6a57a43de3b2a5d5ab7bcd8bd28336307 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20752)
Re: [PR] [MINOR] Change some default configs for 1.0.0-beta [hudi]
hudi-bot commented on PR #9998:
URL: https://github.com/apache/hudi/pull/9998#issuecomment-1803194686

## CI report:

* 7e450aee63b81c2d28d04d927eadf5ca006e8a19 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20748)
* bb197bc8649a9b8dfcd41d9ed3e4af8e0afbeb9b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20762)
Re: [PR] [MINOR] Change some default configs for 1.0.0-beta [hudi]
hudi-bot commented on PR #9998:
URL: https://github.com/apache/hudi/pull/9998#issuecomment-1803187055

## CI report:

* 7e450aee63b81c2d28d04d927eadf5ca006e8a19 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20748)
* bb197bc8649a9b8dfcd41d9ed3e4af8e0afbeb9b UNKNOWN
Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]
hudi-bot commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803180699

## CI report:

* 91a7b8640540e841522e33c001fc8c0f4abddb4a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20761)
* Unknown: [CANCELED](TBD)
* 4c36398beb08a1eb91959ce151ebf8fd159bb8d6 UNKNOWN
Re: [PR] [HUDI-7047] Fix Stack Overflow Caused by partition pruning [hudi]
hudi-bot commented on PR #10026:
URL: https://github.com/apache/hudi/pull/10026#issuecomment-1803180657

## CI report:

* ae33aa0b78cb61bcf6c834bfd404d0c7296ac05b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20759)
Re: [PR] [HUDI-7044] Skip reading records for delete blocks for positional merging [hudi]
hudi-bot commented on PR #10005:
URL: https://github.com/apache/hudi/pull/10005#issuecomment-1803180577

## CI report:

* baaf10ac8ac319fb3e776b33b4386755bf034cb6 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20749)
Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]
hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803180312

## CI report:

* 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
* e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
* db06d7f6394d26bec65a629ed2b567754d28a46a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20750)
* d3e58fee4a8cfb636419d9eeb50fa2480b56e2c1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20758)
Re: [PR] [HUDI-6993] Support Flink 1.18 [hudi]
PrabhuJoseph commented on PR #9949: URL: https://github.com/apache/hudi/pull/9949#issuecomment-1803166707 Thanks @danny0405 for the review and commit.
Re: [I] [SUPPORT] DBT Merge creates duplicates [hudi]
faizhasan commented on issue #7244:
URL: https://github.com/apache/hudi/issues/7244#issuecomment-1803162436

Hi @amrishlal, apologies for the delay. I was able to test this with dbt 1.6.2 and the dbt-spark adapter executing models on thriftserver, and saw the following behavior:

- AWS EMR 6.10.1 (Hudi 0.12.2-amzn-0): no duplicates created
- AWS EMR 6.11.0 (Hudi 0.13.0-amzn-0) and above: duplicates were created
Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]
hudi-bot commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803152853

## CI report:

* 91a7b8640540e841522e33c001fc8c0f4abddb4a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20761)
* Unknown: [CANCELED](TBD)
Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]
hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803152527

## CI report:

* 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
* e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
* db06d7f6394d26bec65a629ed2b567754d28a46a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20750)
* d3e58fee4a8cfb636419d9eeb50fa2480b56e2c1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20758)
Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]
nsivabalan commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803150479 @hudi-bot run azure
Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]
nsivabalan commented on PR #10028: URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803150379 @hudi-bot run azure
Re: [PR] [HUDI-7058] check if option is empty before get in HoodieBaseFileGroupRecordBuffer [hudi]
hudi-bot commented on PR #10027:
URL: https://github.com/apache/hudi/pull/10027#issuecomment-1803148099

## CI report:

* 384dbfa4bc44704f0151f9fa1330cbcff172d677 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20760)
Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]
hudi-bot commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803148139

## CI report:

* 91a7b8640540e841522e33c001fc8c0f4abddb4a UNKNOWN
Re: [PR] [HUDI-7047] Fix Stack Overflow Caused by partition pruning [hudi]
hudi-bot commented on PR #10026:
URL: https://github.com/apache/hudi/pull/10026#issuecomment-1803148073

## CI report:

* ae33aa0b78cb61bcf6c834bfd404d0c7296ac05b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20759)
Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803147734 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN * db06d7f6394d26bec65a629ed2b567754d28a46a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20750) * d3e58fee4a8cfb636419d9eeb50fa2480b56e2c1 UNKNOWN
Re: [PR] [HUDI-7058] check if option is empty before get in HoodieBaseFileGroupRecordBuffer [hudi]
hudi-bot commented on PR #10027: URL: https://github.com/apache/hudi/pull/10027#issuecomment-1803143323 ## CI report: * 384dbfa4bc44704f0151f9fa1330cbcff172d677 UNKNOWN
Re: [PR] [HUDI-7047] Fix Stack Overflow Caused by partition pruning [hudi]
hudi-bot commented on PR #10026: URL: https://github.com/apache/hudi/pull/10026#issuecomment-1803143273 ## CI report: * ae33aa0b78cb61bcf6c834bfd404d0c7296ac05b UNKNOWN
Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]
nsivabalan commented on PR #10028: URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803142917 @hudi-bot run azure
[PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality [hudi]
nsivabalan opened a new pull request, #10028: URL: https://github.com/apache/hudi/pull/10028 ### Change Logs Simplify Out Of Box Schema Evolution Functionality ### Impact - Simplify Out Of Box Schema Evolution Functionality ### Risk level (write none, low medium or high below) _If medium or high, explain what verification was done to mitigate the risks._ ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change_ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-7058) HoodieBaseFileGroupRecordBuffer doesn't check if option is empty
[ https://issues.apache.org/jira/browse/HUDI-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7058: - Labels: pull-request-available (was: ) > HoodieBaseFileGroupRecordBuffer doesn't check if option is empty > > > Key: HUDI-7058 > URL: https://issues.apache.org/jira/browse/HUDI-7058 > Project: Apache Hudi > Issue Type: Bug >Reporter: Jonathan Vexler >Assignee: Jonathan Vexler >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > > If the option is empty an exception will be thrown when get is called. This > happens when the reader is enabled for the test > testBaseFileAndLogFileUpdateMatchesDeleteBlock > > {code:java} > Caused by: java.util.NoSuchElementException: No value present in Option > at org.apache.hudi.common.util.Option.get(Option.java:89) > at > org.apache.hudi.common.table.read.HoodieBaseFileGroupRecordBuffer.doProcessNextDataRecord(HoodieBaseFileGroupRecordBuffer.java:143) > at > org.apache.hudi.common.table.read.HoodieKeyBasedFileGroupRecordBuffer.processNextDataRecord(HoodieKeyBasedFileGroupRecordBuffer.java:90) > at > org.apache.hudi.common.table.read.HoodieKeyBasedFileGroupRecordBuffer.processDataBlock(HoodieKeyBasedFileGroupRecordBuffer.java:81) > at > org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.processQueuedBlocksForInstant(BaseHoodieLogRecordReader.java:751) > at > org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.scanInternalV1(BaseHoodieLogRecordReader.java:393) > ... 28 more {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7058] check if option is empty before get in HoodieBaseFileGroupRecordBuffer [hudi]
jonvex opened a new pull request, #10027: URL: https://github.com/apache/hudi/pull/10027 ### Change Logs HoodieBaseFileGroupRecordBuffer does a record merge and then gets the option without checking it. testBaseFileAndLogFileUpdateMatchesDeleteBlock exposes that the option can be empty. The fix is to check for emptiness and return an empty option in that case. ### Impact Removes a failure point. ### Risk level (write none, low medium or high below) none ### Documentation Update N/A ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
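The shape of the fix can be sketched as follows. This is a simplified illustration, not the actual Hudi code: `java.util.Optional` stands in for Hudi's own `Option` type, and `merge` is a hypothetical helper standing in for the record merger.

```java
import java.util.Optional;

public class MergeOptionCheck {
    // Hypothetical merger: returns empty when the newer record is a delete,
    // which is the case the failing test exposes.
    static Optional<String> merge(String older, String newer) {
        return newer == null ? Optional.empty() : Optional.of(newer);
    }

    // Before the fix the caller effectively did merge(...).get() unconditionally,
    // hitting NoSuchElementException on an empty result; the fix checks
    // isPresent() first and propagates the empty option instead.
    static Optional<String> processNextDataRecord(String older, String newer) {
        Optional<String> merged = merge(older, newer);
        if (!merged.isPresent()) {
            return Optional.empty(); // nothing to emit for this record
        }
        return merged;
    }
}
```

The point is simply that an empty merge result is a legitimate outcome (e.g. a delete block wins) and must be handled, not unwrapped blindly.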
[jira] [Created] (HUDI-7058) HoodieBaseFileGroupRecordBuffer doesn't check if option is empty
Jonathan Vexler created HUDI-7058: - Summary: HoodieBaseFileGroupRecordBuffer doesn't check if option is empty Key: HUDI-7058 URL: https://issues.apache.org/jira/browse/HUDI-7058 Project: Apache Hudi Issue Type: Bug Reporter: Jonathan Vexler Assignee: Jonathan Vexler Fix For: 1.0.0 If the option is empty an exception will be thrown when get is called. This happens when the reader is enabled for the test testBaseFileAndLogFileUpdateMatchesDeleteBlock {code:java} Caused by: java.util.NoSuchElementException: No value present in Option at org.apache.hudi.common.util.Option.get(Option.java:89) at org.apache.hudi.common.table.read.HoodieBaseFileGroupRecordBuffer.doProcessNextDataRecord(HoodieBaseFileGroupRecordBuffer.java:143) at org.apache.hudi.common.table.read.HoodieKeyBasedFileGroupRecordBuffer.processNextDataRecord(HoodieKeyBasedFileGroupRecordBuffer.java:90) at org.apache.hudi.common.table.read.HoodieKeyBasedFileGroupRecordBuffer.processDataBlock(HoodieKeyBasedFileGroupRecordBuffer.java:81) at org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.processQueuedBlocksForInstant(BaseHoodieLogRecordReader.java:751) at org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.scanInternalV1(BaseHoodieLogRecordReader.java:393) ... 28 more {code}
[PR] [HUDI-7047] Fix Stack Overflow Caused by partition pruning [hudi]
jonvex opened a new pull request, #10026: URL: https://github.com/apache/hudi/pull/10026 ### Change Logs hasPushedDownPartitionPredicates is not set to true in all cases when listFiles is called, so the planner repeatedly tries to resolve it ### Impact no more stack overflow ### Risk level (write none, low medium or high below) low ### Documentation Update N/A ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
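A minimal sketch of the failure mode and the guard described above: if `listFiles` never flips the flag on some code path, the planner keeps re-resolving the relation until the stack overflows. The class and method names here are illustrative stand-ins, not the actual Spark/Hudi types.

```java
public class PartitionPruningGuard {
    private boolean hasPushedDownPartitionPredicates = false;
    private int resolutionAttempts = 0;

    void listFiles(boolean predicatesGiven) {
        if (hasPushedDownPartitionPredicates) {
            return; // pruning already resolved; nothing left to do
        }
        resolutionAttempts++;
        // ... perform partition pruning, with or without pushed-down predicates ...
        // The fix: mark resolution as done in ALL cases, not only on the
        // branch where predicates were actually pushed down.
        hasPushedDownPartitionPredicates = true;
    }

    int attempts() {
        return resolutionAttempts;
    }

    public static void main(String[] args) {
        PartitionPruningGuard guard = new PartitionPruningGuard();
        for (int i = 0; i < 1000; i++) {
            guard.listFiles(false); // planner retries repeatedly
        }
        if (guard.attempts() != 1) {
            throw new AssertionError("pruning should only run once");
        }
    }
}
```

Without the unconditional flag assignment, each planner retry would re-enter the pruning path, which is the recursion that produced the reported stack overflow.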
Re: [I] UPSERTs are taking time [hudi]
nsivabalan commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1803119793 And yeah, upgrading to 0.14.0, you can leverage RLI (record level index), and that should definitely improve your index and write latencies.
Re: [I] UPSERTs are taking time [hudi]
nsivabalan commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1803119446 Yeah, as I suggested before, you may want to try the MOR table type and try using the SIMPLE index. In 0.10.1 Hudi uses the Bloom index, and for random keys it might incur some unnecessary overhead.
Re: [PR] [HUDI-7057] Support CopyToTableProcedure with partial column copy [hudi]
hudi-bot commented on PR #10025: URL: https://github.com/apache/hudi/pull/10025#issuecomment-1803111282 ## CI report: * 80a394967d09baac64231af055c40996dbb2a7fd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20756)
Re: [PR] [HUDI-7053] Fix the filter pushdown logic for file group reader [hudi]
hudi-bot commented on PR #10018: URL: https://github.com/apache/hudi/pull/10018#issuecomment-1803111222 ## CI report: * c5812540396b56db64df779ab7147cc0cede626a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20735) * 0f4e8c208b5614736e450657ad56810bc4060ea4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20751) * e420cdf46604699ae6587b9aa13e6e9c0d139d38 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20755)
(hudi) branch master updated: [HUDI-7056] Add config for merging using positions (#10022)
This is an automated email from the ASF dual-hosted git repository. yihua pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/master by this push: new 2e2e39377e2 [HUDI-7056] Add config for merging using positions (#10022) 2e2e39377e2 is described below commit 2e2e39377e2577bf357626d3347963bd0efd01ba Author: Jon Vexler AuthorDate: Wed Nov 8 22:31:09 2023 -0500 [HUDI-7056] Add config for merging using positions (#10022) Co-authored-by: Jonathan Vexler <=> --- .../java/org/apache/hudi/common/config/HoodieReaderConfig.java | 7 +++ .../main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala | 2 +- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java index 1738f75e9ec..c572cc21adc 100644 --- a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java +++ b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java @@ -58,4 +58,11 @@ public class HoodieReaderConfig extends HoodieConfig { .markAdvanced() .sinceVersion("1.0.0") .withDocumentation("Use engine agnostic file group reader if enabled"); + + public static final ConfigProperty MERGE_USE_RECORD_POSITIONS = ConfigProperty + .key("hoodie.merge.use.record.positions") + .defaultValue(false) + .markAdvanced() + .sinceVersion("1.0.0") + .withDocumentation("Whether to use positions in the block header for data blocks containing updates and delete blocks for merging."); } diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala index 50249d87d97..a49fee2b740 100644 --- 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala +++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala @@ -171,7 +171,7 @@ abstract class HoodieBaseHadoopFsRelationFactory(val sqlContext: SQLContext, protected lazy val fileGroupReaderEnabled: Boolean = checkIfAConfigurationEnabled(HoodieReaderConfig.FILE_GROUP_READER_ENABLED) - protected lazy val shouldUseRecordPosition: Boolean = checkIfAConfigurationEnabled(HoodieWriteConfig.WRITE_RECORD_POSITIONS) + protected lazy val shouldUseRecordPosition: Boolean = checkIfAConfigurationEnabled(HoodieReaderConfig.MERGE_USE_RECORD_POSITIONS) protected def queryTimestamp: Option[String] = specifiedQueryTimestamp.orElse(toScalaOption(timeline.lastInstant()).map(_.getTimestamp))
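Based on the key added in this commit, enabling position-based merging would look like the following in a Hudi configuration (a usage sketch only; per the commit, this is an advanced option that defaults to false):

```
# Use positions in the block header for data blocks containing updates
# and for delete blocks when merging (hoodie.merge.use.record.positions)
hoodie.merge.use.record.positions=true
```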
Re: [PR] [HUDI-7056] Add config for merging using positions [hudi]
yihua merged PR #10022: URL: https://github.com/apache/hudi/pull/10022
Re: [I] [SUPPORT] Some .commit files generated by compaction with 0 size [hudi]
zdl1 commented on issue #10017: URL: https://github.com/apache/hudi/issues/10017#issuecomment-1803107881 > Are the inflight and requested compaction metadata files empty also? The inflight file is empty but the requested file is not.
Re: [PR] [HUDI-7057] Support CopyToTableProcedure with partial column copy [hudi]
hudi-bot commented on PR #10025: URL: https://github.com/apache/hudi/pull/10025#issuecomment-1803106701 ## CI report: * 80a394967d09baac64231af055c40996dbb2a7fd UNKNOWN
Re: [PR] [HUDI-7053] Fix the filter pushdown logic for file group reader [hudi]
hudi-bot commented on PR #10018: URL: https://github.com/apache/hudi/pull/10018#issuecomment-1803106634 ## CI report: * c5812540396b56db64df779ab7147cc0cede626a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20735) * 0f4e8c208b5614736e450657ad56810bc4060ea4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20751) * e420cdf46604699ae6587b9aa13e6e9c0d139d38 UNKNOWN
Re: [I] [SUPPORT] Some .commit files generated by compaction with 0 size [hudi]
danny0405 commented on issue #10017: URL: https://github.com/apache/hudi/issues/10017#issuecomment-1803103939 Are the inflight and requested compaction metadata files empty also?
Re: [PR] [HUDI-7050] Flink HoodieHiveCatalog supports hadoop parameters [hudi]
danny0405 commented on code in PR #10013: URL: https://github.com/apache/hudi/pull/10013#discussion_r1387440082 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalogUtil.java: ## @@ -61,9 +61,10 @@ public class HoodieCatalogUtil { * @param hiveConfDir Hive conf directory path. * @return A HiveConf instance. */ - public static HiveConf createHiveConf(@Nullable String hiveConfDir) { + public static HiveConf createHiveConf(@Nullable String hiveConfDir, @Nullable org.apache.flink.configuration.Configuration flinkConf) { Review Comment: The method has only 1 caller; just making the param an `org.apache.flink.configuration.Configuration` should be fine.
Re: [I] [SUPPORT] Some .commit files generated by compaction with 0 size [hudi]
zdl1 commented on issue #10017: URL: https://github.com/apache/hudi/issues/10017#issuecomment-1803102881 > what release did you use? Thanks for your information! But for now I am using Flink 1.14 and Hudi 0.13.1
Re: [PR] [HUDI-7056] Add config for merging using positions [hudi]
hudi-bot commented on PR #10022: URL: https://github.com/apache/hudi/pull/10022#issuecomment-1803101905 ## CI report: * 8f150c02d9ff127c3a0dd0ec06d46c696de88a70 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20746)
Re: [PR] [HUDI-7050] Flink HoodieHiveCatalog supports hadoop parameters [hudi]
danny0405 commented on code in PR #10013: URL: https://github.com/apache/hudi/pull/10013#discussion_r1387438530 ## hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalogFactory.java: ## @@ -40,6 +40,7 @@ public class HoodieCatalogFactory implements CatalogFactory { private static final Logger LOG = LoggerFactory.getLogger(HoodieCatalogFactory.class); public static final String IDENTIFIER = "hudi"; + public static final String HADOOP_PREFIX = "hadoop."; Review Comment: We already have a constant variable in `HadoopConfigurations`.
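The convention under discussion (forwarding catalog options prefixed with `hadoop.` into the Hadoop configuration with the prefix stripped) can be sketched as below. This is an illustrative stand-in: plain maps are used in place of Flink's and Hadoop's `Configuration` classes, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class HadoopPrefixOptions {
    static final String HADOOP_PREFIX = "hadoop.";

    // Copy every "hadoop."-prefixed option from the catalog/Flink options
    // into a Hadoop-style configuration map, stripping the prefix so the
    // remaining key matches the native Hadoop key name.
    static Map<String, String> extractHadoopOptions(Map<String, String> flinkConf) {
        Map<String, String> hadoopConf = new HashMap<>();
        for (Map.Entry<String, String> e : flinkConf.entrySet()) {
            if (e.getKey().startsWith(HADOOP_PREFIX)) {
                hadoopConf.put(e.getKey().substring(HADOOP_PREFIX.length()), e.getValue());
            }
        }
        return hadoopConf;
    }
}
```

For example, a catalog option `hadoop.fs.defaultFS` would land in the Hadoop configuration as `fs.defaultFS`, while non-prefixed options such as `mode` are left out.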
Re: [I] [SUPPORT] Some .commit files generated by compaction with 0 size [hudi]
danny0405 commented on issue #10017: URL: https://github.com/apache/hudi/issues/10017#issuecomment-1803093991 What release did you use? Since 0.14.0, for MOR + INSERT, we write parquet files directly; you should use clustering instead of compaction.
Re: [I] [SUPPORT] Possible memory leak issue for org.apache.hadoop.hive.conf.HiveConf while using Flink into Hudi [hudi]
danny0405 commented on issue #10023: URL: https://github.com/apache/hudi/issues/10023#issuecomment-1803090137 There is a known Hive conf resource leak fix: https://github.com/apache/hudi/pull/8050/files, but 0.13.0 should already include this fix.
Re: [I] [SUPPORT] Solution for synchronizing the entire database table in flink [hudi]
danny0405 commented on issue #9965: URL: https://github.com/apache/hudi/issues/9965#issuecomment-1803084529 yeah, already on the list.
Re: [I] [BUG] Spark will read invalid timestamp(3) data when record in log is older than the same in parquet. [hudi]
danny0405 commented on issue #10012: URL: https://github.com/apache/hudi/issues/10012#issuecomment-1803082627 Looks like the avro schema is using `timestamp-millis` as the logical data type, so in `EventTimeAvroPayload`, did you debug a little bit where the `timestamp-micros` come from?
[jira] [Updated] (HUDI-7057) Support CopyToTableProcedure with partial column copy
[ https://issues.apache.org/jira/browse/HUDI-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xy updated HUDI-7057: - Description: currently CopyToTableProcedure only supports copying the full schema to a new table, but in many scenarios users need only a subset of columns from the table schema, for example users don't need the metadata columns due to storage cost (was: currently CopyToTableProcedure only support all schema copy to a new table,but many sences user need a part columns from table schema ) > Support CopyToTableProcedure with partial column copy > --- > > Key: HUDI-7057 > URL: https://issues.apache.org/jira/browse/HUDI-7057 > Project: Apache Hudi > Issue Type: Improvement >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available, sparksql > > currently CopyToTableProcedure only supports copying the full schema to a new > table, but in many scenarios users need only a subset of columns from the table > schema, for example users don't need the metadata columns due to storage cost
[jira] [Closed] (HUDI-7017) Prevent full schema evolution from wrongly falling back to OOB
[ https://issues.apache.org/jira/browse/HUDI-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-7017. Resolution: Fixed Fixed via master branch: d149d787e9d3dbf425c0dd5ca0265bed5fe2795f > Prevent full schema evolution from wrongly falling back to OOB > -- > > Key: HUDI-7017 > URL: https://issues.apache.org/jira/browse/HUDI-7017 > Project: Apache Hudi > Issue Type: Bug >Reporter: voon >Assignee: voon >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Attachments: image-2023-11-01-11-41-25-604.png, > image-2023-11-01-11-43-14-149.png > > > For MOR tables that have these 2 configurations enabled: > > {code:java} > hoodie.schema.on.read.enable=true > hoodie.datasource.read.extract.partition.values.from.path=true{code} > > > BaseFileReader will use a *requiredSchemaReader* when reading some of the > parquet files. This BaseFileReader will have an empty *internalSchemaStr* > causing *Spark3XLegacyHoodieParquetInputFormat* to fall back to OOB schema > evolution. > > Although there are required safeguards that are added in HUDI-5400 to force > the code execution path to use Hudi Full Schema Evolution, we should still > fix this so that future changes that may deprecate the use of > *Spark3XLegacyHoodieParquetInputFormat* will not cause issues. 
> > A sample test to invoke this: > {code:java} > test("Test wrong fallback to OOB schema evolution") { > withRecordType()(withTempDir { tmp => > Seq("mor").foreach { tableType => > val tableName = generateTableName > val tablePath = s"${new Path(tmp.getCanonicalPath, > tableName).toUri.toString}" > if (HoodieSparkUtils.gteqSpark3_1) { > spark.sql("set " + SPARK_SQL_INSERT_INTO_OPERATION.key + "=upsert") > spark.sql("set hoodie.schema.on.read.enable=true") > > spark.sql("hoodie.datasource.read.extract.partition.values.from.path=true") > // NOTE: This is required since these tests use type coercions > which were only permitted in Spark 2.x > // and are disallowed now by default in Spark 3.x > spark.sql("set spark.sql.storeAssignmentPolicy=legacy") > createAndPreparePartitionTable(spark, tableName, tablePath, tableType) > // date -> string > spark.sql(s"alter table $tableName alter column col6 type String") > checkAnswer(spark.sql(s"select col6 from $tableName where id = > 1").collect())( > Seq("2021-12-25") > ) > } > } > }) > } {code} > > Debugger snapshots: > !image-2023-11-01-11-41-25-604.png|width=1197,height=596! > As can be seen, *requiredSchema* (used as pruning input) has an internalSchema > string, but *requiredDataSchema* has a null internalSchema string. > > !image-2023-11-01-11-43-14-149.png|width=1257,height=672! > As a result, the internalSchemaStr that is passed into > Spark3XLegacyHoodieParquetFileFormat is null (which should not be the case)
(hudi) branch master updated: [HUDI-7055] Support reading only log files in file group reader-based Spark parquet file format (#10020)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 530640f61a5  [HUDI-7055] Support reading only log files in file group reader-based Spark parquet file format (#10020)
530640f61a5 is described below

commit 530640f61a5e3f8103d0b66ff866a2de995156b7
Author: Y Ethan Guo
AuthorDate: Wed Nov 8 18:45:45 2023 -0800

    [HUDI-7055] Support reading only log files in file group reader-based Spark parquet file format (#10020)
---
 .../SparkFileFormatInternalRowReaderContext.scala  | 16 +++--
 .../table/read/TestHoodieFileGroupReaderBase.java  | 71 --
 ...odieFileGroupReaderBasedParquetFileFormat.scala | 39 +++-
 .../read/TestHoodieFileGroupReaderOnSpark.scala    |  6 +-
 4 files changed, 92 insertions(+), 40 deletions(-)

diff --git a/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala b/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala
index af3d3fd239c..beca8852686 100644
--- a/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala
+++ b/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala
@@ -44,10 +44,12 @@ import scala.collection.mutable
  *
  * This uses Spark parquet reader to read parquet data files or parquet log blocks.
  *
- * @param baseFileReader A reader that transforms a {@link PartitionedFile} to an iterator of {@link InternalRow}.
+ * @param baseFileReader A reader that transforms a {@link PartitionedFile} to an iterator of
+ *                       {@link InternalRow}. This is required for reading the base file and
+ *                       not required for reading a file group with only log files.
  * @param partitionValues The values for a partition in which the file group lives.
  */
-class SparkFileFormatInternalRowReaderContext(baseFileReader: PartitionedFile => Iterator[InternalRow],
+class SparkFileFormatInternalRowReaderContext(baseFileReader: Option[PartitionedFile => Iterator[InternalRow]],
                                               partitionValues: InternalRow) extends BaseSparkInternalRowReaderContext {
   lazy val sparkAdapter = SparkAdapterSupport.sparkAdapter
   lazy val sparkFileReaderFactory = new HoodieSparkFileReaderFactory
@@ -62,11 +64,11 @@ class SparkFileFormatInternalRowReaderContext(baseFileReader: PartitionedFile =>
     val fileInfo = sparkAdapter.getSparkPartitionedFileUtils
       .createPartitionedFile(partitionValues, filePath, start, length)
     if (FSUtils.isLogFile(filePath)) {
-      val structType: StructType = HoodieInternalRowUtils.getCachedSchema(dataSchema)
+      val structType: StructType = HoodieInternalRowUtils.getCachedSchema(requiredSchema)
       val projection: UnsafeProjection = HoodieInternalRowUtils.getCachedUnsafeProjection(structType, structType)
       new CloseableMappingIterator[InternalRow, UnsafeRow](
         sparkFileReaderFactory.newParquetFileReader(conf, filePath).asInstanceOf[HoodieSparkParquetReader]
-          .getInternalRowIterator(dataSchema, dataSchema),
+          .getInternalRowIterator(dataSchema, requiredSchema),
         new java.util.function.Function[InternalRow, UnsafeRow] {
           override def apply(data: InternalRow): UnsafeRow = {
             // NOTE: We have to do [[UnsafeProjection]] of incoming [[InternalRow]] to convert
@@ -75,7 +77,11 @@ class SparkFileFormatInternalRowReaderContext(baseFileReader: PartitionedFile =>
           }
         }).asInstanceOf[ClosableIterator[InternalRow]]
     } else {
-      new CloseableInternalRowIterator(baseFileReader.apply(fileInfo))
+      if (baseFileReader.isEmpty) {
+        throw new IllegalArgumentException("Base file reader is missing when instantiating " +
+          "SparkFileFormatInternalRowReaderContext.");
+      }
+      new CloseableInternalRowIterator(baseFileReader.get.apply(fileInfo))
     }
   }

diff --git a/hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java b/hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java
index febc0d32466..439948a6cc9 100644
--- a/hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java
+++ b/hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java
@@ -40,8 +40,9 @@ import org.apache.hudi.metadata.HoodieTableMetadata;
 import org.apache.avro.Schema;
 import org.apache.hadoop.conf.Configuration;
-import org.junit.jupiter.api.Test;
 import org.junit.jupiter.api.io.TempDir;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.ValueSource;
 import
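The patch above makes the base-file reader optional so that a file group consisting only of log files can still be read, and fails fast if a base file is requested without a reader. The shape of that change can be sketched generically — the class and method names below are invented for illustration and are not Hudi APIs:

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of an Option-wrapped base-file reader, mirroring the
// idea in the patch above. None of these names are real Hudi APIs.
class FileGroupReaderSketch {
    // Absent when the file group has no base file (log files only).
    private final Optional<Function<String, String>> baseFileReader;

    FileGroupReaderSketch(Optional<Function<String, String>> baseFileReader) {
        this.baseFileReader = baseFileReader;
    }

    String read(String filePath, boolean isLogFile) {
        if (isLogFile) {
            // Log blocks go through a dedicated reader; the base-file reader
            // is never consulted on this path.
            return "log-block:" + filePath;
        }
        // Fail fast with a clear message if a base file must be read but no
        // reader was supplied, as the patch does.
        return baseFileReader
            .orElseThrow(() -> new IllegalArgumentException(
                "Base file reader is missing when instantiating the reader context."))
            .apply(filePath);
    }
}
```

The point of the `Option`/`Optional` wrapper is that callers constructing a log-only reader context simply pass `Optional.empty()` instead of a dummy reader, and the error surfaces only if the base-file path is actually exercised.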
Re: [PR] [HUDI-7055] Support reading only log files in file group reader-based Spark parquet file format [hudi]
yihua merged PR #10020: URL: https://github.com/apache/hudi/pull/10020 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Closed] (HUDI-6993) Support Flink 1.18
[ https://issues.apache.org/jira/browse/HUDI-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen closed HUDI-6993. Resolution: Fixed Fixed via master branch: bd136addc24f4d03608f5069b98c2748e463169f > Support Flink 1.18 > -- > > Key: HUDI-6993 > URL: https://issues.apache.org/jira/browse/HUDI-6993 > Project: Apache Hudi > Issue Type: New Feature > Components: flink >Reporter: Prabhu Joseph >Priority: Major > Labels: flink, pull-request-available, upgrade > Fix For: 1.0.0 > > > This JIRA intends to support Flink-1.18 in Hudi. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7017) Prevent full schema evolution from wrongly falling back to OOB
[ https://issues.apache.org/jira/browse/HUDI-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-7017: - Fix Version/s: 1.0.0 > Prevent full schema evolution from wrongly falling back to OOB > -- > > Key: HUDI-7017 > URL: https://issues.apache.org/jira/browse/HUDI-7017 > Project: Apache Hudi > Issue Type: Bug >Reporter: voon >Assignee: voon >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Attachments: image-2023-11-01-11-41-25-604.png, > image-2023-11-01-11-43-14-149.png > > > For MOR tables that have these 2 configurations enabled:
> {code:java}
> hoodie.schema.on.read.enable=true
> hoodie.datasource.read.extract.partition.values.from.path=true{code}
>
> BaseFileReader will use a *requiredSchemaReader* when reading some of the parquet files. This BaseFileReader will have an empty *internalSchemaStr*, causing *Spark3XLegacyHoodieParquetInputFormat* to fall back to OOB schema evolution.
>
> Although there are required safeguards added in HUDI-5400 that force the code execution path to use Hudi full schema evolution, we should still fix this so that future changes that may deprecate the use of *Spark3XLegacyHoodieParquetInputFormat* will not cause issues.
>
> A sample test to invoke this:
> {code:java}
> test("Test wrong fallback to OOB schema evolution") {
>   withRecordType()(withTempDir { tmp =>
>     Seq("mor").foreach { tableType =>
>       val tableName = generateTableName
>       val tablePath = s"${new Path(tmp.getCanonicalPath, tableName).toUri.toString}"
>       if (HoodieSparkUtils.gteqSpark3_1) {
>         spark.sql("set " + SPARK_SQL_INSERT_INTO_OPERATION.key + "=upsert")
>         spark.sql("set hoodie.schema.on.read.enable=true")
>         spark.sql("set hoodie.datasource.read.extract.partition.values.from.path=true")
>         // NOTE: This is required since this test uses type coercions which were only permitted in Spark 2.x
>         // and are disallowed now by default in Spark 3.x
>         spark.sql("set spark.sql.storeAssignmentPolicy=legacy")
>         createAndPreparePartitionTable(spark, tableName, tablePath, tableType)
>         // date -> string
>         spark.sql(s"alter table $tableName alter column col6 type String")
>         checkAnswer(spark.sql(s"select col6 from $tableName where id = 1").collect())(
>           Seq("2021-12-25")
>         )
>       }
>     }
>   })
> } {code}
>
> Debugger snapshots:
> !image-2023-11-01-11-41-25-604.png|width=1197,height=596!
> As can be seen, *requiredSchema* (used as pruning input) has an internalSchema string, but *requiredDataSchema* has a null internalSchema string.
>
> !image-2023-11-01-11-43-14-149.png|width=1257,height=672!
> As a result, the internalSchemaStr that is passed into Spark3XLegacyHoodieParquetFileFormat is null (which should not be the case)
(hudi) branch master updated: [HUDI-7017] Prevent full schema evolution from wrongly falling back to OOB schema evolution (#9966)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new d149d787e9d  [HUDI-7017] Prevent full schema evolution from wrongly falling back to OOB schema evolution (#9966)
d149d787e9d is described below

commit d149d787e9d3dbf425c0dd5ca0265bed5fe2795f
Author: voonhous
AuthorDate: Thu Nov 9 10:44:01 2023 +0800

    [HUDI-7017] Prevent full schema evolution from wrongly falling back to OOB schema evolution (#9966)
---
 .../scala/org/apache/hudi/HoodieBaseRelation.scala | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala
index 78c5cc4ca47..eaeff8bc7e9 100644
--- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala
+++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala
@@ -715,16 +715,29 @@ abstract class HoodieBaseRelation(val sqlContext: SQLContext,
     if (extractPartitionValuesFromPartitionPath) {
       val partitionSchema = filterInPartitionColumns(tableSchema.structTypeSchema)
       val prunedDataStructSchema = prunePartitionColumns(tableSchema.structTypeSchema)
-      val prunedRequiredSchema = prunePartitionColumns(requiredSchema.structTypeSchema)
+      val prunedDataInternalSchema = pruneInternalSchema(tableSchema, prunedDataStructSchema)
+      val prunedRequiredStructSchema = prunePartitionColumns(requiredSchema.structTypeSchema)
+      val prunedRequiredInternalSchema = pruneInternalSchema(requiredSchema, prunedRequiredStructSchema)
       (partitionSchema,
-        HoodieTableSchema(prunedDataStructSchema, convertToAvroSchema(prunedDataStructSchema, tableName).toString),
-        HoodieTableSchema(prunedRequiredSchema, convertToAvroSchema(prunedRequiredSchema, tableName).toString))
+        HoodieTableSchema(prunedDataStructSchema,
+          convertToAvroSchema(prunedDataStructSchema, tableName).toString, prunedDataInternalSchema),
+        HoodieTableSchema(prunedRequiredStructSchema,
+          convertToAvroSchema(prunedRequiredStructSchema, tableName).toString, prunedRequiredInternalSchema))
     } else {
       (StructType(Nil), tableSchema, requiredSchema)
     }
   }

+  private def pruneInternalSchema(hoodieTableSchema: HoodieTableSchema, prunedStructSchema: StructType): Option[InternalSchema] = {
+    if (hoodieTableSchema.internalSchema.isEmpty || hoodieTableSchema.internalSchema.get.isEmptySchema) {
+      Option.empty[InternalSchema]
+    } else {
+      Some(InternalSchemaUtils.pruneInternalSchema(hoodieTableSchema.internalSchema.get,
+        prunedStructSchema.fields.map(_.name).toList.asJava))
+    }
+  }
+
   private def filterInPartitionColumns(structType: StructType): StructType =
     StructType(structType.filter(f => partitionColumns.exists(col => resolver(f.name, col
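The fix threads a pruned `InternalSchema` through alongside the pruned struct schema instead of silently dropping it. Its guard-then-prune shape can be sketched generically — below, an ordered field→type map stands in for the internal schema purely for illustration; it is not Hudi's `InternalSchema` or `InternalSchemaUtils`:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Optional;

// Generic sketch of the pruneInternalSchema guard in the patch: an absent or
// empty schema prunes to absent; otherwise only the retained field names
// survive, in the order the caller asks for them.
class InternalSchemaPruner {
    static Optional<LinkedHashMap<String, String>> prune(
            Optional<LinkedHashMap<String, String>> schema, List<String> retainedFields) {
        if (!schema.isPresent() || schema.get().isEmpty()) {
            return Optional.empty();
        }
        LinkedHashMap<String, String> pruned = new LinkedHashMap<>();
        for (String field : retainedFields) {
            String type = schema.get().get(field);
            if (type != null) {
                pruned.put(field, type); // keep only fields the query still needs
            }
        }
        return Optional.of(pruned);
    }
}
```

The important property, which the bug above violated, is that the pruned required schema still carries a non-empty internal schema whenever the full table schema had one, so downstream readers can tell full schema evolution apart from the out-of-box fallback.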
Re: [PR] [HUDI-7017] Prevent full schema evolution from wrongly falling back t… [hudi]
danny0405 merged PR #9966: URL: https://github.com/apache/hudi/pull/9966
[jira] [Updated] (HUDI-6993) Support Flink 1.18
[ https://issues.apache.org/jira/browse/HUDI-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danny Chen updated HUDI-6993: - Fix Version/s: 1.0.0 > Support Flink 1.18 > -- > > Key: HUDI-6993 > URL: https://issues.apache.org/jira/browse/HUDI-6993 > Project: Apache Hudi > Issue Type: New Feature > Components: flink >Reporter: Prabhu Joseph >Priority: Major > Labels: flink, pull-request-available, upgrade > Fix For: 1.0.0 > > > This JIRA intends to support Flink-1.18 in Hudi.
Re: [PR] [HUDI-7017] Prevent full schema evolution from wrongly falling back t… [hudi]
danny0405 commented on PR #9966: URL: https://github.com/apache/hudi/pull/9966#issuecomment-1803076235 Tests passed: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=20721=results
(hudi) branch master updated (7c17964fe73 -> bd136addc24)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 7c17964fe73 [HUDI-6495][RFC-66] Non-blocking Concurrency Control (#7907)
     add bd136addc24 [HUDI-6993] Support Flink 1.18 (#9949)

No new revisions were added by this update.

Summary of changes:
 .github/workflows/bot.yml                          | 12 +++--
 README.md                                          |  7 +--
 azure-pipelines-20230430.yml                       |  7 ++-
 hudi-flink-datasource/hudi-flink/pom.xml           |  1 +
 .../hudi/table/catalog/HoodieHiveCatalog.java      | 36 +++
 .../apache/hudi/adapter/HiveCatalogConstants.java  | 51 +
 .../apache/hudi/adapter/HiveCatalogConstants.java  | 52 ++
 .../apache/hudi/adapter/HiveCatalogConstants.java  | 52 ++
 .../apache/hudi/adapter/HiveCatalogConstants.java  | 52 ++
 .../apache/hudi/adapter/HiveCatalogConstants.java  | 52 ++
 .../{hudi-flink1.17.x => hudi-flink1.18.x}/pom.xml | 26 ++-
 .../adapter/AbstractStreamOperatorAdapter.java     |  0
 .../AbstractStreamOperatorFactoryAdapter.java      |  0
 .../adapter/DataStreamScanProviderAdapter.java     |  0
 .../adapter/DataStreamSinkProviderAdapter.java     |  0
 .../apache/hudi/adapter/HiveCatalogConstants.java  | 49
 .../hudi/adapter/MailboxExecutorAdapter.java       |  0
 .../apache/hudi/adapter/MaskingOutputAdapter.java  |  0
 .../hudi/adapter/OperatorCoordinatorAdapter.java   |  0
 .../apache/hudi/adapter/RateLimiterAdapter.java    |  0
 .../hudi/adapter/SortCodeGeneratorAdapter.java     |  0
 .../adapter/SupportsRowLevelDeleteAdapter.java     |  0
 .../adapter/SupportsRowLevelUpdateAdapter.java     |  0
 .../main/java/org/apache/hudi/adapter/Utils.java   |  0
 .../table/format/cow/ParquetSplitReaderUtil.java   |  0
 .../table/format/cow/vector/HeapArrayVector.java   |  0
 .../format/cow/vector/HeapMapColumnVector.java     |  0
 .../format/cow/vector/HeapRowColumnVector.java     |  0
 .../format/cow/vector/ParquetDecimalVector.java    |  0
 .../cow/vector/reader/AbstractColumnReader.java    |  0
 .../cow/vector/reader/ArrayColumnReader.java       |  0
 .../vector/reader/BaseVectorizedColumnReader.java  |  0
 .../cow/vector/reader/EmptyColumnReader.java       |  0
 .../vector/reader/FixedLenBytesColumnReader.java   |  0
 .../vector/reader/Int64TimestampColumnReader.java  |  0
 .../format/cow/vector/reader/MapColumnReader.java  |  0
 .../reader/ParquetColumnarRowSplitReader.java      |  0
 .../cow/vector/reader/ParquetDataColumnReader.java |  0
 .../reader/ParquetDataColumnReaderFactory.java     |  0
 .../format/cow/vector/reader/RowColumnReader.java  |  0
 .../format/cow/vector/reader/RunLengthDecoder.java |  0
 .../org/apache/hudi/adapter/OutputAdapter.java     |  0
 .../adapter/StateInitializationContextAdapter.java |  0
 .../adapter/StreamingRuntimeContextAdapter.java    |  0
 .../org/apache/hudi/adapter/TestStreamConfigs.java |  0
 .../org/apache/hudi/adapter/TestTableEnvs.java     |  0
 hudi-flink-datasource/pom.xml                      |  1 +
 ...ark332.sh => build_flink1180hive313spark332.sh} |  6 +--
 ...ark340.sh => build_flink1180hive313spark340.sh} |  6 +--
 packaging/bundle-validation/ci_run.sh              |  2 +
 pom.xml                                            | 37 ---
 scripts/release/deploy_staging_jars.sh             |  1 +
 scripts/release/validate_staged_bundles.sh         |  2 +-
 53 files changed, 403 insertions(+), 49 deletions(-)
 create mode 100644 hudi-flink-datasource/hudi-flink1.13.x/src/main/java/org/apache/hudi/adapter/HiveCatalogConstants.java
 create mode 100644 hudi-flink-datasource/hudi-flink1.14.x/src/main/java/org/apache/hudi/adapter/HiveCatalogConstants.java
 create mode 100644 hudi-flink-datasource/hudi-flink1.15.x/src/main/java/org/apache/hudi/adapter/HiveCatalogConstants.java
 create mode 100644 hudi-flink-datasource/hudi-flink1.16.x/src/main/java/org/apache/hudi/adapter/HiveCatalogConstants.java
 create mode 100644 hudi-flink-datasource/hudi-flink1.17.x/src/main/java/org/apache/hudi/adapter/HiveCatalogConstants.java
 copy hudi-flink-datasource/{hudi-flink1.17.x => hudi-flink1.18.x}/pom.xml (87%)
 copy hudi-flink-datasource/{hudi-flink1.14.x => hudi-flink1.18.x}/src/main/java/org/apache/hudi/adapter/AbstractStreamOperatorAdapter.java (100%)
 copy hudi-flink-datasource/{hudi-flink1.14.x => hudi-flink1.18.x}/src/main/java/org/apache/hudi/adapter/AbstractStreamOperatorFactoryAdapter.java (100%)
 copy hudi-flink-datasource/{hudi-flink1.15.x => hudi-flink1.18.x}/src/main/java/org/apache/hudi/adapter/DataStreamScanProviderAdapter.java (100%)
 copy hudi-flink-datasource/{hudi-flink1.15.x => hudi-flink1.18.x}/src/main/java/org/apache/hudi/adapter/DataStreamSinkProviderAdapter.java (100%)
 create mode 100644
Re: [PR] [HUDI-6993] Support Flink 1.18 [hudi]
danny0405 merged PR #9949: URL: https://github.com/apache/hudi/pull/9949
[jira] [Updated] (HUDI-7057) Support CopyToTableProcedure with partial column copy
[ https://issues.apache.org/jira/browse/HUDI-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7057: - Labels: pull-request-available sparksql (was: sparksql) > Support CopyToTableProcedure with partial column copy > --- > > Key: HUDI-7057 > URL: https://issues.apache.org/jira/browse/HUDI-7057 > Project: Apache Hudi > Issue Type: Improvement >Reporter: xy >Assignee: xy >Priority: Major > Labels: pull-request-available, sparksql > > Currently, CopyToTableProcedure only supports copying the full schema to a new table, but in many scenarios users need only a subset of columns from the table schema.
Re: [PR] [HUDI-6993] Support Flink 1.18 [hudi]
danny0405 commented on PR #9949: URL: https://github.com/apache/hudi/pull/9949#issuecomment-1803074942 Tests passed: https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=20702=results
[PR] [HUDI-7057] Support CopyToTableProcedure with partial column copy [hudi]
xuzifu666 opened a new pull request, #10025: URL: https://github.com/apache/hudi/pull/10025

### Change Logs

Currently, CopyToTableProcedure only supports copying the full schema to a new table, but in many scenarios users need only a subset of columns from the table schema.

### Impact

none

### Risk level (write none, low medium or high below)

lower

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
[jira] [Updated] (HUDI-7057) Support CopyToTableProcedure with partial column copy
[ https://issues.apache.org/jira/browse/HUDI-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xy updated HUDI-7057: - Labels: sparksql (was: ) > Support CopyToTableProcedure with partial column copy > --- > > Key: HUDI-7057 > URL: https://issues.apache.org/jira/browse/HUDI-7057 > Project: Apache Hudi > Issue Type: Improvement >Reporter: xy >Assignee: xy >Priority: Major > Labels: sparksql > > Currently, CopyToTableProcedure only supports copying the full schema to a new table, but in many scenarios users need only a subset of columns from the table schema.
[jira] [Created] (HUDI-7057) Support CopyToTableProcedure with partial column copy
xy created HUDI-7057: Summary: Support CopyToTableProcedure with partial column copy Key: HUDI-7057 URL: https://issues.apache.org/jira/browse/HUDI-7057 Project: Apache Hudi Issue Type: Improvement Reporter: xy Currently, CopyToTableProcedure only supports copying the full schema to a new table, but in many scenarios users need only a subset of columns from the table schema.
[jira] [Assigned] (HUDI-7057) Support CopyToTableProcedure with partial column copy
[ https://issues.apache.org/jira/browse/HUDI-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xy reassigned HUDI-7057: Assignee: xy > Support CopyToTableProcedure with partial column copy > --- > > Key: HUDI-7057 > URL: https://issues.apache.org/jira/browse/HUDI-7057 > Project: Apache Hudi > Issue Type: Improvement >Reporter: xy >Assignee: xy >Priority: Major > > Currently, CopyToTableProcedure only supports copying the full schema to a new table, but in many scenarios users need only a subset of columns from the table schema.
Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]
hudi-bot commented on PR #9717: URL: https://github.com/apache/hudi/pull/9717#issuecomment-1803071474 ## CI report: * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN * 43253eac6f27abbb614bb80d514d5ea3e30e09a1 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20747) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]
hudi-bot commented on PR #10024: URL: https://github.com/apache/hudi/pull/10024#issuecomment-1803071823 ## CI report: * 17566af6a57a43de3b2a5d5ab7bcd8bd28336307 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20752)
Re: [I] [SUPPORT] Some .commit files generated by compaction with 0 size [hudi]
zdl1 commented on issue #10017: URL: https://github.com/apache/hudi/issues/10017#issuecomment-1803070116 Thanks for your help. One more question: for an existing partition, a new record inserted into an existing bucket is counted as **numUpdateWrites**, so how can I confirm how many new records were written in this delta commit? I would appreciate your help. @ad1happy2go @danny0405
Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]
hudi-bot commented on PR #10024: URL: https://github.com/apache/hudi/pull/10024#issuecomment-1803066773 ## CI report: * 17566af6a57a43de3b2a5d5ab7bcd8bd28336307 UNKNOWN
Re: [PR] [HUDI-7053] Fix the filter pushdown logic for file group reader [hudi]
hudi-bot commented on PR #10018: URL: https://github.com/apache/hudi/pull/10018#issuecomment-1803066730 ## CI report: * c5812540396b56db64df779ab7147cc0cede626a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20735) * 0f4e8c208b5614736e450657ad56810bc4060ea4 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20751)
Re: [PR] [HUDI-6495][RFC-66] Non-blocking Concurrency Control [hudi]
vinothchandar merged PR #7907: URL: https://github.com/apache/hudi/pull/7907
(hudi) branch master updated (43a39b907bc -> 7c17964fe73)
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 43a39b907bc ShowPartitionsCommand should consider lazy delete_partitions (#10019)
     add 7c17964fe73 [HUDI-6495][RFC-66] Non-blocking Concurrency Control (#7907)

No new revisions were added by this update.

Summary of changes:
 rfc/rfc-66/compaction.png            | Bin 0 -> 104613 bytes
 rfc/rfc-66/log_file_sequence.png     | Bin 0 -> 62499 bytes
 rfc/rfc-66/multi_writer.png          | Bin 0 -> 83780 bytes
 rfc/rfc-66/non_serial_compaction.png | Bin 0 -> 208944 bytes
 rfc/rfc-66/rfc-66.md                 | 318 +++
 5 files changed, 318 insertions(+)
 create mode 100644 rfc/rfc-66/compaction.png
 create mode 100644 rfc/rfc-66/log_file_sequence.png
 create mode 100644 rfc/rfc-66/multi_writer.png
 create mode 100644 rfc/rfc-66/non_serial_compaction.png
 create mode 100644 rfc/rfc-66/rfc-66.md
Re: [PR] [HUDI-7055] Support reading only log files in file group reader-based Spark parquet file format [hudi]
hudi-bot commented on PR #10020: URL: https://github.com/apache/hudi/pull/10020#issuecomment-1803061504 ## CI report: * 8b39f72f16c9b708667dc17ee51875cf3c9b7364 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20743)
Re: [PR] [HUDI-7053] Fix the filter pushdown logic for file group reader [hudi]
hudi-bot commented on PR #10018: URL: https://github.com/apache/hudi/pull/10018#issuecomment-1803061463 ## CI report: * c5812540396b56db64df779ab7147cc0cede626a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20735) * 0f4e8c208b5614736e450657ad56810bc4060ea4 UNKNOWN
Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803061117 ## CI report: * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN * e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN * db06d7f6394d26bec65a629ed2b567754d28a46a Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20750)
[PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]
boneanxs opened a new pull request, #10024: URL: https://github.com/apache/hudi/pull/10024

### Change Logs

_Describe context and summary for this change. Highlight if any code was copied._

It may be better to let `HoodieCatalogTable` filter deleted partitions directly: it shouldn't show deleted partitions when calling `getPartitionPaths`. This also fixes `RepairHoodieTableCommand` so that it does not repair deleted partitions.

### Impact

_Describe any public API or user-facing feature change or any performance impact._

None

### Risk level (write none, low medium or high below)

_If medium or high, explain what verification was done to mitigate the risks._

None

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change_

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
Re: [PR] [HUDI-6508] Fix compile errors with JDK11 [hudi]
Zouxxyy commented on PR #9300: URL: https://github.com/apache/hudi/pull/9300#issuecomment-1803034364 > @Zouxxyy : Can you fix the merge conflicts ? done
Re: [PR] [MINOR] Change some default configs for 1.0.0-beta [hudi]
hudi-bot commented on PR #9998: URL: https://github.com/apache/hudi/pull/9998#issuecomment-1803028334 ## CI report: * 7e450aee63b81c2d28d04d927eadf5ca006e8a19 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20748)
[I] [SUPPORT] Possible memory leak issue for org.apache.hadoop.hive.conf.HiveConf while using Flink into Hudi [hudi]
xmubeta opened a new issue, #10023: URL: https://github.com/apache/hudi/issues/10023

**_Tips before filing an issue_**

- Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
- Join the mailing list to engage in conversations and get faster support at dev-subscr...@hudi.apache.org.
- If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.

**Describe the problem you faced**

I am using Flink SQL to ingest data from AWS Kinesis into Hudi on S3, with the AWS Glue catalog as the Hive metastore and hive_sync.enable set to true in SQL. The ingestion works well. However, after running for a few hours or days, the JobManager fails with an OutOfMemoryError. I checked the heap dump and found that org.apache.hadoop.hive.conf.HiveConf occupied 80.77% of the memory. It seems to be related to HiveSyncContext.

The suspected leak reported by Eclipse Memory Analyzer:

12 instances of "org.apache.hadoop.hive.conf.HiveConf", loaded by "sun.misc.Launcher$AppClassLoader @ 0xe400bdf8" occupy 338,544,384 (80.77%) bytes.

Biggest instances:

- org.apache.hadoop.hive.conf.HiveConf @ 0xe71197b0 - 33,702,712 (8.04%) bytes.
- org.apache.hadoop.hive.conf.HiveConf @ 0xe72d9e30 - 33,702,712 (8.04%) bytes.
- org.apache.hadoop.hive.conf.HiveConf @ 0xe77c62c0 - 33,702,712 (8.04%) bytes.
- org.apache.hadoop.hive.conf.HiveConf @ 0xe787f640 - 33,702,712 (8.04%) bytes.
- org.apache.hadoop.hive.conf.HiveConf @ 0xe798fd00 - 33,702,712 (8.04%) bytes.
- org.apache.hadoop.hive.conf.HiveConf @ 0xe7a9b0f0 - 33,702,712 (8.04%) bytes.
- org.apache.hadoop.hive.conf.HiveConf @ 0xe812a8c8 - 33,702,712 (8.04%) bytes.
- org.apache.hadoop.hive.conf.HiveConf @ 0xe82d0af0 - 33,702,712 (8.04%) bytes.
- org.apache.hadoop.hive.conf.HiveConf @ 0xe84c10c8 - 33,702,712 (8.04%) bytes.
- org.apache.hadoop.hive.conf.HiveConf @ 0xe8736300 - 33,702,712 (8.04%) bytes.

Keywords: sun.misc.Launcher$AppClassLoader @ 0xe400bdf8, org.apache.hadoop.hive.conf.HiveConf

**To Reproduce**

Steps to reproduce the behavior:

1.
Set up an AWS EMR 6.10.0 with Flink 1.16.0 +Hive 3.1 + Hudi 0.13.0 2. Set up an AWS Kinesis and ingest data into it. 3. Run a Flink SQL job to ingest to Hudi on S3 from Kinesis 4. Run for a few hours or days, could get OOM. **Expected behavior** No OOM issue. **Environment Description** * Hudi version : 0.13.0 * Spark version : 3.3.1 * Hive version : 3.1 * Hadoop version : 3.3.3 * Storage (HDFS/S3/GCS..) : S3 * Running on Docker? (yes/no) : no **Additional context** Add any other context about the problem here. **Stacktrace** ```Add the stacktrace of the error.``` ``` 2023-11-09 06:59:55,475 ERROR org.apache.hudi.sink.StreamWriteOperatorCoordinator [] - Executor executes action [commits the instant 20231109065505712] error java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.stream.StreamSupport.stream(StreamSupport.java:69) ~[?:1.8.0_392] at java.util.Collection.stream(Collection.java:581) ~[?:1.8.0_392] at org.apache.hudi.common.table.timeline.TimelineLayout$TimelineLayoutV1.lambda$filterHoodieInstants$2(TimelineLayout.java:68) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0] at org.apache.hudi.common.table.timeline.TimelineLayout$TimelineLayoutV1$$Lambda$1187/1033743503.apply(Unknown Source) ~[?:?] 
at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_392] at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1652) ~[?:1.8.0_392] at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_392] at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_392] at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_392] at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_392] at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[?:1.8.0_392] at org.apache.hudi.common.table.HoodieTableMetaClient.scanHoodieInstantsFromFileSystem(HoodieTableMetaClient.java:651) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0] at org.apache.hudi.common.table.HoodieTableMetaClient.scanHoodieInstantsFromFileSystem(HoodieTableMetaClient.java:625) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0] at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.(HoodieActiveTimeline.java:163) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0] at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.(HoodieActiveTimeline.java:155) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0] at
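For context, the reproduction above can be approximated with a Flink SQL pipeline along these lines. This is an illustrative sketch only, not the reporter's actual job: the table names, stream, region, S3 path, schema, and option values are placeholders, and the exact option keys should be checked against the Hudi and Flink Kinesis connector versions in use.

```sql
-- Hypothetical Kinesis source table (Flink Kinesis connector)
CREATE TABLE kinesis_src (
  id STRING,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'kinesis',
  'stream' = 'example-stream',
  'aws.region' = 'us-east-1',
  'scan.stream.initpos' = 'LATEST',
  'format' = 'json'
);

-- Hypothetical Hudi sink on S3 with Hive sync enabled (Glue as metastore)
CREATE TABLE hudi_sink (
  id STRING,
  ts TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 's3://example-bucket/hudi/hudi_sink',
  'table.type' = 'MERGE_ON_READ',
  'hive_sync.enable' = 'true',
  'hive_sync.mode' = 'hms',
  'hive_sync.db' = 'default',
  'hive_sync.table' = 'hudi_sink'
);

INSERT INTO hudi_sink SELECT id, ts FROM kinesis_src;
```

With `hive_sync.enable` set to true, each commit cycle triggers Hive metastore synchronization, which is the code path the heap dump points at.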
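The heap dump suggests that one heavyweight `HiveConf` is retained per sync cycle rather than being reused. As a minimal, hedged illustration of the general mitigation pattern (not Hudi's actual fix, and not the real `HiveConf` API), the sketch below caches a heavyweight configuration object keyed by its logical identity so repeated sync cycles share one instance; `HeavyConf`, `ConfCache`, and the cache key are hypothetical stand-ins.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for a heavyweight configuration object such as HiveConf,
// which copies a large default property set on every construction.
class HeavyConf {
    final Map<String, String> props = new ConcurrentHashMap<>();

    HeavyConf(Map<String, String> seed) {
        props.putAll(seed);
    }
}

public class ConfCache {
    // One shared instance per logical configuration key, instead of a
    // fresh object allocated on every sync/commit cycle.
    private static final Map<String, HeavyConf> CACHE = new ConcurrentHashMap<>();

    static HeavyConf getOrCreate(String key, Map<String, String> seed) {
        return CACHE.computeIfAbsent(key, k -> new HeavyConf(seed));
    }

    public static void main(String[] args) {
        Map<String, String> seed = Map.of("hive.metastore.uris", "thrift://example:9083");
        HeavyConf a = getOrCreate("sync", seed);
        HeavyConf b = getOrCreate("sync", seed);
        // The second lookup returns the cached instance, so only one
        // heavyweight object is ever retained for this key.
        System.out.println(a == b);
    }
}
```

If the leak is instead caused by instances being pinned by a classloader or an unclosed client, caching alone would not help; that determination would need the dominator tree from the heap dump.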
Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1802979926

## CI report:

* 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
* e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
* 58d6194db3965823e985b646738d9d2399e4a5d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20741)
* db06d7f6394d26bec65a629ed2b567754d28a46a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20750)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]
hudi-bot commented on PR #9743: URL: https://github.com/apache/hudi/pull/9743#issuecomment-1802973496

## CI report:

* 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
* e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
* 58d6194db3965823e985b646738d9d2399e4a5d8 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20741)
* db06d7f6394d26bec65a629ed2b567754d28a46a UNKNOWN

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7044] Skip reading records for delete blocks for positional merging [hudi]
hudi-bot commented on PR #10005: URL: https://github.com/apache/hudi/pull/10005#issuecomment-1802967731

## CI report:

* 83c12bf57803d832fbbfe2eed9cae30d987db175 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20745)
* baaf10ac8ac319fb3e776b33b4386755bf034cb6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20749)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]
hudi-bot commented on PR #9717: URL: https://github.com/apache/hudi/pull/9717#issuecomment-1802967250

## CI report:

* 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
* 94c46b0aaa5a205e767ed088ad631cc894922ea3 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20744)
* 43253eac6f27abbb614bb80d514d5ea3e30e09a1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20747)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [I] UPSERTs are taking time [hudi]
soumilshah1995 commented on issue #9976: URL: https://github.com/apache/hudi/issues/9976#issuecomment-1802964892

Can you try the new record-level index (RLI)? https://www.linkedin.com/pulse/upsert-performance-evaluation-hudi-014-spark-341-record-soumil-shah-oupre%3FtrackingId=PeKhUkGNTkuSD1VRqoI3rw%253D%253D/?trackingId=PeKhUkGNTkuSD1VRqoI3rw%3D%3D
Re: [I] [SUPPORT]insert_overwrite mode writing 2 times more duplicates [hudi]
soumilshah1995 commented on issue #9992: URL: https://github.com/apache/hudi/issues/9992#issuecomment-1802963658

Here is a sample that works fine for me: https://soumilshah1995.blogspot.com/2023/03/rfc-18-insert-overwrite-in-apache-hudi.html
Re: [PR] [HUDI-7044] Skip reading records for delete blocks for positional merging [hudi]
hudi-bot commented on PR #10005: URL: https://github.com/apache/hudi/pull/10005#issuecomment-1802909950

## CI report:

* 579179dc1c43cf4fa02c3f023187ac9f8da06ffa Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20710)
* 83c12bf57803d832fbbfe2eed9cae30d987db175 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20745)
* baaf10ac8ac319fb3e776b33b4386755bf034cb6 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20749)

Bot commands

@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build