Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803296619

   
   ## CI report:
   
   * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
   * e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
   * d3e58fee4a8cfb636419d9eeb50fa2480b56e2c1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20758)
 
   * a1ab15d9c36c9afd9626b63b30897f08efadbf69 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20765)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803289243

   
   ## CI report:
   
   * 4c36398beb08a1eb91959ce151ebf8fd159bb8d6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20763)
 
   
   



Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10024:
URL: https://github.com/apache/hudi/pull/10024#issuecomment-1803289144

   
   ## CI report:
   
   * 17566af6a57a43de3b2a5d5ab7bcd8bd28336307 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20752)
 
   * 7c0ff25207cc19c5496dc4e7e688c3a8527663d2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20766)
 
   
   



Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803288697

   
   ## CI report:
   
   * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
   * e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
   * d3e58fee4a8cfb636419d9eeb50fa2480b56e2c1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20758)
 
   * a1ab15d9c36c9afd9626b63b30897f08efadbf69 UNKNOWN
   
   



Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10024:
URL: https://github.com/apache/hudi/pull/10024#issuecomment-1803281654

   
   ## CI report:
   
   * 17566af6a57a43de3b2a5d5ab7bcd8bd28336307 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20752)
 
   * 7c0ff25207cc19c5496dc4e7e688c3a8527663d2 UNKNOWN
   
   



Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803281756

   
   ## CI report:
   
   * 91a7b8640540e841522e33c001fc8c0f4abddb4a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20761)
 
   * 4c36398beb08a1eb91959ce151ebf8fd159bb8d6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20763)
 
   
   



[jira] [Closed] (HUDI-6992) IncrementalInputSplits incorrectly set the latestCommit attr

2023-11-08 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6992.

Resolution: Fixed

Fixed via master branch: 44ca2bbcfd1512a55155e1033a9c9aca132efae6

> IncrementalInputSplits incorrectly set the latestCommit attr
>
> Key: HUDI-6992
> URL: https://issues.apache.org/jira/browse/HUDI-6992
> Project: Apache Hudi
> Issue Type: Improvement
> Components: reader-core
> Reporter: zhuanshenbsj1
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.0.0




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch master updated: [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attribute (#9923)

2023-11-08 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 44ca2bbcfd1 [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attribute (#9923)
44ca2bbcfd1 is described below

commit 44ca2bbcfd1512a55155e1033a9c9aca132efae6
Author: zhuanshenbsj1 <34104400+zhuanshenb...@users.noreply.github.com>
AuthorDate: Thu Nov 9 15:04:50 2023 +0800

    [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attribute (#9923)
---
 .../org/apache/hudi/common/model/FileSlice.java| 13 --
 .../table/timeline/CompletionTimeQueryView.java|  7 +--
 .../hudi/common/table/timeline/HoodieTimeline.java | 14 ++
 .../apache/hudi/common/model/TestFileSlice.java| 50 ++
 .../apache/hudi/source/IncrementalInputSplits.java |  7 ++-
 .../hudi/source/TestIncrementalInputSplits.java| 47 
 .../source/TestStreamReadMonitoringFunction.java   | 12 +++---
 7 files changed, 134 insertions(+), 16 deletions(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/model/FileSlice.java b/hudi-common/src/main/java/org/apache/hudi/common/model/FileSlice.java
index 3f0fcf94156..d071385ea75 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/model/FileSlice.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/model/FileSlice.java
@@ -18,6 +18,7 @@
 
 package org.apache.hudi.common.model;
 
+import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.util.Option;
 
 import java.io.Serializable;
@@ -123,9 +124,15 @@ public class FileSlice implements Serializable {
   }
 
   /**
-   * Returns true if there is no data file and no log files. Happens as part of pending compaction
-   * 
-   * @return
+   * Returns the latest instant time of the file slice.
+   */
+  public String getLatestInstantTime() {
+    Option<String> latestDeltaCommitTime = getLatestLogFile().map(HoodieLogFile::getDeltaCommitTime);
+    return latestDeltaCommitTime.isPresent() ? HoodieTimeline.maxInstant(latestDeltaCommitTime.get(), getBaseInstantTime()) : getBaseInstantTime();
+  }
+
+  /**
+   * Returns true if there is no data file and no log files. Happens as part of pending compaction.
    */
   public boolean isEmpty() {
     return (baseFile == null) && (logFiles.isEmpty());
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java
index 081cae8cb15..e53f185bffd 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/CompletionTimeQueryView.java
@@ -32,7 +32,6 @@ import java.util.concurrent.ConcurrentHashMap;
 import static org.apache.hudi.common.table.timeline.HoodieArchivedTimeline.COMPLETION_TIME_ARCHIVED_META_FIELD;
 import static org.apache.hudi.common.table.timeline.HoodieTimeline.GREATER_THAN_OR_EQUALS;
 import static org.apache.hudi.common.table.timeline.HoodieTimeline.LESSER_THAN;
-import static org.apache.hudi.common.table.timeline.HoodieTimeline.compareTimestamps;
 
 /**
  * Query view for instant completion time.
@@ -81,7 +80,7 @@ public class CompletionTimeQueryView implements AutoCloseable, Serializable {
   public CompletionTimeQueryView(HoodieTableMetaClient metaClient, String cursorInstant) {
     this.metaClient = metaClient;
     this.startToCompletionInstantTimeMap = new ConcurrentHashMap<>();
-    this.cursorInstant = minInstant(cursorInstant, metaClient.getActiveTimeline().firstInstant().map(HoodieInstant::getTimestamp).orElse(""));
+    this.cursorInstant = HoodieTimeline.minInstant(cursorInstant, metaClient.getActiveTimeline().firstInstant().map(HoodieInstant::getTimestamp).orElse(""));
     // Note: use getWriteTimeline() to keep sync with the fs view visibleCommitsAndCompactionTimeline, see AbstractTableFileSystemView.refreshTimeline.
     this.firstNonSavepointCommit = metaClient.getActiveTimeline().getWriteTimeline().getFirstNonSavepointCommit().map(HoodieInstant::getTimestamp).orElse("");
     load();
@@ -207,10 +206,6 @@ public class CompletionTimeQueryView implements AutoCloseable, Serializable {
     this.startToCompletionInstantTimeMap.putIfAbsent(instantTime, completionTime);
   }
 
-  private static String minInstant(String instant1, String instant2) {
-    return compareTimestamps(instant1, LESSER_THAN, instant2) ? instant1 : instant2;
-  }
-
   public String getCursorInstant() {
     return cursorInstant;
   }
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java b/hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieTimeline.java
index 82ec439bd25..53c7d25a00c 100644
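
(The archived diff is cut off at this point.) As a rough illustration of what the FileSlice change above computes, here is a minimal, self-contained Java sketch. It is not Hudi code: `LatestInstantSketch`, `latestInstantTime`, and the sample timestamps are hypothetical, `java.util.Optional` stands in for Hudi's `Option`, and the `maxInstant` body is an assumption (only the removed `minInstant` body is visible in the diff). It also assumes instant times are fixed-width timestamp strings, so that plain string comparison orders them chronologically.

```java
import java.util.Optional;

/** Simplified stand-in for the file-slice logic in the diff above; not Hudi code. */
final class LatestInstantSketch {

  // Assumption: instants are fixed-width timestamp strings, so lexicographic
  // order matches chronological order.
  static String minInstant(String a, String b) {
    return a.compareTo(b) < 0 ? a : b;
  }

  // Hypothetical counterpart of minInstant; the real implementation is not
  // shown in the archived diff.
  static String maxInstant(String a, String b) {
    return a.compareTo(b) >= 0 ? a : b;
  }

  // Mirrors the shape of getLatestInstantTime(): take the newer of the latest
  // log file's delta commit time and the base instant time, or fall back to
  // the base instant time when the slice has no log file.
  static String latestInstantTime(String baseInstantTime, Optional<String> latestDeltaCommitTime) {
    return latestDeltaCommitTime
        .map(t -> maxInstant(t, baseInstantTime))
        .orElse(baseInstantTime);
  }

  public static void main(String[] args) {
    // Slice with a newer log file: the delta commit time wins.
    System.out.println(latestInstantTime("20231108150000000", Optional.of("20231108151000000")));
    // Slice with only a base file: the base instant time is returned.
    System.out.println(latestInstantTime("20231108150000000", Optional.empty()));
  }
}
```

The fallback branch matters for slices that contain only a base file, where there is no delta commit time to compare against.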

Re: [PR] [HUDI-6992] IncrementalInputSplits incorrectly set the latestCommit attr [hudi]

2023-11-08 Thread via GitHub


danny0405 merged PR #9923:
URL: https://github.com/apache/hudi/pull/9923





Re: [I] [SUPPORT] Some .commit files generated by compaction with 0 size [hudi]

2023-11-08 Thread via GitHub


ad1happy2go commented on issue #10017:
URL: https://github.com/apache/hudi/issues/10017#issuecomment-1803263076

   @zdl1 Can you also share the sample code/data you are trying, if possible?





Re: [PR] [HUDI-7057] Support CopyToTableProcedure with patitial column copy [hudi]

2023-11-08 Thread via GitHub


xuzifu666 commented on PR #10025:
URL: https://github.com/apache/hudi/pull/10025#issuecomment-1803255723

   CI seems to be done. @danny0405 PTAL





Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]

2023-11-08 Thread via GitHub


boneanxs commented on code in PR #10024:
URL: https://github.com/apache/hudi/pull/10024#discussion_r1387540620


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala:
##
@@ -169,9 +170,10 @@ class HoodieCatalogTable(val spark: SparkSession, var table: CatalogTable) exten
   lazy val partitionSchema: StructType = StructType(tableSchema.filter(f => partitionFields.contains(f.name)))
 
   /**
-   * All the partition paths
+   * All the partition paths, excludes lazily deleted partitions.
    */
   def getPartitionPaths: Seq[String] = getAllPartitionPaths(spark, table)
+    .filter(!TimelineUtils.getDroppedPartitions(metaClient.getActiveTimeline).contains(_))
 

Review Comment:
   Oh! Makes sense, thanks for pointing this out, let me fix it.






Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803229404

   
   ## CI report:
   
   * 91a7b8640540e841522e33c001fc8c0f4abddb4a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20761)
 
   *  Unknown: [CANCELED](TBD) 
   * 4c36398beb08a1eb91959ce151ebf8fd159bb8d6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20763)
 
   
   



Re: [PR] [HUDI-7057] Support CopyToTableProcedure with patitial column copy [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10025:
URL: https://github.com/apache/hudi/pull/10025#issuecomment-1803229350

   
   ## CI report:
   
   * 80a394967d09baac64231af055c40996dbb2a7fd Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20756)
 
   
   



Re: [PR] [HUDI-7053] Fix the filter pushdown logic for file group reader [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10018:
URL: https://github.com/apache/hudi/pull/10018#issuecomment-1803229313

   
   ## CI report:
   
   * e420cdf46604699ae6587b9aa13e6e9c0d139d38 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20755)
 
   
   



Re: [PR] [MINOR] Change some default configs for 1.0.0-beta [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9998:
URL: https://github.com/apache/hudi/pull/9998#issuecomment-1803229209

   
   ## CI report:
   
   * bb197bc8649a9b8dfcd41d9ed3e4af8e0afbeb9b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20762)
 
   
   



Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803228898

   
   ## CI report:
   
   * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
   * e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
   * d3e58fee4a8cfb636419d9eeb50fa2480b56e2c1 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20758)
 
   
   



Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]

2023-11-08 Thread via GitHub


danny0405 commented on code in PR #10024:
URL: https://github.com/apache/hudi/pull/10024#discussion_r1387533789


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/catalyst/catalog/HoodieCatalogTable.scala:
##
@@ -169,9 +170,10 @@ class HoodieCatalogTable(val spark: SparkSession, var table: CatalogTable) exten
   lazy val partitionSchema: StructType = StructType(tableSchema.filter(f => partitionFields.contains(f.name)))
 
   /**
-   * All the partition paths
+   * All the partition paths, excludes lazily deleted partitions.
    */
   def getPartitionPaths: Seq[String] = getAllPartitionPaths(spark, table)
+    .filter(!TimelineUtils.getDroppedPartitions(metaClient.getActiveTimeline).contains(_))
 

Review Comment:
   Will this trigger the calculation for each partition ?
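
   The concern appears to be that a lookup written inside the filter predicate is re-evaluated for every partition path. A minimal Java sketch of that effect follows; it is not Hudi or Spark code. `DroppedPartitionFilterSketch`, `droppedPartitions()`, the `SCANS` counter, and the partition values are all hypothetical stand-ins, with `droppedPartitions()` playing the role of a timeline lookup such as `getDroppedPartitions`.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

final class DroppedPartitionFilterSketch {

  // Counts how often the (hypothetical) expensive lookup runs.
  static final AtomicInteger SCANS = new AtomicInteger();

  // Stand-in for a timeline lookup of dropped partitions; the data is made up.
  static List<String> droppedPartitions() {
    SCANS.incrementAndGet();
    return Arrays.asList("2023/11/01", "2023/11/02");
  }

  // Lookup inside the predicate: it runs once per partition path.
  static List<String> filterPerElement(List<String> allPaths) {
    return allPaths.stream()
        .filter(p -> !droppedPartitions().contains(p))
        .collect(Collectors.toList());
  }

  // Lookup hoisted out of the predicate: it runs once per call.
  static List<String> filterHoisted(List<String> allPaths) {
    Set<String> dropped = new HashSet<>(droppedPartitions());
    return allPaths.stream()
        .filter(p -> !dropped.contains(p))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> all = Arrays.asList("2023/11/01", "2023/11/02", "2023/11/03", "2023/11/04");
    SCANS.set(0);
    System.out.println(filterPerElement(all) + " scans=" + SCANS.get());
    SCANS.set(0);
    System.out.println(filterHoisted(all) + " scans=" + SCANS.get());
  }
}
```

   Both variants return the same paths; only the number of lookups differs, which is why computing the dropped set once before filtering is the usual remedy.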






Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10024:
URL: https://github.com/apache/hudi/pull/10024#issuecomment-1803194775

   
   ## CI report:
   
   * 17566af6a57a43de3b2a5d5ab7bcd8bd28336307 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20752)
 
   
   



Re: [PR] [MINOR] Change some default configs for 1.0.0-beta [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9998:
URL: https://github.com/apache/hudi/pull/9998#issuecomment-1803194686

   
   ## CI report:
   
   * 7e450aee63b81c2d28d04d927eadf5ca006e8a19 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20748)
 
   * bb197bc8649a9b8dfcd41d9ed3e4af8e0afbeb9b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20762)
 
   
   



Re: [PR] [MINOR] Change some default configs for 1.0.0-beta [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9998:
URL: https://github.com/apache/hudi/pull/9998#issuecomment-1803187055

   
   ## CI report:
   
   * 7e450aee63b81c2d28d04d927eadf5ca006e8a19 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20748)
 
   * bb197bc8649a9b8dfcd41d9ed3e4af8e0afbeb9b UNKNOWN
   
   



Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803180699

   
   ## CI report:
   
   * 91a7b8640540e841522e33c001fc8c0f4abddb4a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20761)
 
   *  Unknown: [CANCELED](TBD) 
   * 4c36398beb08a1eb91959ce151ebf8fd159bb8d6 UNKNOWN
   
   



Re: [PR] [HUDI-7047] Fix Stack Overflow Caused by partition pruning [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10026:
URL: https://github.com/apache/hudi/pull/10026#issuecomment-1803180657

   
   ## CI report:
   
   * ae33aa0b78cb61bcf6c834bfd404d0c7296ac05b Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20759)
 
   
   



Re: [PR] [HUDI-7044] Skip reading records for delete blocks for positional merging [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10005:
URL: https://github.com/apache/hudi/pull/10005#issuecomment-1803180577

   
   ## CI report:
   
   * baaf10ac8ac319fb3e776b33b4386755bf034cb6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20749)
 
   
   



Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803180312

   
   ## CI report:
   
   * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
   * e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
   * db06d7f6394d26bec65a629ed2b567754d28a46a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20750)
 
   * d3e58fee4a8cfb636419d9eeb50fa2480b56e2c1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20758)
 
   
   



Re: [PR] [HUDI-6993] Support Flink 1.18 [hudi]

2023-11-08 Thread via GitHub


PrabhuJoseph commented on PR #9949:
URL: https://github.com/apache/hudi/pull/9949#issuecomment-1803166707

   Thanks @danny0405 for the review and commit.





Re: [I] [SUPPORT] DBT Merge creates duplicates [hudi]

2023-11-08 Thread via GitHub


faizhasan commented on issue #7244:
URL: https://github.com/apache/hudi/issues/7244#issuecomment-1803162436

   Hi @amrishlal, apologies for the delay.
   
   I was able to test this and saw the following behavior
   with dbt 1.6.2, using the dbt-spark adapter to execute models on the Thrift server:
   
   - AWS EMR 6.10.1 (Hudi 0.12.2-amzn-0): no duplicates created
   - AWS EMR 6.11.0 (Hudi 0.13.0-amzn-0) and above: duplicates were created





Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803152853

   
   ## CI report:
   
   * 91a7b8640540e841522e33c001fc8c0f4abddb4a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20761)
 
   *  Unknown: [CANCELED](TBD) 
   
   



Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803152527

   
   ## CI report:
   
   * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
   * e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
   * db06d7f6394d26bec65a629ed2b567754d28a46a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20750)
 
   * d3e58fee4a8cfb636419d9eeb50fa2480b56e2c1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20758)
 
   
   



Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-11-08 Thread via GitHub


nsivabalan commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803150479

   @hudi-bot run azure





Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]

2023-11-08 Thread via GitHub


nsivabalan commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803150379

   @hudi-bot run azure





Re: [PR] [HUDI-7058] check if option is empty before get in HoodieBaseFileGroupRecordBuffer [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10027:
URL: https://github.com/apache/hudi/pull/10027#issuecomment-1803148099

   
   ## CI report:
   
   * 384dbfa4bc44704f0151f9fa1330cbcff172d677 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20760)
 
   
   



Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803148139

   
   ## CI report:
   
   * 91a7b8640540e841522e33c001fc8c0f4abddb4a UNKNOWN
   
   



Re: [PR] [HUDI-7047] Fix Stack Overflow Caused by partition pruning [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10026:
URL: https://github.com/apache/hudi/pull/10026#issuecomment-1803148073

   
   ## CI report:
   
   * ae33aa0b78cb61bcf6c834bfd404d0c7296ac05b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20759)
 
   
   



Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803147734

   
   ## CI report:
   
   * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
   * e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
   * db06d7f6394d26bec65a629ed2b567754d28a46a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20750)
 
   * d3e58fee4a8cfb636419d9eeb50fa2480b56e2c1 UNKNOWN
   
   



Re: [PR] [HUDI-7058] check if option is empty before get in HoodieBaseFileGroupRecordBuffer [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10027:
URL: https://github.com/apache/hudi/pull/10027#issuecomment-1803143323

   
   ## CI report:
   
   * 384dbfa4bc44704f0151f9fa1330cbcff172d677 UNKNOWN
   
   



Re: [PR] [HUDI-7047] Fix Stack Overflow Caused by partition pruning [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10026:
URL: https://github.com/apache/hudi/pull/10026#issuecomment-1803143273

   
   ## CI report:
   
   * ae33aa0b78cb61bcf6c834bfd404d0c7296ac05b UNKNOWN
   
   



Re: [PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality (test PR to check CI) [hudi]

2023-11-08 Thread via GitHub


nsivabalan commented on PR #10028:
URL: https://github.com/apache/hudi/pull/10028#issuecomment-1803142917

   @hudi-bot run azure





[PR] [HUDI-6872][DNM] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-11-08 Thread via GitHub


nsivabalan opened a new pull request, #10028:
URL: https://github.com/apache/hudi/pull/10028

   ### Change Logs
   
   Simplify Out Of Box Schema Evolution Functionality
   
   ### Impact
   
   - Simplify Out Of Box Schema Evolution Functionality
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7058) HoodieBaseFileGroupRecordBuffer doesn't check if option is empty

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7058:
-
Labels: pull-request-available  (was: )

> HoodieBaseFileGroupRecordBuffer doesn't check if option is empty
> 
>
> Key: HUDI-7058
> URL: https://issues.apache.org/jira/browse/HUDI-7058
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Jonathan Vexler
>Assignee: Jonathan Vexler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>
> If the option is empty an exception will be thrown when get is called. This 
> happens when the reader is enabled for the test 
> testBaseFileAndLogFileUpdateMatchesDeleteBlock
>  
> {code:java}
> Caused by: java.util.NoSuchElementException: No value present in Option
>     at org.apache.hudi.common.util.Option.get(Option.java:89)
>     at org.apache.hudi.common.table.read.HoodieBaseFileGroupRecordBuffer.doProcessNextDataRecord(HoodieBaseFileGroupRecordBuffer.java:143)
>     at org.apache.hudi.common.table.read.HoodieKeyBasedFileGroupRecordBuffer.processNextDataRecord(HoodieKeyBasedFileGroupRecordBuffer.java:90)
>     at org.apache.hudi.common.table.read.HoodieKeyBasedFileGroupRecordBuffer.processDataBlock(HoodieKeyBasedFileGroupRecordBuffer.java:81)
>     at org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.processQueuedBlocksForInstant(BaseHoodieLogRecordReader.java:751)
>     at org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.scanInternalV1(BaseHoodieLogRecordReader.java:393)
>     ... 28 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7058] check if option is empty before get in HoodieBaseFileGroupRecordBuffer [hudi]

2023-11-08 Thread via GitHub


jonvex opened a new pull request, #10027:
URL: https://github.com/apache/hudi/pull/10027

   ### Change Logs
   
   HoodieBaseFileGroupRecordBuffer does a record merge and then calls `get` on the resulting 
option without checking it. testBaseFileAndLogFileUpdateMatchesDeleteBlock exposes that 
the option can be empty. The fix is to check and return an empty option if that 
is the case.
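
The failure mode described above can be sketched in isolation. This is a minimal illustration and not Hudi's actual code: `java.util.Optional` stands in for `org.apache.hudi.common.util.Option`, and `merge`, `processNextUnchecked` and `processNextChecked` are hypothetical names chosen for the example.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

// Hypothetical stand-in for a record buffer; not Hudi's implementation.
class OptionCheckSketch {

  // Hypothetical merge: yields an empty result when the newer record is a delete (null).
  static Optional<String> merge(String older, String newer) {
    return newer == null ? Optional.empty() : Optional.of(newer);
  }

  // Unsafe variant: calls get() without checking, so an empty merge result throws.
  static String processNextUnchecked(String older, String newer) {
    return merge(older, newer).get();
  }

  // Safe variant: checks first and propagates an empty result instead of calling get().
  static Optional<String> processNextChecked(String older, String newer) {
    Optional<String> merged = merge(older, newer);
    if (!merged.isPresent()) {
      return Optional.empty();
    }
    return Optional.of(merged.get());
  }

  public static void main(String[] args) {
    try {
      processNextUnchecked("v1", null);
    } catch (NoSuchElementException e) {
      System.out.println("unchecked get() failed: " + e.getMessage());
    }
    System.out.println("checked result present: " + processNextChecked("v1", null).isPresent());
  }
}
```

The checked variant leaves it to the caller to decide what an empty merge result means, which is the behaviour the change log describes.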
   
   ### Impact
   
   get rid of failure point
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-7058) HoodieBaseFileGroupRecordBuffer doesn't check if option is empty

2023-11-08 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-7058:
-

 Summary: HoodieBaseFileGroupRecordBuffer doesn't check if option 
is empty
 Key: HUDI-7058
 URL: https://issues.apache.org/jira/browse/HUDI-7058
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Jonathan Vexler
Assignee: Jonathan Vexler
 Fix For: 1.0.0


If the option is empty an exception will be thrown when get is called. This 
happens when the reader is enabled for the test 
testBaseFileAndLogFileUpdateMatchesDeleteBlock

 
{code:java}
Caused by: java.util.NoSuchElementException: No value present in Option
    at org.apache.hudi.common.util.Option.get(Option.java:89)
    at org.apache.hudi.common.table.read.HoodieBaseFileGroupRecordBuffer.doProcessNextDataRecord(HoodieBaseFileGroupRecordBuffer.java:143)
    at org.apache.hudi.common.table.read.HoodieKeyBasedFileGroupRecordBuffer.processNextDataRecord(HoodieKeyBasedFileGroupRecordBuffer.java:90)
    at org.apache.hudi.common.table.read.HoodieKeyBasedFileGroupRecordBuffer.processDataBlock(HoodieKeyBasedFileGroupRecordBuffer.java:81)
    at org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.processQueuedBlocksForInstant(BaseHoodieLogRecordReader.java:751)
    at org.apache.hudi.common.table.log.BaseHoodieLogRecordReader.scanInternalV1(BaseHoodieLogRecordReader.java:393)
    ... 28 more {code}





[PR] [HUDI-7047] Fix Stack Overflow Caused by partition pruning [hudi]

2023-11-08 Thread via GitHub


jonvex opened a new pull request, #10026:
URL: https://github.com/apache/hudi/pull/10026

   ### Change Logs
   
   hasPushedDownPartitionPredicates is not set to true in all cases when 
listFiles is called, so the planner repeatedly tries to resolve it
   
   ### Impact
   
   no more stack overflow
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [I] UPSERTs are taking time [hudi]

2023-11-08 Thread via GitHub


nsivabalan commented on issue #9976:
URL: https://github.com/apache/hudi/issues/9976#issuecomment-1803119793

   And yes: once you upgrade to 0.14.0 you can leverage RLI (the record-level 
index), which should definitely improve your index lookup and write latencies.
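
As a rough sketch of what enabling RLI could look like on the writer side: the option keys below are the ones I understand 0.14.0 to use for the record-level index, so please verify them against the configuration reference for your release. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the option keys are assumptions to verify against the Hudi 0.14.0 docs.
class RecordIndexSketch {

  // Returns a copy of the given write options with the record-level index switched on.
  static Map<String, String> withRecordLevelIndex(Map<String, String> baseOptions) {
    Map<String, String> opts = new LinkedHashMap<>(baseOptions);
    opts.put("hoodie.metadata.enable", "true");              // RLI lives in the metadata table
    opts.put("hoodie.metadata.record.index.enable", "true"); // build the record index partition
    opts.put("hoodie.index.type", "RECORD_INDEX");           // use it for upsert tagging
    return opts;
  }

  public static void main(String[] args) {
    Map<String, String> base = new LinkedHashMap<>();
    base.put("hoodie.table.name", "my_table"); // placeholder table name
    withRecordLevelIndex(base).forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```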





Re: [I] UPSERTs are taking time [hudi]

2023-11-08 Thread via GitHub


nsivabalan commented on issue #9976:
URL: https://github.com/apache/hudi/issues/9976#issuecomment-1803119446

   Yeah. As I suggested before, you may want to try out a MOR table, and try 
using the SIMPLE index. In 0.10.1 Hudi uses the bloom index, which for random keys 
might incur some unnecessary overhead. 
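
For reference, a minimal sketch of the write options this suggestion translates to. The class and method names are hypothetical, and the table name is a placeholder; the option keys are the standard Hudi datasource write configs as far as I know.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that collects the suggested settings in one place.
class UpsertTuningSketch {

  // Write options for an upsert into a MERGE_ON_READ table that uses the SIMPLE index.
  static Map<String, String> morWithSimpleIndex(String tableName) {
    Map<String, String> opts = new LinkedHashMap<>();
    opts.put("hoodie.table.name", tableName);
    opts.put("hoodie.datasource.write.operation", "upsert");
    opts.put("hoodie.datasource.write.table.type", "MERGE_ON_READ");
    opts.put("hoodie.index.type", "SIMPLE");
    return opts;
  }

  public static void main(String[] args) {
    morWithSimpleIndex("my_table").forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```

With Spark, a map like this can be passed to the writer through `options(...)` together with the record key and precombine settings the table already uses.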
   





Re: [PR] [HUDI-7057] Support CopyToTableProcedure with patitial column copy [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10025:
URL: https://github.com/apache/hudi/pull/10025#issuecomment-1803111282

   
   ## CI report:
   
   * 80a394967d09baac64231af055c40996dbb2a7fd Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20756)
 
   
   



Re: [PR] [HUDI-7053] Fix the filter pushdown logic for file group reader [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10018:
URL: https://github.com/apache/hudi/pull/10018#issuecomment-1803111222

   
   ## CI report:
   
   * c5812540396b56db64df779ab7147cc0cede626a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20735)
 
   * 0f4e8c208b5614736e450657ad56810bc4060ea4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20751)
 
   * e420cdf46604699ae6587b9aa13e6e9c0d139d38 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20755)
 
   
   



(hudi) branch master updated: [HUDI-7056] Add config for merging using positions (#10022)

2023-11-08 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 2e2e39377e2 [HUDI-7056] Add config for merging using positions (#10022)
2e2e39377e2 is described below

commit 2e2e39377e2577bf357626d3347963bd0efd01ba
Author: Jon Vexler 
AuthorDate: Wed Nov 8 22:31:09 2023 -0500

[HUDI-7056] Add config for merging using positions (#10022)

Co-authored-by: Jonathan Vexler <=>
---
 .../java/org/apache/hudi/common/config/HoodieReaderConfig.java | 7 +++
 .../main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala | 2 +-
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java
index 1738f75e9ec..c572cc21adc 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/config/HoodieReaderConfig.java
@@ -58,4 +58,11 @@ public class HoodieReaderConfig extends HoodieConfig {
   .markAdvanced()
   .sinceVersion("1.0.0")
   .withDocumentation("Use engine agnostic file group reader if enabled");
+
+  public static final ConfigProperty MERGE_USE_RECORD_POSITIONS = ConfigProperty
+  .key("hoodie.merge.use.record.positions")
+  .defaultValue(false)
+  .markAdvanced()
+  .sinceVersion("1.0.0")
+  .withDocumentation("Whether to use positions in the block header for data blocks containing updates and delete blocks for merging.");
 }
diff --git a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala
index 50249d87d97..a49fee2b740 100644
--- a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala
+++ b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieHadoopFsRelationFactory.scala
@@ -171,7 +171,7 @@ abstract class HoodieBaseHadoopFsRelationFactory(val sqlContext: SQLContext,
 
   protected lazy val fileGroupReaderEnabled: Boolean = checkIfAConfigurationEnabled(HoodieReaderConfig.FILE_GROUP_READER_ENABLED)
 
-  protected lazy val shouldUseRecordPosition: Boolean = checkIfAConfigurationEnabled(HoodieWriteConfig.WRITE_RECORD_POSITIONS)
+  protected lazy val shouldUseRecordPosition: Boolean = checkIfAConfigurationEnabled(HoodieReaderConfig.MERGE_USE_RECORD_POSITIONS)
 
   protected def queryTimestamp: Option[String] =
     specifiedQueryTimestamp.orElse(toScalaOption(timeline.lastInstant()).map(_.getTimestamp))
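
The effect of this commit is that position-based merging on the read path is now controlled by its own reader-side key instead of the writer-side one. A minimal sketch of that lookup, assuming options arrive as a plain string map: `isEnabled` and `shouldUseRecordPosition` here are hypothetical helpers written for illustration, not Hudi's `checkIfAConfigurationEnabled`, and only the key and its default of `false` are taken from the diff.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of reading the new reader-side flag with its default.
class ReaderFlagSketch {

  // Key and default value as they appear in the diff above.
  static final String MERGE_USE_RECORD_POSITIONS_KEY = "hoodie.merge.use.record.positions";
  static final boolean MERGE_USE_RECORD_POSITIONS_DEFAULT = false;

  // Returns the boolean value of the option, or the default when the key is not set.
  static boolean isEnabled(Map<String, String> options, String key, boolean defaultValue) {
    String raw = options.get(key);
    return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
  }

  // Position-based merging is used only when the reader-side key is explicitly enabled.
  static boolean shouldUseRecordPosition(Map<String, String> options) {
    return isEnabled(options, MERGE_USE_RECORD_POSITIONS_KEY, MERGE_USE_RECORD_POSITIONS_DEFAULT);
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    System.out.println("unset: " + shouldUseRecordPosition(options));
    options.put(MERGE_USE_RECORD_POSITIONS_KEY, "true");
    System.out.println("enabled: " + shouldUseRecordPosition(options));
  }
}
```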



Re: [PR] [HUDI-7056] Add config for merging using positions [hudi]

2023-11-08 Thread via GitHub


yihua merged PR #10022:
URL: https://github.com/apache/hudi/pull/10022





Re: [I] [SUPPORT] Some .commit files generated by compaction with 0 size [hudi]

2023-11-08 Thread via GitHub


zdl1 commented on issue #10017:
URL: https://github.com/apache/hudi/issues/10017#issuecomment-1803107881

   > Are the inflight and requested compaction metadata files also empty?
   
   The inflight file is empty but the requested file is not.





Re: [PR] [HUDI-7057] Support CopyToTableProcedure with patitial column copy [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10025:
URL: https://github.com/apache/hudi/pull/10025#issuecomment-1803106701

   
   ## CI report:
   
   * 80a394967d09baac64231af055c40996dbb2a7fd UNKNOWN
   
   



Re: [PR] [HUDI-7053] Fix the filter pushdown logic for file group reader [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10018:
URL: https://github.com/apache/hudi/pull/10018#issuecomment-1803106634

   
   ## CI report:
   
   * c5812540396b56db64df779ab7147cc0cede626a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20735)
 
   * 0f4e8c208b5614736e450657ad56810bc4060ea4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20751)
 
   * e420cdf46604699ae6587b9aa13e6e9c0d139d38 UNKNOWN
   
   



Re: [I] [SUPPORT] Some .commit files generated by compaction with 0 size [hudi]

2023-11-08 Thread via GitHub


danny0405 commented on issue #10017:
URL: https://github.com/apache/hudi/issues/10017#issuecomment-1803103939

   Are the inflight and requested compaction metadata files also empty?





Re: [PR] [HUDI-7050]Flink HoodieHiveCatalog supports hadoop parameters [hudi]

2023-11-08 Thread via GitHub


danny0405 commented on code in PR #10013:
URL: https://github.com/apache/hudi/pull/10013#discussion_r1387440082


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalogUtil.java:
##
@@ -61,9 +61,10 @@ public class HoodieCatalogUtil {
* @param hiveConfDir Hive conf directory path.
* @return A HiveConf instance.
*/
-  public static HiveConf createHiveConf(@Nullable String hiveConfDir) {
+  public static HiveConf createHiveConf(@Nullable String hiveConfDir, @Nullable org.apache.flink.configuration.Configuration flinkConf) {

Review Comment:
   The method has only one caller, so just making the param an 
`org.apache.flink.configuration.Configuration` should be fine.






Re: [I] [SUPPORT] Some .commit files generated by compaction with 0 size [hudi]

2023-11-08 Thread via GitHub


zdl1 commented on issue #10017:
URL: https://github.com/apache/hudi/issues/10017#issuecomment-1803102881

   > what release did you use?
   
   Thanks for the information! However, I am currently using Flink 1.14 and Hudi 0.13.1.





Re: [PR] [HUDI-7056] Add config for merging using positions [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10022:
URL: https://github.com/apache/hudi/pull/10022#issuecomment-1803101905

   
   ## CI report:
   
   * 8f150c02d9ff127c3a0dd0ec06d46c696de88a70 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20746)
 
   
   



Re: [PR] [HUDI-7050]Flink HoodieHiveCatalog supports hadoop parameters [hudi]

2023-11-08 Thread via GitHub


danny0405 commented on code in PR #10013:
URL: https://github.com/apache/hudi/pull/10013#discussion_r1387438530


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/catalog/HoodieCatalogFactory.java:
##
@@ -40,6 +40,7 @@ public class HoodieCatalogFactory implements CatalogFactory {
   private static final Logger LOG = LoggerFactory.getLogger(HoodieCatalogFactory.class);
 
   public static final String IDENTIFIER = "hudi";
+  public static final String HADOOP_PREFIX = "hadoop.";

Review Comment:
   We already have a constant variable in `HadoopConfigurations`.
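
For context, the pattern under review (forwarding catalog options that carry a `hadoop.` prefix) can be sketched as below. This is an illustration only: the class and method are hypothetical, plain maps stand in for the Flink and Hive configuration objects, and stripping the prefix before forwarding is my assumption about the intended behaviour.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of prefix-based option forwarding; not the PR's code.
class HadoopPrefixSketch {

  // Prefix value as shown in the diff above.
  static final String HADOOP_PREFIX = "hadoop.";

  // Collects the options that start with the prefix and returns them with the prefix removed.
  static Map<String, String> hadoopOptions(Map<String, String> catalogOptions) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : catalogOptions.entrySet()) {
      if (entry.getKey().startsWith(HADOOP_PREFIX)) {
        result.put(entry.getKey().substring(HADOOP_PREFIX.length()), entry.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> catalogOptions = new LinkedHashMap<>();
    catalogOptions.put("type", "hudi");
    catalogOptions.put("hadoop.fs.defaultFS", "hdfs://namenode:8020"); // placeholder value
    System.out.println(hadoopOptions(catalogOptions));
  }
}
```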






Re: [I] [SUPPORT] Some .commit files generated by compaction with 0 size [hudi]

2023-11-08 Thread via GitHub


danny0405 commented on issue #10017:
URL: https://github.com/apache/hudi/issues/10017#issuecomment-1803093991

   Which release are you using? Since 0.14.0, for MOR + INSERT, we write parquet files 
directly, so you should use clustering instead of compaction.





Re: [I] [SUPPORT] Possible memory leak issue for org.apache.hadoop.hive.conf.HiveConf while using Flink into Hudi [hudi]

2023-11-08 Thread via GitHub


danny0405 commented on issue #10023:
URL: https://github.com/apache/hudi/issues/10023#issuecomment-1803090137

   There is a known Hive conf resource leak fix: 
https://github.com/apache/hudi/pull/8050/files, but 0.13.0 should already include 
this fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Solution for synchronizing the entire database table in flink [hudi]

2023-11-08 Thread via GitHub


danny0405 commented on issue #9965:
URL: https://github.com/apache/hudi/issues/9965#issuecomment-1803084529

   yeah, already on the list.





Re: [I] [BUG] Spark will read invalid timestamp(3) data when record in log is older than the same in parquet. [hudi]

2023-11-08 Thread via GitHub


danny0405 commented on issue #10012:
URL: https://github.com/apache/hudi/issues/10012#issuecomment-1803082627

Looks like the Avro schema is using `timestamp-millis` as the logical data 
type. In `EventTimeAvroPayload`, did you debug a little to see where the 
`timestamp-micros` comes from?
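
To make the symptom concrete, here is a small sketch of what happens when a long written as `timestamp-millis` is decoded as `timestamp-micros`. The class and method names are hypothetical and the sample value is made up for illustration; only `java.time.Instant` is used.

```java
import java.time.Instant;

// Hypothetical illustration of a millis/micros unit mismatch; not Hudi code.
class TimestampUnitSketch {

  // Interprets the raw long as milliseconds since the epoch.
  static Instant fromMillis(long value) {
    return Instant.ofEpochMilli(value);
  }

  // Interprets the raw long as microseconds since the epoch.
  static Instant fromMicros(long value) {
    return Instant.ofEpochSecond(
        Math.floorDiv(value, 1_000_000L),
        Math.floorMod(value, 1_000_000L) * 1_000L);
  }

  public static void main(String[] args) {
    long storedAsMillis = 1_699_401_600_000L; // sample value written with timestamp-millis
    System.out.println("read as millis: " + fromMillis(storedAsMillis));
    System.out.println("read as micros: " + fromMicros(storedAsMillis));
  }
}
```

Reading the same value with the wrong unit shifts it by a factor of 1000, so a 2023 timestamp lands in January 1970, which would show up as invalid timestamp data on the Spark side.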





[jira] [Updated] (HUDI-7057) Support CopyToTableProcedure with patitial column copy

2023-11-08 Thread xy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-7057:
-
Description: Currently CopyToTableProcedure only supports copying the full schema to 
a new table, but in many scenarios users need only a subset of the table's columns; for 
example, users may not need the metadata columns because of storage cost.  (was: Currently 
CopyToTableProcedure only supports copying the full schema to a new table, but in many 
scenarios users need only a subset of the table's columns )

> Support CopyToTableProcedure with patitial column copy 
> ---
>
> Key: HUDI-7057
> URL: https://issues.apache.org/jira/browse/HUDI-7057
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available, sparksql
>
> Currently CopyToTableProcedure only supports copying the full schema to a new 
> table, but in many scenarios users need only a subset of the table's columns; for example, 
> users may not need the metadata columns because of storage cost.





[jira] [Closed] (HUDI-7017) Prevent full schema evolution from wrongly falling back to OOB

2023-11-08 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7017.

Resolution: Fixed

Fixed via master branch: d149d787e9d3dbf425c0dd5ca0265bed5fe2795f

> Prevent full schema evolution from wrongly falling back to OOB
> --
>
> Key: HUDI-7017
> URL: https://issues.apache.org/jira/browse/HUDI-7017
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: voon
>Assignee: voon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
> Attachments: image-2023-11-01-11-41-25-604.png, 
> image-2023-11-01-11-43-14-149.png
>
>
> For MOR tables that have these 2 configurations enabled:
>  
> {code:java}
> hoodie.schema.on.read.enable=true
> hoodie.datasource.read.extract.partition.values.from.path=true{code}
>  
>  
> BaseFileReader will use a *requiredSchemaReader* when reading some of the 
> parquet files. This BaseFileReader will have an empty *internalSchemaStr* 
> causing *Spark3XLegacyHoodieParquetInputFormat* to fall back to OOB schema 
> evolution.
>  
> Although there are required safeguards that are added in HUDI-5400 to force 
> the code execution path to use Hudi Full Schema Evolution, we should still 
> fix this so that future changes that may deprecate the use of 
> *Spark3XLegacyHoodieParquetInputFormat* will not cause issues.
>  
> A sample test to invoke this:
> {code:java}
> test("Test wrong fallback to OOB schema evolution") {
>   withRecordType()(withTempDir { tmp =>
>     Seq("mor").foreach { tableType =>
>       val tableName = generateTableName
>       val tablePath = s"${new Path(tmp.getCanonicalPath, tableName).toUri.toString}"
>       if (HoodieSparkUtils.gteqSpark3_1) {
>         spark.sql("set " + SPARK_SQL_INSERT_INTO_OPERATION.key + "=upsert")
>         spark.sql("set hoodie.schema.on.read.enable=true")
>         spark.sql("set hoodie.datasource.read.extract.partition.values.from.path=true")
>         // NOTE: This is required since this test uses type coercions which were only
>         //       permitted in Spark 2.x and are disallowed by default in Spark 3.x
>         spark.sql("set spark.sql.storeAssignmentPolicy=legacy")
>         createAndPreparePartitionTable(spark, tableName, tablePath, tableType)
>         // date -> string
>         spark.sql(s"alter table $tableName alter column col6 type String")
>         checkAnswer(spark.sql(s"select col6 from $tableName where id = 1").collect())(
>           Seq("2021-12-25")
>         )
>       }
>     }
>   })
> } {code}
>  
> Debugger snapshots:
> !image-2023-11-01-11-41-25-604.png|width=1197,height=596!
> As can be seen, *requiredSchema* (used as pruning input) has an internalSchema 
> string, but *requiredDataSchema* has a null internalSchema string.
>  
> !image-2023-11-01-11-43-14-149.png|width=1257,height=672!
> As a result, the internalSchemaStr that is passed into 
> Spark3XLegacyHoodieParquetFileFormat is null (which should not be the case)
>  





(hudi) branch master updated: [HUDI-7055] Support reading only log files in file group reader-based Spark parquet file format (#10020)

2023-11-08 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 530640f61a5 [HUDI-7055] Support reading only log files in file group 
reader-based Spark parquet file format (#10020)
530640f61a5 is described below

commit 530640f61a5e3f8103d0b66ff866a2de995156b7
Author: Y Ethan Guo 
AuthorDate: Wed Nov 8 18:45:45 2023 -0800

[HUDI-7055] Support reading only log files in file group reader-based Spark 
parquet file format (#10020)
---
 .../SparkFileFormatInternalRowReaderContext.scala  | 16 +++--
 .../table/read/TestHoodieFileGroupReaderBase.java  | 71 --
 ...odieFileGroupReaderBasedParquetFileFormat.scala | 39 +++-
 .../read/TestHoodieFileGroupReaderOnSpark.scala|  6 +-
 4 files changed, 92 insertions(+), 40 deletions(-)

diff --git 
a/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala
 
b/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala
index af3d3fd239c..beca8852686 100644
--- 
a/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala
+++ 
b/hudi-client/hudi-spark-client/src/main/scala/org/apache/hudi/SparkFileFormatInternalRowReaderContext.scala
@@ -44,10 +44,12 @@ import scala.collection.mutable
  *
  * This uses Spark parquet reader to read parquet data files or parquet log 
blocks.
  *
- * @param baseFileReader  A reader that transforms a {@link PartitionedFile} 
to an iterator of {@link InternalRow}.
+ * @param baseFileReader  A reader that transforms a {@link PartitionedFile} 
to an iterator of
+ *{@link InternalRow}. This is required for reading 
the base file and
+ *not required for reading a file group with only log 
files.
  * @param partitionValues The values for a partition in which the file group 
lives.
  */
-class SparkFileFormatInternalRowReaderContext(baseFileReader: PartitionedFile 
=> Iterator[InternalRow],
+class SparkFileFormatInternalRowReaderContext(baseFileReader: 
Option[PartitionedFile => Iterator[InternalRow]],
   partitionValues: InternalRow) 
extends BaseSparkInternalRowReaderContext {
   lazy val sparkAdapter = SparkAdapterSupport.sparkAdapter
   lazy val sparkFileReaderFactory = new HoodieSparkFileReaderFactory
@@ -62,11 +64,11 @@ class 
SparkFileFormatInternalRowReaderContext(baseFileReader: PartitionedFile =>
 val fileInfo = sparkAdapter.getSparkPartitionedFileUtils
   .createPartitionedFile(partitionValues, filePath, start, length)
 if (FSUtils.isLogFile(filePath)) {
-  val structType: StructType = 
HoodieInternalRowUtils.getCachedSchema(dataSchema)
+  val structType: StructType = 
HoodieInternalRowUtils.getCachedSchema(requiredSchema)
   val projection: UnsafeProjection = 
HoodieInternalRowUtils.getCachedUnsafeProjection(structType, structType)
   new CloseableMappingIterator[InternalRow, UnsafeRow](
 sparkFileReaderFactory.newParquetFileReader(conf, 
filePath).asInstanceOf[HoodieSparkParquetReader]
-  .getInternalRowIterator(dataSchema, dataSchema),
+  .getInternalRowIterator(dataSchema, requiredSchema),
 new java.util.function.Function[InternalRow, UnsafeRow] {
   override def apply(data: InternalRow): UnsafeRow = {
 // NOTE: We have to do [[UnsafeProjection]] of incoming 
[[InternalRow]] to convert
@@ -75,7 +77,11 @@ class 
SparkFileFormatInternalRowReaderContext(baseFileReader: PartitionedFile =>
   }
 }).asInstanceOf[ClosableIterator[InternalRow]]
 } else {
-  new CloseableInternalRowIterator(baseFileReader.apply(fileInfo))
+  if (baseFileReader.isEmpty) {
+throw new IllegalArgumentException("Base file reader is missing when 
instantiating "
+  + "SparkFileFormatInternalRowReaderContext.");
+  }
+  new CloseableInternalRowIterator(baseFileReader.get.apply(fileInfo))
 }
   }
 
diff --git 
a/hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java
 
b/hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java
index febc0d32466..439948a6cc9 100644
--- 
a/hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java
+++ 
b/hudi-common/src/test/java/org/apache/hudi/common/table/read/TestHoodieFileGroupReaderBase.java
@@ -40,8 +40,9 @@ import org.apache.hudi.metadata.HoodieTableMetadata;
 
 import org.apache.avro.Schema;
 import org.apache.hadoop.conf.Configuration;
-import org.junit.jupiter.api.Test;
 import org.junit.jupiter.api.io.TempDir;
+import org.junit.jupiter.params.ParameterizedTest;
+import org.junit.jupiter.params.provider.ValueSource;
 
 import 
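
The core of this change, making the base file reader optional so that a file group containing only log files can still be read, can be sketched in plain Java. This is an illustration under stated assumptions, not the Scala code from the commit: `OptionalBaseReaderSketch`, the `String` records, and the simplified `isLogFile` check (the real code calls `FSUtils.isLogFile`) are all stand-ins.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in for SparkFileFormatInternalRowReaderContext.
final class OptionalBaseReaderSketch {
  // Empty when the file group has only log files and no base file to read.
  private final Optional<Function<String, Iterator<String>>> baseFileReader;

  OptionalBaseReaderSketch(Optional<Function<String, Iterator<String>>> baseFileReader) {
    this.baseFileReader = baseFileReader;
  }

  // Simplified assumption: treat any path containing ".log." as a log file.
  static boolean isLogFile(String path) {
    return path.contains(".log.");
  }

  Iterator<String> read(String path) {
    if (isLogFile(path)) {
      // Log files never go through the base file reader.
      return Arrays.asList("log-record from " + path).iterator();
    }
    if (!baseFileReader.isPresent()) {
      // Mirrors the guard in the diff: a base file was requested without a reader.
      throw new IllegalArgumentException("Base file reader is missing");
    }
    return baseFileReader.get().apply(path);
  }
}
```

The design choice is to fail fast only on the code path that actually needs the reader, so callers that read log-only file groups can pass an empty value.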

Re: [PR] [HUDI-7055] Support reading only log files in file group reader-based Spark parquet file format [hudi]

2023-11-08 Thread via GitHub


yihua merged PR #10020:
URL: https://github.com/apache/hudi/pull/10020





[jira] [Closed] (HUDI-6993) Support Flink 1.18

2023-11-08 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-6993.

Resolution: Fixed

Fixed via master branch: bd136addc24f4d03608f5069b98c2748e463169f

> Support Flink 1.18
> --
>
> Key: HUDI-6993
> URL: https://issues.apache.org/jira/browse/HUDI-6993
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: flink
>Reporter: Prabhu Joseph
>Priority: Major
>  Labels: flink, pull-request-available, upgrade
> Fix For: 1.0.0
>
>
> This JIRA intends to support Flink-1.18 in Hudi.





[jira] [Updated] (HUDI-7017) Prevent full schema evolution from wrongly falling back to OOB

2023-11-08 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7017:
-
Fix Version/s: 1.0.0

> Prevent full schema evolution from wrongly falling back to OOB
> --
>
> Key: HUDI-7017
> URL: https://issues.apache.org/jira/browse/HUDI-7017
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: voon
>Assignee: voon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
> Attachments: image-2023-11-01-11-41-25-604.png, 
> image-2023-11-01-11-43-14-149.png
>
>
> For MOR tables that have these 2 configurations enabled:
>  
> {code:java}
> hoodie.schema.on.read.enable=true
> hoodie.datasource.read.extract.partition.values.from.path=true{code}
>  
>  
> BaseFileReader will use a *requiredSchemaReader* when reading some of the 
> parquet files. This BaseFileReader will have an empty *internalSchemaStr* 
> causing *Spark3XLegacyHoodieParquetInputFormat* to fall back to OOB schema 
> evolution.
>  
> Although there are required safeguards that are added in HUDI-5400 to force 
> the code execution path to use Hudi Full Schema Evolution, we should still 
> fix this so that future changes that may deprecate the use of 
> *Spark3XLegacyHoodieParquetInputFormat* will not cause issues.
>  
> A sample test to invoke this:
> {code:java}
> test("Test wrong fallback to OOB schema evolution") {
>   withRecordType()(withTempDir { tmp =>
>     Seq("mor").foreach { tableType =>
>       val tableName = generateTableName
>       val tablePath = s"${new Path(tmp.getCanonicalPath, tableName).toUri.toString}"
>       if (HoodieSparkUtils.gteqSpark3_1) {
>         spark.sql("set " + SPARK_SQL_INSERT_INTO_OPERATION.key + "=upsert")
>         spark.sql("set hoodie.schema.on.read.enable=true")
>         spark.sql("set hoodie.datasource.read.extract.partition.values.from.path=true")
>         // NOTE: This is required since this test uses type coercions which were only
>         //       permitted in Spark 2.x and are disallowed by default in Spark 3.x
>         spark.sql("set spark.sql.storeAssignmentPolicy=legacy")
>         createAndPreparePartitionTable(spark, tableName, tablePath, tableType)
>         // date -> string
>         spark.sql(s"alter table $tableName alter column col6 type String")
>         checkAnswer(spark.sql(s"select col6 from $tableName where id = 1").collect())(
>           Seq("2021-12-25")
>         )
>       }
>     }
>   })
> } {code}
>  
> Debugger snapshots:
> !image-2023-11-01-11-41-25-604.png|width=1197,height=596!
> As can be seen, *requiredSchema* (used as pruning input) has an internalSchema 
> string, but *requiredDataSchema* has a null internalSchema string.
>  
> !image-2023-11-01-11-43-14-149.png|width=1257,height=672!
> As a result, the internalSchemaStr that is passed into 
> Spark3XLegacyHoodieParquetFileFormat is null (which should not be the case)
>  





(hudi) branch master updated: [HUDI-7017] Prevent full schema evolution from wrongly falling back to OOB schema evolution (#9966)

2023-11-08 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new d149d787e9d [HUDI-7017] Prevent full schema evolution from wrongly 
falling back to OOB schema evolution (#9966)
d149d787e9d is described below

commit d149d787e9d3dbf425c0dd5ca0265bed5fe2795f
Author: voonhous 
AuthorDate: Thu Nov 9 10:44:01 2023 +0800

[HUDI-7017] Prevent full schema evolution from wrongly falling back to OOB 
schema evolution (#9966)
---
 .../scala/org/apache/hudi/HoodieBaseRelation.scala| 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala
 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala
index 78c5cc4ca47..eaeff8bc7e9 100644
--- 
a/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala
+++ 
b/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala
@@ -715,16 +715,29 @@ abstract class HoodieBaseRelation(val sqlContext: 
SQLContext,
 if (extractPartitionValuesFromPartitionPath) {
   val partitionSchema = 
filterInPartitionColumns(tableSchema.structTypeSchema)
   val prunedDataStructSchema = 
prunePartitionColumns(tableSchema.structTypeSchema)
-  val prunedRequiredSchema = 
prunePartitionColumns(requiredSchema.structTypeSchema)
+  val prunedDataInternalSchema = pruneInternalSchema(tableSchema, 
prunedDataStructSchema)
+  val prunedRequiredStructSchema = 
prunePartitionColumns(requiredSchema.structTypeSchema)
+  val prunedRequiredInternalSchema = pruneInternalSchema(requiredSchema, 
prunedRequiredStructSchema)
 
   (partitionSchema,
-HoodieTableSchema(prunedDataStructSchema, 
convertToAvroSchema(prunedDataStructSchema, tableName).toString),
-HoodieTableSchema(prunedRequiredSchema, 
convertToAvroSchema(prunedRequiredSchema, tableName).toString))
+HoodieTableSchema(prunedDataStructSchema,
+  convertToAvroSchema(prunedDataStructSchema, tableName).toString, 
prunedDataInternalSchema),
+HoodieTableSchema(prunedRequiredStructSchema,
+  convertToAvroSchema(prunedRequiredStructSchema, tableName).toString, 
prunedRequiredInternalSchema))
 } else {
   (StructType(Nil), tableSchema, requiredSchema)
 }
   }
 
+  private def pruneInternalSchema(hoodieTableSchema: HoodieTableSchema, 
prunedStructSchema: StructType): Option[InternalSchema] = {
+if (hoodieTableSchema.internalSchema.isEmpty || 
hoodieTableSchema.internalSchema.get.isEmptySchema) {
+  Option.empty[InternalSchema]
+} else {
+  
Some(InternalSchemaUtils.pruneInternalSchema(hoodieTableSchema.internalSchema.get,
+prunedStructSchema.fields.map(_.name).toList.asJava))
+}
+  }
+
   private def filterInPartitionColumns(structType: StructType): StructType =
 StructType(structType.filter(f => partitionColumns.exists(col => 
resolver(f.name, col
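
The `pruneInternalSchema` helper added in this commit can be pictured with a small standalone sketch. It is written in Java against plain collections rather than Hudi's `InternalSchema` and `InternalSchemaUtils`, so the class `PruneInternalSchemaSketch` and the map-based schema stand-in are assumptions for illustration only. The point it shows is the one the fix relies on: when an internal schema exists, the pruned schema keeps one as well, so the reader does not see an empty value and fall back to OOB schema evolution.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in: an "internal schema" is modelled as column name -> type.
final class PruneInternalSchemaSketch {
  static Optional<Map<String, String>> prune(Optional<Map<String, String>> internalSchema,
                                             List<String> retainedColumns) {
    // No internal schema (or an empty one) means there is nothing to carry over.
    if (!internalSchema.isPresent() || internalSchema.get().isEmpty()) {
      return Optional.empty();
    }
    // Keep only the columns that survive partition-column pruning.
    Map<String, String> pruned = new LinkedHashMap<>();
    for (String name : retainedColumns) {
      String type = internalSchema.get().get(name);
      if (type != null) {
        pruned.put(name, type);
      }
    }
    return Optional.of(pruned);
  }
}
```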
 



Re: [PR] [HUDI-7017] Prevent full schema evolution from wrongly falling back t… [hudi]

2023-11-08 Thread via GitHub


danny0405 merged PR #9966:
URL: https://github.com/apache/hudi/pull/9966





[jira] [Updated] (HUDI-6993) Support Flink 1.18

2023-11-08 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-6993:
-
Fix Version/s: 1.0.0

> Support Flink 1.18
> --
>
> Key: HUDI-6993
> URL: https://issues.apache.org/jira/browse/HUDI-6993
> Project: Apache Hudi
>  Issue Type: New Feature
>  Components: flink
>Reporter: Prabhu Joseph
>Priority: Major
>  Labels: flink, pull-request-available, upgrade
> Fix For: 1.0.0
>
>
> This JIRA intends to support Flink-1.18 in Hudi.





Re: [PR] [HUDI-7017] Prevent full schema evolution from wrongly falling back t… [hudi]

2023-11-08 Thread via GitHub


danny0405 commented on PR #9966:
URL: https://github.com/apache/hudi/pull/9966#issuecomment-1803076235

   Tests passed: 
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=20721&view=results





(hudi) branch master updated (7c17964fe73 -> bd136addc24)

2023-11-08 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 7c17964fe73 [HUDI-6495][RFC-66] Non-blocking Concurrency Control 
(#7907)
 add bd136addc24 [HUDI-6993] Support Flink 1.18 (#9949)

No new revisions were added by this update.

Summary of changes:
 .github/workflows/bot.yml  | 12 +++--
 README.md  |  7 +--
 azure-pipelines-20230430.yml   |  7 ++-
 hudi-flink-datasource/hudi-flink/pom.xml   |  1 +
 .../hudi/table/catalog/HoodieHiveCatalog.java  | 36 +++
 .../apache/hudi/adapter/HiveCatalogConstants.java  | 51 +
 .../apache/hudi/adapter/HiveCatalogConstants.java  | 52 ++
 .../apache/hudi/adapter/HiveCatalogConstants.java  | 52 ++
 .../apache/hudi/adapter/HiveCatalogConstants.java  | 52 ++
 .../apache/hudi/adapter/HiveCatalogConstants.java  | 52 ++
 .../{hudi-flink1.17.x => hudi-flink1.18.x}/pom.xml | 26 ++-
 .../adapter/AbstractStreamOperatorAdapter.java |  0
 .../AbstractStreamOperatorFactoryAdapter.java  |  0
 .../adapter/DataStreamScanProviderAdapter.java |  0
 .../adapter/DataStreamSinkProviderAdapter.java |  0
 .../apache/hudi/adapter/HiveCatalogConstants.java  | 49 
 .../hudi/adapter/MailboxExecutorAdapter.java   |  0
 .../apache/hudi/adapter/MaskingOutputAdapter.java  |  0
 .../hudi/adapter/OperatorCoordinatorAdapter.java   |  0
 .../apache/hudi/adapter/RateLimiterAdapter.java|  0
 .../hudi/adapter/SortCodeGeneratorAdapter.java |  0
 .../adapter/SupportsRowLevelDeleteAdapter.java |  0
 .../adapter/SupportsRowLevelUpdateAdapter.java |  0
 .../main/java/org/apache/hudi/adapter/Utils.java   |  0
 .../table/format/cow/ParquetSplitReaderUtil.java   |  0
 .../table/format/cow/vector/HeapArrayVector.java   |  0
 .../format/cow/vector/HeapMapColumnVector.java |  0
 .../format/cow/vector/HeapRowColumnVector.java |  0
 .../format/cow/vector/ParquetDecimalVector.java|  0
 .../cow/vector/reader/AbstractColumnReader.java|  0
 .../cow/vector/reader/ArrayColumnReader.java   |  0
 .../vector/reader/BaseVectorizedColumnReader.java  |  0
 .../cow/vector/reader/EmptyColumnReader.java   |  0
 .../vector/reader/FixedLenBytesColumnReader.java   |  0
 .../vector/reader/Int64TimestampColumnReader.java  |  0
 .../format/cow/vector/reader/MapColumnReader.java  |  0
 .../reader/ParquetColumnarRowSplitReader.java  |  0
 .../cow/vector/reader/ParquetDataColumnReader.java |  0
 .../reader/ParquetDataColumnReaderFactory.java |  0
 .../format/cow/vector/reader/RowColumnReader.java  |  0
 .../format/cow/vector/reader/RunLengthDecoder.java |  0
 .../org/apache/hudi/adapter/OutputAdapter.java |  0
 .../adapter/StateInitializationContextAdapter.java |  0
 .../adapter/StreamingRuntimeContextAdapter.java|  0
 .../org/apache/hudi/adapter/TestStreamConfigs.java |  0
 .../org/apache/hudi/adapter/TestTableEnvs.java |  0
 hudi-flink-datasource/pom.xml  |  1 +
 ...ark332.sh => build_flink1180hive313spark332.sh} |  6 +--
 ...ark340.sh => build_flink1180hive313spark340.sh} |  6 +--
 packaging/bundle-validation/ci_run.sh  |  2 +
 pom.xml| 37 ---
 scripts/release/deploy_staging_jars.sh |  1 +
 scripts/release/validate_staged_bundles.sh |  2 +-
 53 files changed, 403 insertions(+), 49 deletions(-)
 create mode 100644 
hudi-flink-datasource/hudi-flink1.13.x/src/main/java/org/apache/hudi/adapter/HiveCatalogConstants.java
 create mode 100644 
hudi-flink-datasource/hudi-flink1.14.x/src/main/java/org/apache/hudi/adapter/HiveCatalogConstants.java
 create mode 100644 
hudi-flink-datasource/hudi-flink1.15.x/src/main/java/org/apache/hudi/adapter/HiveCatalogConstants.java
 create mode 100644 
hudi-flink-datasource/hudi-flink1.16.x/src/main/java/org/apache/hudi/adapter/HiveCatalogConstants.java
 create mode 100644 
hudi-flink-datasource/hudi-flink1.17.x/src/main/java/org/apache/hudi/adapter/HiveCatalogConstants.java
 copy hudi-flink-datasource/{hudi-flink1.17.x => hudi-flink1.18.x}/pom.xml (87%)
 copy hudi-flink-datasource/{hudi-flink1.14.x => 
hudi-flink1.18.x}/src/main/java/org/apache/hudi/adapter/AbstractStreamOperatorAdapter.java
 (100%)
 copy hudi-flink-datasource/{hudi-flink1.14.x => 
hudi-flink1.18.x}/src/main/java/org/apache/hudi/adapter/AbstractStreamOperatorFactoryAdapter.java
 (100%)
 copy hudi-flink-datasource/{hudi-flink1.15.x => 
hudi-flink1.18.x}/src/main/java/org/apache/hudi/adapter/DataStreamScanProviderAdapter.java
 (100%)
 copy hudi-flink-datasource/{hudi-flink1.15.x => 
hudi-flink1.18.x}/src/main/java/org/apache/hudi/adapter/DataStreamSinkProviderAdapter.java
 (100%)
 create mode 100644 

Re: [PR] [HUDI-6993] Support Flink 1.18 [hudi]

2023-11-08 Thread via GitHub


danny0405 merged PR #9949:
URL: https://github.com/apache/hudi/pull/9949





[jira] [Updated] (HUDI-7057) Support CopyToTableProcedure with patitial column copy

2023-11-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7057:
-
Labels: pull-request-available sparksql  (was: sparksql)

> Support CopyToTableProcedure with patitial column copy 
> ---
>
> Key: HUDI-7057
> URL: https://issues.apache.org/jira/browse/HUDI-7057
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available, sparksql
>
> Currently CopyToTableProcedure only supports copying the full schema to a new
> table, but in many scenarios users need only a subset of the table's columns.





Re: [PR] [HUDI-6993] Support Flink 1.18 [hudi]

2023-11-08 Thread via GitHub


danny0405 commented on PR #9949:
URL: https://github.com/apache/hudi/pull/9949#issuecomment-1803074942

   Tests passed: 
https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=20702&view=results





[PR] [HUDI-7057] Support CopyToTableProcedure with patitial column copy [hudi]

2023-11-08 Thread via GitHub


xuzifu666 opened a new pull request, #10025:
URL: https://github.com/apache/hudi/pull/10025

   ### Change Logs
   
   Currently CopyToTableProcedure only supports copying the full schema to a new
   table, but in many scenarios users need only a subset of the table's columns.
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7057) Support CopyToTableProcedure with patitial column copy

2023-11-08 Thread xy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy updated HUDI-7057:
-
Labels: sparksql  (was: )

> Support CopyToTableProcedure with patitial column copy 
> ---
>
> Key: HUDI-7057
> URL: https://issues.apache.org/jira/browse/HUDI-7057
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: sparksql
>
> Currently CopyToTableProcedure only supports copying the full schema to a new
> table, but in many scenarios users need only a subset of the table's columns.





[jira] [Created] (HUDI-7057) Support CopyToTableProcedure with patitial column copy

2023-11-08 Thread xy (Jira)
xy created HUDI-7057:


 Summary: Support CopyToTableProcedure with patitial column copy 
 Key: HUDI-7057
 URL: https://issues.apache.org/jira/browse/HUDI-7057
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: xy


Currently CopyToTableProcedure only supports copying the full schema to a new table,
but in many scenarios users need only a subset of the table's columns.





[jira] [Assigned] (HUDI-7057) Support CopyToTableProcedure with patitial column copy

2023-11-08 Thread xy (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xy reassigned HUDI-7057:


Assignee: xy

> Support CopyToTableProcedure with patitial column copy 
> ---
>
> Key: HUDI-7057
> URL: https://issues.apache.org/jira/browse/HUDI-7057
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: xy
>Assignee: xy
>Priority: Major
>
> Currently CopyToTableProcedure only supports copying the full schema to a new
> table, but in many scenarios users need only a subset of the table's columns.





Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1803071474

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * 43253eac6f27abbb614bb80d514d5ea3e30e09a1 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20747)
 
   
   



Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10024:
URL: https://github.com/apache/hudi/pull/10024#issuecomment-1803071823

   
   ## CI report:
   
   * 17566af6a57a43de3b2a5d5ab7bcd8bd28336307 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20752)
 
   
   



Re: [I] [SUPPORT] Some .commit files generated by compaction with 0 size [hudi]

2023-11-08 Thread via GitHub


zdl1 commented on issue #10017:
URL: https://github.com/apache/hudi/issues/10017#issuecomment-1803070116

   Thanks for your help. I have another question: for an existing partition, a 
newly inserted record written to an existing bucket is counted under 
**numUpdateWrites**, so how can I confirm how many new records were written in 
this delta commit? I would appreciate any help with this.
   @ad1happy2go @danny0405 





Re: [PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10024:
URL: https://github.com/apache/hudi/pull/10024#issuecomment-1803066773

   
   ## CI report:
   
   * 17566af6a57a43de3b2a5d5ab7bcd8bd28336307 UNKNOWN
   
   



Re: [PR] [HUDI-7053] Fix the filter pushdown logic for file group reader [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10018:
URL: https://github.com/apache/hudi/pull/10018#issuecomment-1803066730

   
   ## CI report:
   
   * c5812540396b56db64df779ab7147cc0cede626a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20735)
 
   * 0f4e8c208b5614736e450657ad56810bc4060ea4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20751)
 
   
   



Re: [PR] [HUDI-6495][RFC-66] Non-blocking Concurrency Control [hudi]

2023-11-08 Thread via GitHub


vinothchandar merged PR #7907:
URL: https://github.com/apache/hudi/pull/7907





(hudi) branch master updated (43a39b907bc -> 7c17964fe73)

2023-11-08 Thread vinoth
This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 43a39b907bc ShowPartitionsCommand should consider lazy 
delete_partitions (#10019)
 add 7c17964fe73 [HUDI-6495][RFC-66] Non-blocking Concurrency Control 
(#7907)

No new revisions were added by this update.

Summary of changes:
 rfc/rfc-66/compaction.png| Bin 0 -> 104613 bytes
 rfc/rfc-66/log_file_sequence.png | Bin 0 -> 62499 bytes
 rfc/rfc-66/multi_writer.png  | Bin 0 -> 83780 bytes
 rfc/rfc-66/non_serial_compaction.png | Bin 0 -> 208944 bytes
 rfc/rfc-66/rfc-66.md | 318 +++
 5 files changed, 318 insertions(+)
 create mode 100644 rfc/rfc-66/compaction.png
 create mode 100644 rfc/rfc-66/log_file_sequence.png
 create mode 100644 rfc/rfc-66/multi_writer.png
 create mode 100644 rfc/rfc-66/non_serial_compaction.png
 create mode 100644 rfc/rfc-66/rfc-66.md



Re: [PR] [HUDI-7055] Support reading only log files in file group reader-based Spark parquet file format [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10020:
URL: https://github.com/apache/hudi/pull/10020#issuecomment-1803061504

   
   ## CI report:
   
   * 8b39f72f16c9b708667dc17ee51875cf3c9b7364 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20743)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7053] Fix the filter pushdown logic for file group reader [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10018:
URL: https://github.com/apache/hudi/pull/10018#issuecomment-1803061463

   
   ## CI report:
   
   * c5812540396b56db64df779ab7147cc0cede626a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20735)
 
   * 0f4e8c208b5614736e450657ad56810bc4060ea4 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1803061117

   
   ## CI report:
   
   * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
   * e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
   * db06d7f6394d26bec65a629ed2b567754d28a46a Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20750)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[PR] [HUDI-7054][FOLLOW_UP] HoodieCatalogTable should ignore lazily deleted partitions [hudi]

2023-11-08 Thread via GitHub


boneanxs opened a new pull request, #10024:
URL: https://github.com/apache/hudi/pull/10024

   ### Change Logs
   
   _Describe context and summary for this change. Highlight if any code was 
copied._
   It might be better to let `HoodieCatalogTable` filter deleted partitions 
directly: it shouldn't return deleted partitions when `getPartitionPaths` is 
called.
   
   This also fixes `RepairHoodieTableCommand` so that it does not repair deleted 
partitions.
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance 
impact._
   None
   
   ### Risk level (write none, low medium or high below)
   
   _If medium or high, explain what verification was done to mitigate the 
risks._
   None
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-6508] Fix compile errors with JDK11 [hudi]

2023-11-08 Thread via GitHub


Zouxxyy commented on PR #9300:
URL: https://github.com/apache/hudi/pull/9300#issuecomment-1803034364

   > @Zouxxyy : Can you fix the merge conflicts ?
   
   done





Re: [PR] [MINOR] Change some default configs for 1.0.0-beta [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9998:
URL: https://github.com/apache/hudi/pull/9998#issuecomment-1803028334

   
   ## CI report:
   
   * 7e450aee63b81c2d28d04d927eadf5ca006e8a19 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20748)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[I] [SUPPORT] Possible memory leak issue for org.apache.hadoop.hive.conf.HiveConf while using Flink into Hudi [hudi]

2023-11-08 Thread via GitHub


xmubeta opened a new issue, #10023:
URL: https://github.com/apache/hudi/issues/10023

   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)?
   
   - Join the mailing list to engage in conversations and get faster support at 
dev-subscr...@hudi.apache.org.
   
   - If you have triaged this as a bug, then file an 
[issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
   
   **Describe the problem you faced**
   
   I am using Flink SQL to ingest data from AWS Kinesis into Hudi on S3, with 
the AWS Glue catalog as the Hive metastore. `hive_sync.enable` is set to true in 
the SQL. Ingestion works well, but after running for a few hours or days the 
jobmanager fails with an OutOfMemoryError. I checked the heap dump and found that 
org.apache.hadoop.hive.conf.HiveConf accounts for 80.77% of memory. It seems to 
be related to HiveSyncContext.
   
   
   
   The leak suspect reported by Eclipse Memory Analyzer:
   
   12 instances of "org.apache.hadoop.hive.conf.HiveConf", loaded by 
"sun.misc.Launcher$AppClassLoader @ 0xe400bdf8" occupy 338,544,384 (80.77%) 
bytes. 
   
   Biggest instances:
   • org.apache.hadoop.hive.conf.HiveConf @ 0xe71197b0 - 33,702,712 (8.04%) bytes.
   • org.apache.hadoop.hive.conf.HiveConf @ 0xe72d9e30 - 33,702,712 (8.04%) bytes.
   • org.apache.hadoop.hive.conf.HiveConf @ 0xe77c62c0 - 33,702,712 (8.04%) bytes.
   • org.apache.hadoop.hive.conf.HiveConf @ 0xe787f640 - 33,702,712 (8.04%) bytes.
   • org.apache.hadoop.hive.conf.HiveConf @ 0xe798fd00 - 33,702,712 (8.04%) bytes.
   • org.apache.hadoop.hive.conf.HiveConf @ 0xe7a9b0f0 - 33,702,712 (8.04%) bytes.
   • org.apache.hadoop.hive.conf.HiveConf @ 0xe812a8c8 - 33,702,712 (8.04%) bytes.
   • org.apache.hadoop.hive.conf.HiveConf @ 0xe82d0af0 - 33,702,712 (8.04%) bytes.
   • org.apache.hadoop.hive.conf.HiveConf @ 0xe84c10c8 - 33,702,712 (8.04%) bytes.
   • org.apache.hadoop.hive.conf.HiveConf @ 0xe8736300 - 33,702,712 (8.04%) bytes.
   
   
   Keywords
   sun.misc.Launcher$AppClassLoader @ 0xe400bdf8
   org.apache.hadoop.hive.conf.HiveConf
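
   The figures in the report above can be cross-checked with a little 
arithmetic. The following is only a sketch in Python: the constants are copied 
from the report, while the variable names and the derived quantities (implied 
heap size, bytes held by the two unlisted instances) are illustrative 
assumptions, not something Eclipse Memory Analyzer prints. It says nothing 
about the cause of the leak.

```python
# Figures copied from the Eclipse Memory Analyzer report above.
SUSPECT_BYTES = 338_544_384   # retained by all 12 HiveConf instances
SUSPECT_SHARE = 0.8077        # 80.77% of the heap
INSTANCE_BYTES = 33_702_712   # size of each listed instance
LISTED_INSTANCES = 10         # instances shown under "Biggest instances"

# Heap size implied by the suspect's byte count and its percentage.
implied_heap = SUSPECT_BYTES / SUSPECT_SHARE

# Share of the heap held by one listed instance (the report says 8.04%).
per_instance_share = INSTANCE_BYTES / implied_heap

# Bytes attributable to the two instances the report does not list.
listed_bytes = LISTED_INSTANCES * INSTANCE_BYTES
unlisted_bytes = SUSPECT_BYTES - listed_bytes

print(f"implied heap: {implied_heap / 2**20:.0f} MiB")
print(f"per-instance share: {per_instance_share:.2%}")
print(f"unlisted instances hold: {unlisted_bytes:,} bytes")
```

   Under these assumptions the ten listed instances account for nearly all of 
the suspect's retained memory, and the per-instance share matches the 8.04% 
shown in the report.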
   
   
   

   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Set up an AWS EMR 6.10.0 cluster with Flink 1.16.0 + Hive 3.1 + Hudi 0.13.0.
   2. Set up an AWS Kinesis stream and ingest data into it.
   3. Run a Flink SQL job that ingests from Kinesis into Hudi on S3.
   4. Let the job run for a few hours or days; it may then fail with an OOM error.
   
   **Expected behavior**
   No OOM issue.
   
   **Environment Description**
   
   * Hudi version : 0.13.0
   
   * Spark version : 3.3.1
   
   * Hive version : 3.1
   
   * Hadoop version : 3.3.3
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   ```Add the stacktrace of the error.```
   ```
   2023-11-09 06:59:55,475 ERROR org.apache.hudi.sink.StreamWriteOperatorCoordinator  [] - Executor executes action [commits the instant 20231109065505712] error
   java.lang.OutOfMemoryError: GC overhead limit exceeded
       at java.util.stream.StreamSupport.stream(StreamSupport.java:69) ~[?:1.8.0_392]
       at java.util.Collection.stream(Collection.java:581) ~[?:1.8.0_392]
       at org.apache.hudi.common.table.timeline.TimelineLayout$TimelineLayoutV1.lambda$filterHoodieInstants$2(TimelineLayout.java:68) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
       at org.apache.hudi.common.table.timeline.TimelineLayout$TimelineLayoutV1$$Lambda$1187/1033743503.apply(Unknown Source) ~[?:?]
       at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_392]
       at java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1652) ~[?:1.8.0_392]
       at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_392]
       at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_392]
       at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_392]
       at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_392]
       at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[?:1.8.0_392]
       at org.apache.hudi.common.table.HoodieTableMetaClient.scanHoodieInstantsFromFileSystem(HoodieTableMetaClient.java:651) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
       at org.apache.hudi.common.table.HoodieTableMetaClient.scanHoodieInstantsFromFileSystem(HoodieTableMetaClient.java:625) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
       at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.<init>(HoodieActiveTimeline.java:163) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
       at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.<init>(HoodieActiveTimeline.java:155) ~[hudi-flink1.16-bundle-0.13.0.jar:0.13.0]
       at 

Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1802979926

   
   ## CI report:
   
   * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
   * e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
   * 58d6194db3965823e985b646738d9d2399e4a5d8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20741)
 
   * db06d7f6394d26bec65a629ed2b567754d28a46a Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20750)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6872] Simplify Out Of Box Schema Evolution Functionality [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9743:
URL: https://github.com/apache/hudi/pull/9743#issuecomment-1802973496

   
   ## CI report:
   
   * 097ef6176650413eef2a4c3581ca6e48ea43788f UNKNOWN
   * e32b58f7ce1880568566be0c8a6940ae2f3a1016 UNKNOWN
   * 58d6194db3965823e985b646738d9d2399e4a5d8 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20741)
 
   * db06d7f6394d26bec65a629ed2b567754d28a46a UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7044] Skip reading records for delete blocks for positional merging [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10005:
URL: https://github.com/apache/hudi/pull/10005#issuecomment-1802967731

   
   ## CI report:
   
   * 83c12bf57803d832fbbfe2eed9cae30d987db175 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20745)
 
   * baaf10ac8ac319fb3e776b33b4386755bf034cb6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20749)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-6806] Support Spark 3.5.0 [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #9717:
URL: https://github.com/apache/hudi/pull/9717#issuecomment-1802967250

   
   ## CI report:
   
   * 9b8fdd2d1b69da528069e364790b53af1d6150af UNKNOWN
   * 94c46b0aaa5a205e767ed088ad631cc894922ea3 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20744)
 
   * 43253eac6f27abbb614bb80d514d5ea3e30e09a1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20747)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [I] UPSERTs are taking time [hudi]

2023-11-08 Thread via GitHub


soumilshah1995 commented on issue #9976:
URL: https://github.com/apache/hudi/issues/9976#issuecomment-1802964892

   Can you use the new RLI (record-level index)?
   
   
https://www.linkedin.com/pulse/upsert-performance-evaluation-hudi-014-spark-341-record-soumil-shah-oupre%3FtrackingId=PeKhUkGNTkuSD1VRqoI3rw%253D%253D/?trackingId=PeKhUkGNTkuSD1VRqoI3rw%3D%3D





Re: [I] [SUPPORT]insert_overwrite mode writing 2 times more duplicates [hudi]

2023-11-08 Thread via GitHub


soumilshah1995 commented on issue #9992:
URL: https://github.com/apache/hudi/issues/9992#issuecomment-1802963658

   Here is a sample that works fine for me:
   
https://soumilshah1995.blogspot.com/2023/03/rfc-18-insert-overwrite-in-apache-hudi.html





Re: [PR] [HUDI-7044] Skip reading records for delete blocks for positional merging [hudi]

2023-11-08 Thread via GitHub


hudi-bot commented on PR #10005:
URL: https://github.com/apache/hudi/pull/10005#issuecomment-1802909950

   
   ## CI report:
   
   * 579179dc1c43cf4fa02c3f023187ac9f8da06ffa Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20710)
 
   * 83c12bf57803d832fbbfe2eed9cae30d987db175 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20745)
 
   * baaf10ac8ac319fb3e776b33b4386755bf034cb6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=20749)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   




