Re: [I] [SUPPORT] Some spark writer job failed caused by UserGroupInformation lost in the new thread of timeline service threadpool [hudi]

2024-04-19 Thread via GitHub


beyond1920 closed issue #11030: [SUPPORT] Some spark writer job failed caused 
by UserGroupInformation  lost in the new thread of timeline service threadpool
URL: https://github.com/apache/hudi/issues/11030


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [I] [SUPPORT] Some spark writer job failed caused by UserGroupInformation lost in the new thread of timeline service threadpool [hudi]

2024-04-19 Thread via GitHub


beyond1920 commented on issue #11030:
URL: https://github.com/apache/hudi/issues/11030#issuecomment-2067525049

   Resolved by [pr#11039](https://github.com/apache/hudi/pull/11039).





(hudi) branch master updated: [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath (#11054)

2024-04-19 Thread yihua
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 683c4998d6d [HUDI-7628] Rename FSUtils.getPartitionPath to 
constructAbsolutePath (#11054)
683c4998d6d is described below

commit 683c4998d6de28605f9e94c05972258a22f2e5b9
Author: Vova Kolmakov 
AuthorDate: Sat Apr 20 08:11:07 2024 +0700

[HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath 
(#11054)

Co-authored-by: Vova Kolmakov 
---
 .../hudi/aws/sync/AWSGlueCatalogSyncClient.java|  4 +--
 .../apache/hudi/cli/commands/RepairsCommand.java   |  4 +--
 .../apache/hudi/client/CompactionAdminClient.java  |  4 +--
 .../index/bucket/ConsistentBucketIndexUtils.java   |  8 +++---
 .../org/apache/hudi/io/HoodieAppendHandle.java |  2 +-
 .../org/apache/hudi/io/HoodieCreateHandle.java |  2 +-
 .../java/org/apache/hudi/io/HoodieMergeHandle.java |  2 +-
 .../java/org/apache/hudi/io/HoodieWriteHandle.java |  4 +--
 .../metadata/HoodieBackedTableMetadataWriter.java  |  2 +-
 .../hudi/table/action/compact/HoodieCompactor.java |  2 +-
 .../table/action/rollback/BaseRollbackHelper.java  |  2 +-
 .../rollback/ListingBasedRollbackStrategy.java |  6 ++--
 .../ttl/strategy/KeepByCreationTimeStrategy.java   |  2 +-
 .../marker/TimelineServerBasedWriteMarkers.java|  4 +--
 .../org/apache/hudi/table/marker/WriteMarkers.java |  2 +-
 .../io/storage/row/HoodieRowDataCreateHandle.java  |  4 +--
 .../hudi/io/storage/row/HoodieRowCreateHandle.java |  4 +--
 .../TestSavepointRestoreMergeOnRead.java   |  8 +++---
 .../java/org/apache/hudi/table/TestCleaner.java|  4 +--
 ...dieSparkMergeOnReadTableInsertUpdateDelete.java |  2 +-
 .../hudi/table/marker/TestWriteMarkersBase.java|  2 +-
 .../java/org/apache/hudi/common/fs/FSUtils.java| 32 +++---
 .../hudi/common/model/CompactionOperation.java |  2 +-
 .../hudi/common/model/HoodieCommitMetadata.java|  8 +++---
 .../hudi/common/table/cdc/HoodieCDCExtractor.java  |  4 +--
 .../clean/CleanMetadataV1MigrationHandler.java |  2 +-
 .../clean/CleanPlanV2MigrationHandler.java |  2 +-
 .../compaction/CompactionV1MigrationHandler.java   |  2 +-
 .../table/view/AbstractTableFileSystemView.java|  4 +--
 .../IncrementalTimelineSyncFileSystemView.java |  2 +-
 .../sink/compact/ITTestHoodieFlinkCompactor.java   |  2 +-
 .../org/apache/hudi/IncrementalRelation.scala  |  2 +-
 .../AlterHoodieTableAddPartitionCommand.scala  |  2 +-
 .../RepairAddpartitionmetaProcedure.scala  |  2 +-
 .../RepairMigratePartitionMetaProcedure.scala  |  2 +-
 .../procedures/ShowInvalidParquetProcedure.scala   |  2 +-
 .../TestSparkConsistentBucketClustering.java   |  2 +-
 .../apache/hudi/sync/adb/HoodieAdbJdbcClient.java  | 10 +++
 .../org/apache/hudi/hive/ddl/HMSDDLExecutor.java   |  4 +--
 .../hudi/hive/ddl/QueryBasedDDLExecutor.java   |  4 +--
 .../org/apache/hudi/hive/TestHiveSyncTool.java |  2 +-
 .../apache/hudi/sync/common/HoodieSyncClient.java  |  4 +--
 .../hudi/utilities/HoodieDataTableUtils.java   |  2 +-
 .../utilities/HoodieMetadataTableValidator.java|  8 +++---
 .../hudi/utilities/HoodieSnapshotCopier.java   |  4 +--
 .../hudi/utilities/HoodieSnapshotExporter.java |  4 +--
 46 files changed, 94 insertions(+), 94 deletions(-)

diff --git a/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java b/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java
index e06db9f2ba4..6dda51fd134 100644
--- a/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java
+++ b/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java
@@ -303,7 +303,7 @@ public class AWSGlueCatalogSyncClient extends HoodieSyncClient {
 try {
   StorageDescriptor sd = table.storageDescriptor();
   List partitionInputList = partitionsToAdd.stream().map(partition -> {
-String fullPartitionPath = FSUtils.getPartitionPathInHadoopPath(s3aToS3(getBasePath()), partition).toString();
+String fullPartitionPath = FSUtils.constructAbsolutePathInHadoopPath(s3aToS3(getBasePath()), partition).toString();
 List partitionValues = partitionValueExtractor.extractPartitionValuesInPath(partition);
 StorageDescriptor partitionSD = sd.copy(copySd -> copySd.location(fullPartitionPath));
 return PartitionInput.builder().values(partitionValues).storageDescriptor(partitionSD).build();
@@ -347,7 +347,7 @@ public class AWSGlueCatalogSyncClient extends HoodieSyncClient {
 try {
   StorageDescriptor sd = table.storageDescriptor();
   List updatePartitionEntries = changedPartitions.stream().map(partition -> {
-String fullPartitionPath = FSUtils.getPartitionPathInHadoopPath(s3aToS3(getBasePath()), 

Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]

2024-04-19 Thread via GitHub


yihua merged PR #11054:
URL: https://github.com/apache/hudi/pull/11054





(hudi) branch asf-site updated: [DOCS] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing (#11058)

2024-04-19 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new e3894931de4 [DOCS] Added configurations of Hudi table, file-based SQL 
source, Hudi error table, and timestamp key generator to configuration listing 
(#11058)
e3894931de4 is described below

commit e3894931de489f222730972d76f783ffd67cccac
Author: Geser Dugarov 
AuthorDate: Sat Apr 20 07:44:31 2024 +0700

[DOCS] Added configurations of Hudi table, file-based SQL source, Hudi 
error table, and timestamp key generator to configuration listing (#11058)
---
 website/docs/basic_configurations.md |  91 -
 website/docs/configurations.md   | 125 ++-
 2 files changed, 214 insertions(+), 2 deletions(-)

diff --git a/website/docs/basic_configurations.md 
b/website/docs/basic_configurations.md
index 2f18ad3e885..1fc301521e1 100644
--- a/website/docs/basic_configurations.md
+++ b/website/docs/basic_configurations.md
@@ -1,12 +1,13 @@
 ---
 title: Basic Configurations
 summary: This page covers the basic configurations you may use to write/read 
Hudi tables. This page only features a subset of the most frequently used 
configurations. For a full list of all configs, please visit the [All 
Configurations](/docs/configurations) page.
-last_modified_at: 2024-04-15T09:56:05.413
+last_modified_at: 2024-04-19T18:21:42.88
 ---
 
 
 This page covers the basic configurations you may use to write/read Hudi 
tables. This page only features a subset of the most frequently used 
configurations. For a full list of all configs, please visit the [All 
Configurations](/docs/configurations) page.
 
+- [**Hudi Table Config**](#TABLE_CONFIG): Basic Hudi Table configuration 
parameters.
 - [**Spark Datasource Configs**](#SPARK_DATASOURCE): These configs control the 
Hudi Spark Datasource, providing ability to define keys/partitioning, pick out 
the write operation, specify how to merge records or choosing query type to 
read.
 - [**Flink Sql Configs**](#FLINK_SQL): These configs control the Hudi Flink 
SQL source/sink connectors, providing ability to define record keys, pick out 
the write operation, specify how to merge records, enable/disable asynchronous 
compaction or choosing query type to read.
 - [**Write Client Configs**](#WRITE_CLIENT): Internally, the Hudi datasource 
uses a RDD based HoodieWriteClient API to actually perform writes to storage. 
These configs provide deep control over lower level aspects like file sizing, 
compression, parallelism, compaction, write schema, cleaning etc. Although Hudi 
provides sane defaults, from time-time these configs may need to be tweaked to 
optimize for specific workloads.
@@ -20,6 +21,56 @@ This page covers the basic configurations you may use to 
write/read Hudi tables.
 In the tables below **(N/A)** means there is no default value set
 :::
 
+## Hudi Table Config {#TABLE_CONFIG}
+Basic Hudi Table configuration parameters.
+
+
+### Hudi Table Basic Configs {#Hudi-Table-Basic-Configs}
+Configurations of the Hudi Table like type of ingestion, storage formats, hive 
table name etc. Configurations are loaded from hoodie.properties, these 
properties are usually set during initializing a path as hoodie base path and 
never changes during the lifetime of a hoodie table.
+
+
+
+
+[**Basic Configs**](#Hudi-Table-Basic-Configs-basic-configs)
+
+
+| Config Name | Default | Description [...]
+| --- | --- | --- [...]
+| [hoodie.bootstrap.base.path](#hoodiebootstrapbasepath) | (N/A) | Base path of the dataset that needs to be bootstrapped as a Hudi table`Config Param: BOOTSTRAP_BASE_PATH`


Re: [PR] [DOCS] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]

2024-04-19 Thread via GitHub


danny0405 merged PR #11058:
URL: https://github.com/apache/hudi/pull/11058





(hudi) branch master updated: [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing (#11057)

2024-04-19 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 60424c5f998 [MINOR] Added configurations of Hudi table, file-based SQL 
source, Hudi error table, and timestamp key generator to configuration listing 
(#11057)
60424c5f998 is described below

commit 60424c5f9987f35cc21b0288ac11bd87602ae1c1
Author: Geser Dugarov 
AuthorDate: Sat Apr 20 07:43:37 2024 +0700

[MINOR] Added configurations of Hudi table, file-based SQL source, Hudi 
error table, and timestamp key generator to configuration listing (#11057)
---
 .../org/apache/hudi/config/HoodieErrorTableConfig.java  |  3 ++-
 .../org/apache/hudi/common/config/ConfigGroups.java |  4 
 .../hudi/common/config/TimestampKeyGeneratorConfig.java |  2 +-
 .../org/apache/hudi/common/table/HoodieTableConfig.java | 17 ++---
 .../hudi/utilities/config/SqlFileBasedSourceConfig.java |  3 ++-
 5 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieErrorTableConfig.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieErrorTableConfig.java
index 8ba013b00ee..1db8f2c4b5f 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieErrorTableConfig.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieErrorTableConfig.java
@@ -21,6 +21,7 @@ package org.apache.hudi.config;
 import org.apache.hudi.common.config.ConfigClassProperty;
 import org.apache.hudi.common.config.ConfigGroups;
 import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.config.HoodieConfig;
 
 import javax.annotation.concurrent.Immutable;
 
@@ -30,7 +31,7 @@ import java.util.Arrays;
 @ConfigClassProperty(name = "Error table Configs",
 groupName = ConfigGroups.Names.WRITE_CLIENT,
 description = "Configurations that are required for Error table configs")
-public class HoodieErrorTableConfig {
+public class HoodieErrorTableConfig extends HoodieConfig {
   public static final ConfigProperty ERROR_TABLE_ENABLED = ConfigProperty
   .key("hoodie.errortable.enable")
   .defaultValue(false)
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/config/ConfigGroups.java b/hudi-common/src/main/java/org/apache/hudi/common/config/ConfigGroups.java
index 18d28ab6275..5bab6f9aeb3 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/config/ConfigGroups.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/config/ConfigGroups.java
@@ -30,6 +30,7 @@ public class ConfigGroups {
* {@link ConfigGroups#getDescription}.
*/
   public enum Names {
+TABLE_CONFIG("Hudi Table Config"),
 ENVIRONMENT_CONFIG("Environment Config"),
 SPARK_DATASOURCE("Spark Datasource Configs"),
 FLINK_SQL("Flink Sql Configs"),
@@ -98,6 +99,9 @@ public class ConfigGroups {
   public static String getDescription(Names names) {
 String description;
 switch (names) {
+  case TABLE_CONFIG:
+description = "Basic Hudi Table configuration parameters.";
+break;
   case ENVIRONMENT_CONFIG:
 description = "Hudi supports passing configurations via a configuration file "
 + "`hudi-default.conf` in which each line consists of a key and a value "
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/config/TimestampKeyGeneratorConfig.java b/hudi-common/src/main/java/org/apache/hudi/common/config/TimestampKeyGeneratorConfig.java
index 7098c076279..46b66371b31 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/config/TimestampKeyGeneratorConfig.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/config/TimestampKeyGeneratorConfig.java
@@ -31,7 +31,7 @@ import java.util.concurrent.TimeUnit;
 + "the partition field. The field values are interpreted as timestamps and not just "
 + "converted to string while generating partition path value for records. Record key is "
 + "same as before where it is chosen by field name.")
-public class TimestampKeyGeneratorConfig {
+public class TimestampKeyGeneratorConfig extends HoodieConfig {
   private static final String TIMESTAMP_KEYGEN_CONFIG_PREFIX = "hoodie.keygen.timebased.";
   @Deprecated
   private static final String OLD_TIMESTAMP_KEYGEN_CONFIG_PREFIX = "hoodie.deltastreamer.keygen.timebased.";
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java b/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
index 78ef425a1d6..9cf3e538fd6 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
@@ -19,6 +19,8 @@
 package org.apache.hudi.common.table;
 
 import 

Re: [PR] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]

2024-04-19 Thread via GitHub


danny0405 merged PR #11057:
URL: https://github.com/apache/hudi/pull/11057





[jira] [Closed] (HUDI-7515) Fix partition metadata write failure

2024-04-19 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7515.

Resolution: Fixed

Fixed via master branch: 7a44b1ebc41ce66621e958df22195524373434c1

> Fix partition metadata write failure
> 
>
> Key: HUDI-7515
> URL: https://issues.apache.org/jira/browse/HUDI-7515
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Wechar
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: screenshot-1.png
>
>
> Avoid failing to write partition metadata. When spark.speculation is enabled, 
> if the write metadata operation becomes slow for some reason, a speculative 
> task will be started to write the same metadata file concurrently.
> In HDFS, two tasks (for example, one being a speculative task) writing to the 
> same file could both throw an exception like so:
> {code:bash}
> File does not exist: 
> /path/to/table/a=3519/b=3520/c=3521/.hoodie_partition_metadata_112 (inode 
> 48415575374) Holder DFSClient_NONMAPREDUCE_-2108606624_29 does not have any 
> open files.
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


danny0405 merged PR #10886:
URL: https://github.com/apache/hudi/pull/10886





(hudi) branch master updated (7ac26bce3f3 -> 7a44b1ebc41)

2024-04-19 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


from 7ac26bce3f3 [HUDI-7643] Fix test by using the right StreamSync 
constructor (#11056)
 add 7a44b1ebc41 [HUDI-7515] Fix partition metadata write failure (#10886)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/cli/commands/RepairsCommand.java   |  4 +-
 .../org/apache/hudi/io/HoodieAppendHandle.java |  2 +-
 .../org/apache/hudi/io/HoodieCreateHandle.java |  2 +-
 .../java/org/apache/hudi/io/HoodieMergeHandle.java |  2 +-
 .../io/storage/row/HoodieRowDataCreateHandle.java  |  2 +-
 .../hudi/io/storage/row/HoodieRowCreateHandle.java |  2 +-
 .../hudi/common/model/HoodiePartitionMetadata.java | 80 --
 .../table/timeline/HoodieActiveTimeline.java   | 12 +---
 .../common/model/TestHoodiePartitionMetadata.java  |  2 +-
 .../common/testutils/HoodieTestDataGenerator.java  |  5 +-
 .../hudi/common/util/TestTablePathUtils.java   |  4 +-
 .../hudi/hadoop/testutils/InputFormatTestUtil.java |  2 +-
 .../org/apache/hudi/storage/HoodieStorage.java |  2 +-
 .../AlterHoodieTableAddPartitionCommand.scala  |  2 +-
 .../RepairAddpartitionmetaProcedure.scala  |  2 +-
 .../RepairMigratePartitionMetaProcedure.scala  |  2 +-
 16 files changed, 62 insertions(+), 65 deletions(-)



[jira] [Updated] (HUDI-7507) ongoing concurrent writers with smaller timestamp can cause issues with table services

2024-04-19 Thread Krishen Bhan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krishen Bhan updated HUDI-7507:
---
Description: 
*Scenarios:*

Although HUDI operations hold a table lock when creating a .requested instant, 
because HUDI writers do not generate a timestamp and create a .requested plan in 
the same transaction, there can be a scenario where 
 # Job 1 starts and chooses timestamp (x), then Job 2 starts and chooses 
timestamp (x-1)
 # Job 1 schedules and creates requested file with instant timestamp (x)
 # Job 2 schedules and creates requested file with instant timestamp (x-1)
 # Both jobs continue running

If one job is writing a commit and the other is a table service, this can cause 
issues:
 * 
 ** If Job 2 is an ingestion commit and Job 1 is compaction/log compaction, then 
when Job 1 runs before Job 2 it can create a compaction plan for all instant 
times (up to (x)) that doesn’t include instant time (x-1). Later Job 2 will 
create instant time (x-1), but the timeline will be in a corrupted state since 
the compaction plan was supposed to include (x-1)
 ** There is a similar issue with clean. If Job 2 is a long-running commit (that 
was stuck/delayed for a while before creating its .requested plan) and Job 1 is 
a clean, then Job 1 can perform a clean that updates the 
earliest-commit-to-retain without waiting for the inflight instant by Job 2 at 
(x-1) to complete. This causes Job 2 to be "skipped" by clean.

[Edit] I added a diagram to visualize the issue, specifically the second 
scenario with clean

!Flowchart (1).png!

*Proposed approach:*

One way this can be resolved is by combining the operations of generating 
instant time and creating a requested file in the same HUDI table transaction. 
Specifically, executing the following steps whenever any instant (commit, table 
service, etc) is scheduled

Approach A
 # Acquire table lock
 # Look at the latest instant C on the active timeline (completed or not). 
Generate a timestamp after C
 # Create the plan and requested file using this new timestamp (which is 
greater than C)
 # Release table lock
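
The four steps of Approach A can be sketched roughly as follows. This is a minimal in-memory illustration, not Hudi code: `ApproachAScheduler`, its `schedule` method, the `ReentrantLock` standing in for the table lock, and the use of `long` timestamps are all assumptions made for the example.

```java
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: ApproachAScheduler is a hypothetical name, not a Hudi API.
final class ApproachAScheduler {
  // Stand-in for the table lock (assumption: a simple in-process lock).
  private final ReentrantLock tableLock = new ReentrantLock();
  // Stand-in for the active timeline: instant timestamps, completed or not.
  private final TreeSet<Long> timeline = new TreeSet<>();

  /** Picks a timestamp and records the requested instant in one locked section. */
  long schedule(long clockTimestamp) {
    tableLock.lock();                                   // step 1: acquire table lock
    try {
      // step 2: find the latest instant C and generate a timestamp after it
      long timestamp = timeline.isEmpty()
          ? clockTimestamp
          : Math.max(clockTimestamp, timeline.last() + 1);
      // step 3: "create the plan and requested file" using the new timestamp
      timeline.add(timestamp);
      return timestamp;
    } finally {
      tableLock.unlock();                               // step 4: release table lock
    }
  }
}
```

Because the timestamp is chosen inside the locked section, a job whose clock is behind still receives a timestamp greater than every instant already on the timeline. The sketch also shows the first drawback: whatever work goes into step 3 runs while the lock is held.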

Unfortunately (A) has the following drawbacks
 * Every operation must now hold the table lock when computing its plan even if 
it's an expensive operation and will take a while
 * Users of HUDI cannot easily set their own instant time of an operation, and 
this restriction would break any public APIs that allow this and would require 
deprecating those APIs.

 

An alternate approach is to have every operation abort creating a .requested 
file unless it has the latest timestamp. Specifically, for any instant type, 
whenever an operation is about to create a .requested plan on timeline, it 
should take the table lock and assert that there are no other instants on 
timeline that are greater than it that could cause a conflict. If that 
assertion fails, then throw a retry-able conflict resolution exception.

Specifically, the following steps should be followed whenever any instant 
(commit, table service, etc) is scheduled

Approach B
 # Acquire table lock. Assume that the desired instant time C and requested 
file plan metadata have already been created, regardless of whether it was 
before this step or right after acquiring the table lock.
 # Get the set of all instants on the timeline that are greater than C 
(regardless of their action or state). 
 ## If the current operation is an ingestion type 
(commit/deltacommit/insert_overwrite replace) then assert the set is empty
 ## If the current operation is a table service then assert that the set 
doesn't contain any table service instant types
 # Create requested plan on timeline (As usual)
 # Release table lock
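
The check in step 2 of Approach B might look roughly like the sketch below. All names here (`ApproachBCheck`, `Instant`, `RetryableConflictException`, `assertSafeToSchedule`) are hypothetical and are not Hudi APIs. Timestamps are modelled as `long`, and each instant carries only a flag saying whether it is a table service. The caller is assumed to already hold the table lock.

```java
import java.util.List;

// Illustrative sketch only: every type and method here is a hypothetical name.
final class ApproachBCheck {

  /** Minimal stand-in for a timeline instant. */
  static final class Instant {
    final long timestamp;
    final boolean tableService; // false means an ingestion-type instant

    Instant(long timestamp, boolean tableService) {
      this.timestamp = timestamp;
      this.tableService = tableService;
    }
  }

  /** Signals that the caller should retry with a newer instant time. */
  static final class RetryableConflictException extends RuntimeException {
    RetryableConflictException(String message) {
      super(message);
    }
  }

  /** Step 2: must be called while holding the table lock (step 1). */
  static void assertSafeToSchedule(Instant candidate, List<Instant> timeline) {
    for (Instant other : timeline) {
      if (other.timestamp <= candidate.timestamp) {
        continue; // only instants greater than C matter
      }
      // 2.1: an ingestion candidate requires the set of later instants to be empty.
      // 2.2: a table service candidate only rejects later table service instants.
      if (!candidate.tableService || other.tableService) {
        throw new RetryableConflictException(
            "Instant " + other.timestamp + " is later than " + candidate.timestamp);
      }
    }
    // Steps 3 and 4 (create the requested plan, release the lock) are left to the caller.
  }
}
```

In this sketch an ingestion writer is rejected whenever any later instant exists, which is the behaviour the drawbacks below describe as producing transient failures on tables with many concurrent ingestion commits.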

Unlike (A), this approach (B) allows users to continue to use HUDI APIs where 
the caller can specify the instant time (avoiding the need to deprecate any 
public API). It also allows the possibility of table service operations 
computing their plan without holding a lock. Despite this, (B) has the 
following drawbacks
 * It is not immediately clear how MDT vs base table operations should be 
handled here. At first glance it seems that at step (2) both the base table and 
MDT timeline should be checked, but that might need more investigation to 
confirm.
 * This error will still be thrown even for combinations of concurrent 
operations where it would be safe to continue. For example, assume two 
ingestion writers begin executing on a dataset, with each only performing an 
insert commit on the dataset (with no table service being scheduled). If the 
writer that started scheduling later ends up having an earlier timestamp, it 
would still be safe for it to continue. Despite that, because of step (2.1) it 
would still have to abort and throw an error. This means that on datasets with 
many frequent concurrent ingestion commits and very infrequent table service 
operations, there would be a lot of transient failures/noise 

[jira] [Updated] (HUDI-7641) Add metrics to track what partitions are enabled in MDT

2024-04-19 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-7641:
--
Fix Version/s: 0.15.0

> Add metrics to track what partitions are enabled in MDT
> ---
>
> Key: HUDI-7641
> URL: https://issues.apache.org/jira/browse/HUDI-7641
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>






[jira] [Assigned] (HUDI-7641) Add metrics to track what partitions are enabled in MDT

2024-04-19 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-7641:
-

Assignee: sivabalan narayanan

> Add metrics to track what partitions are enabled in MDT
> ---
>
> Key: HUDI-7641
> URL: https://issues.apache.org/jira/browse/HUDI-7641
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: metadata
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>






Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067283377

   
   ## CI report:
   
   * 59d53ba7cc510038dcaea707f434ebc5529b29dc UNKNOWN
   * ab39be7b9f1d7bf9de4b69640dce50105b6d9147 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23371)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067215979

   
   ## CI report:
   
   * 7d5d983e224a82cbb8b4253f8d9b374cd9e4b9aa Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23370)
 
   * 59d53ba7cc510038dcaea707f434ebc5529b29dc UNKNOWN
   * ab39be7b9f1d7bf9de4b69640dce50105b6d9147 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23371)
 
   
   



Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067161180

   
   ## CI report:
   
   * 3f7d727e83f05cb5ce7f9a3da2bfffca72686345 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23359)
 
   * 7d5d983e224a82cbb8b4253f8d9b374cd9e4b9aa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23370)
 
   * 59d53ba7cc510038dcaea707f434ebc5529b29dc UNKNOWN
   * ab39be7b9f1d7bf9de4b69640dce50105b6d9147 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23371)
 
   
   



Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067152702

   
   ## CI report:
   
   * 3f7d727e83f05cb5ce7f9a3da2bfffca72686345 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23359)
 
   * 7d5d983e224a82cbb8b4253f8d9b374cd9e4b9aa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23370)
 
   * 59d53ba7cc510038dcaea707f434ebc5529b29dc UNKNOWN
   * ab39be7b9f1d7bf9de4b69640dce50105b6d9147 UNKNOWN
   
   



Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067143554

   
   ## CI report:
   
   * 3f7d727e83f05cb5ce7f9a3da2bfffca72686345 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23359)
 
   * 7d5d983e224a82cbb8b4253f8d9b374cd9e4b9aa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23370)
 
   * 59d53ba7cc510038dcaea707f434ebc5529b29dc UNKNOWN
   
   



Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067081207

   
   ## CI report:
   
   * 3f7d727e83f05cb5ce7f9a3da2bfffca72686345 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23359)
 
   * 7d5d983e224a82cbb8b4253f8d9b374cd9e4b9aa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23370)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067071879

   
   ## CI report:
   
   * 3f7d727e83f05cb5ce7f9a3da2bfffca72686345 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23359)
 
   * 7d5d983e224a82cbb8b4253f8d9b374cd9e4b9aa UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [I] [SUPPORT] HudiDeltaStreaming consuming from Kafka - can't see the Kafka Consumer Group in Kafka [hudi]

2024-04-19 Thread via GitHub


mattssll closed issue #11051: [SUPPORT] HudiDeltaStreaming consuming from Kafka 
- can't see the Kafka Consumer Group in Kafka
URL: https://github.com/apache/hudi/issues/11051





Re: [PR] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11057:
URL: https://github.com/apache/hudi/pull/11057#issuecomment-2066570835

   
   ## CI report:
   
   * d8ab7259a5c4825d6634eebb8610ca072abb4a05 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23369)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066536128

   
   ## CI report:
   
   * cc4c48076b11d9a97fdb7fd0f6f0a5253d530ff1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23368)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11057:
URL: https://github.com/apache/hudi/pull/11057#issuecomment-2066453700

   
   ## CI report:
   
   * d8ab7259a5c4825d6634eebb8610ca072abb4a05 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23369)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11057:
URL: https://github.com/apache/hudi/pull/11057#issuecomment-2066442267

   
   ## CI report:
   
   * d8ab7259a5c4825d6634eebb8610ca072abb4a05 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [I] If Sanitastiion Enabled In HudiStreamer It is taking too much time [hudi]

2024-04-19 Thread via GitHub


Amar1404 commented on issue #10466:
URL: https://github.com/apache/hudi/issues/10466#issuecomment-2066439568

   Hi @ad1happy2go, are there any updates on this?





Re: [I] [SUPPORT] HudiDeltaStreaming consuming from Kafka - can't see the Kafka Consumer Group in Kafka [hudi]

2024-04-19 Thread via GitHub


Amar1404 commented on issue #11051:
URL: https://github.com/apache/hudi/issues/11051#issuecomment-2066436408

   Hi @mattssll - HoodieDeltaStreamer uses checkpoints here instead of a Kafka consumer group.
   The Hudi commit file stores the last offset that was read from each partition, something like
   topic.0:26228942,1:26231665,2:26218546,3:26229200,4:26220648,5:26226357
   On the next run, it reads from Kafka after these offsets for each partition and matches them against the existing partitions.
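
To make the checkpoint described above concrete, here is a minimal sketch that pulls the per-partition offsets out of a checkpoint string shaped like the sample. The class and method names are hypothetical, and the regular-expression parsing is an assumption made for illustration; it is not Hudi's own checkpoint parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hudi: extracts partition -> offset pairs
// from a checkpoint string shaped like the sample quoted above.
public class CheckpointSketch {
  // A partition id, a colon, then the last offset stored for that partition.
  private static final Pattern PAIR = Pattern.compile("(\\d+):(\\d+)");

  public static Map<Integer, Long> parseOffsets(String checkpoint) {
    Map<Integer, Long> offsets = new LinkedHashMap<>();
    Matcher m = PAIR.matcher(checkpoint);
    while (m.find()) {
      offsets.put(Integer.parseInt(m.group(1)), Long.parseLong(m.group(2)));
    }
    return offsets;
  }

  public static void main(String[] args) {
    String checkpoint = "topic.0:26228942,1:26231665,2:26218546";
    // The next read for each partition would continue from the stored offset.
    parseOffsets(checkpoint).forEach((partition, offset) ->
        System.out.println("partition " + partition + " stored offset " + offset));
  }
}
```

Because the offsets live in the commit metadata rather than in Kafka, no consumer group needs to show up on the broker side for ingestion to resume correctly.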





[PR] [DOCS] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]

2024-04-19 Thread via GitHub


geserdugarov opened a new pull request, #11058:
URL: https://github.com/apache/hudi/pull/11058

   ### Change Logs
   
   Updates to the docs. Should be merged only after 
[MR 11057](https://github.com/apache/hudi/pull/11057), which carries the 
corresponding code changes, is merged.
   Not all configurations are listed on the 
[All configurations](https://hudi.apache.org/docs/configurations) page. This MR adds 
a list of basic Hudi table configurations, along with some missing configurations 
for the file-based SQL source, Hudi error table, and timestamp key generator.
   
   ### Impact
   
   Last updates to the `current` version of the docs.
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   





[PR] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]

2024-04-19 Thread via GitHub


geserdugarov opened a new pull request, #11057:
URL: https://github.com/apache/hudi/pull/11057

   ### Change Logs
   
   Not all configurations are listed on the 
[All configurations](https://hudi.apache.org/docs/configurations) page. This MR adds 
a list of basic Hudi table configurations, along with some missing configurations 
for the file-based SQL source, Hudi error table, and timestamp key generator.
   
   ### Impact
   
   Changes only the list of all configurations on the Hudi site.
   
   ### Risk level (write none, low medium or high below)
   
   None
   
   ### Documentation Update
   
   I will open a corresponding MR against the `asf-site` branch.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [ ] CI passed
   





Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066365873

   
   ## CI report:
   
   * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362)
 
   * cc4c48076b11d9a97fdb7fd0f6f0a5253d530ff1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23368)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066355763

   
   ## CI report:
   
   * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362)
 
   * cc4c48076b11d9a97fdb7fd0f6f0a5253d530ff1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11054:
URL: https://github.com/apache/hudi/pull/11054#issuecomment-2066344280

   
   ## CI report:
   
   * 0c8df7c7d066c2115e4a04dabb10fd14a54e40d2 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23364)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


danny0405 commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066283997

   You can rebase with the latest master now to resolve the compile error.





[jira] [Closed] (HUDI-7643) Fix TestStreamSyncUnitTests

2024-04-19 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen closed HUDI-7643.

Resolution: Fixed

Fixed via master branch: 7ac26bce3f3aad2c9aebeb55febc4375c4f7bd1d

> Fix TestStreamSyncUnitTests
> ---
>
> Key: HUDI-7643
> URL: https://issues.apache.org/jira/browse/HUDI-7643
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Use the right StreamSync constructor
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7643) Fix TestStreamSyncUnitTests

2024-04-19 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-7643:
-
Fix Version/s: 0.15.0
   1.0.0

> Fix TestStreamSyncUnitTests
> ---
>
> Key: HUDI-7643
> URL: https://issues.apache.org/jira/browse/HUDI-7643
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
>
> Use the right StreamSync constructor
>  





Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


wecharyu commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1572163511


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -94,36 +96,32 @@ public int getPartitionDepth() {
   /**
    * Write the metadata safely into partition atomically.
    */
-  public void trySave(int taskPartitionId) {
+  public void trySave() throws HoodieIOException {
     String extension = getMetafileExtension();
-    StoragePath tmpMetaPath =
-        new StoragePath(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + taskPartitionId + extension);
-    StoragePath metaPath = new StoragePath(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
-    boolean metafileExists = false;
+    StoragePath metaPath = new StoragePath(
+        partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
 
-    try {
-      metafileExists = storage.exists(metaPath);
-      if (!metafileExists) {
-        // write to temporary file
-        writeMetafile(tmpMetaPath);
-        // move to actual path
-        storage.rename(tmpMetaPath, metaPath);
-      }
-    } catch (IOException ioe) {
-      LOG.warn("Error trying to save partition metadata (this is okay, as long as at least 1 of these succeeded), "
-          + partitionPath, ioe);
-    } finally {
-      if (!metafileExists) {
-        try {
-          // clean up tmp file, if still lying around
-          if (storage.exists(tmpMetaPath)) {
-            storage.deleteFile(tmpMetaPath);
+    // This retry mechanism enables an exit-fast in metaPath exists check, which avoid the
+    // tasks failures when there are two or more tasks trying to create the same metaPath.
+    RetryHelper  retryHelper = new RetryHelper(1000, 3, 1000, HoodieIOException.class.getName())
+        .tryWith(() -> {
+          if (!storage.exists(metaPath)) {
+            if (format.isPresent()) {
+              StoragePath tmpMetaPath = new StoragePath(

Review Comment:
   Done.
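
For readers following the thread, the pattern under review (return early if the metafile already exists, otherwise write a temporary file and rename it into place, retrying on I/O errors) can be sketched with plain `java.nio.file`. This is an illustrative assumption, not the PR's code: the class, method, and file names are hypothetical, and it targets the local filesystem rather than Hudi's `HoodieStorage` or `RetryHelper`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch, not Hudi code: publish a metafile safely when several
// tasks may race to create the same path.
public class MetafileSketch {

  public static void trySave(Path dir, String name, String content, int maxAttempts)
      throws IOException {
    Path metaPath = dir.resolve(name);
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      // Exit fast: another task may already have published the file.
      if (Files.exists(metaPath)) {
        return;
      }
      Path tmp = null;
      try {
        // Write to a uniquely named temporary file in the same directory.
        tmp = Files.createTempFile(dir, name + "_", ".tmp");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // Rename into place so readers never observe a partially written file.
        Files.move(tmp, metaPath, StandardCopyOption.ATOMIC_MOVE);
        return;
      } catch (IOException e) {
        last = e; // retry; the exists() check above covers a lost race
      } finally {
        if (tmp != null) {
          Files.deleteIfExists(tmp); // nothing to delete after a successful move
        }
      }
    }
    throw last != null ? last : new IOException("no attempts made for " + metaPath);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("partition");
    trySave(dir, "partition.meta", "commitTime=001", 3);
    trySave(dir, "partition.meta", "commitTime=002", 3); // skipped: file exists
    System.out.println(Files.readAllLines(dir.resolve("partition.meta")));
  }
}
```

Whether an atomic rename onto an existing target replaces it or fails is implementation specific, which is one reason the existence check runs again at the start of every attempt.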






(hudi) branch master updated: [HUDI-7643] Fix test by using the right StreamSync constructor (#11056)

2024-04-19 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new 7ac26bce3f3 [HUDI-7643] Fix test by using the right StreamSync 
constructor (#11056)
7ac26bce3f3 is described below

commit 7ac26bce3f3aad2c9aebeb55febc4375c4f7bd1d
Author: Sagar Sumit 
AuthorDate: Fri Apr 19 15:55:46 2024 +0530

[HUDI-7643] Fix test by using the right StreamSync constructor (#11056)
---
 .../org/apache/hudi/utilities/streamer/TestStreamSyncUnitTests.java   | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hudi-utilities/src/test/java/org/apache/hudi/utilities/streamer/TestStreamSyncUnitTests.java b/hudi-utilities/src/test/java/org/apache/hudi/utilities/streamer/TestStreamSyncUnitTests.java
index 8ff5b6ee933..fe775f95a36 100644
--- a/hudi-utilities/src/test/java/org/apache/hudi/utilities/streamer/TestStreamSyncUnitTests.java
+++ b/hudi-utilities/src/test/java/org/apache/hudi/utilities/streamer/TestStreamSyncUnitTests.java
@@ -141,7 +141,7 @@ public class TestStreamSyncUnitTests {
   @MethodSource("getCheckpointToResumeCases")
   void testGetCheckpointToResume(HoodieStreamer.Config cfg, HoodieCommitMetadata commitMetadata, Option expectedResumeCheckpoint) throws IOException {
     HoodieSparkEngineContext hoodieSparkEngineContext = mock(HoodieSparkEngineContext.class);
-    FileSystem fs = mock(FileSystem.class);
+    HoodieStorage storage = HoodieStorageUtils.getStorage(mock(FileSystem.class));
     TypedProperties props = new TypedProperties();
     SparkSession sparkSession = mock(SparkSession.class);
     Configuration configuration = mock(Configuration.class);
@@ -152,7 +152,7 @@ public class TestStreamSyncUnitTests {
     when(commitsTimeline.lastInstant()).thenReturn(Option.of(hoodieInstant));
 
     StreamSync streamSync = new StreamSync(cfg, sparkSession, props, hoodieSparkEngineContext,
-        fs, configuration, client -> true, null,Option.empty(),null,Option.empty(),true,true);
+        storage, configuration, client -> true, null,Option.empty(),null,Option.empty(),true,true);
     StreamSync spy = spy(streamSync);
     doReturn(Option.of(commitMetadata)).when(spy).getLatestCommitMetadataWithValidCheckpointInfo(any());
 



Re: [PR] [HUDI-7643] Fix test by using the right StreamSync constructor [hudi]

2024-04-19 Thread via GitHub


danny0405 merged PR #11056:
URL: https://github.com/apache/hudi/pull/11056





Re: [I] [SUPPORT]Data Loss Issue with Hudi Table After 3 Days of Continuous Writes [hudi]

2024-04-19 Thread via GitHub


danny0405 commented on issue #11016:
URL: https://github.com/apache/hudi/issues/11016#issuecomment-2066279351

   > but the issue is that we can't access older data.
   
   If your table is ingested with streaming `upsert`, you can just set 
`read.start-commit` to the first commit instant time on the timeline and skip 
the compaction. Only instants that have not been cleaned can be consumed.
   
   It actually depends on how you wrote the history dataset: `bulk_insert` 
does not guarantee the payload sequence for a given key, so if the table was 
bootstrapped with `bulk_insert`, the only way is to consume from `earliest`.
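
As a concrete illustration of the options mentioned above, the sketch below assembles a Flink SQL `WITH` clause for a streaming read that starts from `earliest`. Only `read.start-commit` comes from the comment itself; the other option keys (`read.streaming.enabled`, `read.streaming.skip_compaction`), the class name, and the table path are assumptions that should be checked against the Hudi Flink configuration reference for the version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: builds the WITH clause of a Flink SQL table definition.
public class StreamingReadOptionsSketch {

  public static Map<String, String> options(String tablePath, String startCommit) {
    Map<String, String> opts = new LinkedHashMap<>();
    opts.put("connector", "hudi");
    opts.put("path", tablePath);                         // placeholder table path
    opts.put("read.streaming.enabled", "true");          // assumed option key
    opts.put("read.start-commit", startCommit);          // an instant time or 'earliest'
    opts.put("read.streaming.skip_compaction", "true");  // assumed option key
    return opts;
  }

  public static String withClause(Map<String, String> opts) {
    // Render each entry as 'key' = 'value', one per line.
    return opts.entrySet().stream()
        .map(e -> "'" + e.getKey() + "' = '" + e.getValue() + "'")
        .collect(Collectors.joining(",\n  ", "WITH (\n  ", "\n)"));
  }

  public static void main(String[] args) {
    System.out.println(withClause(options("file:///tmp/hudi_table", "earliest")));
  }
}
```

Passing a concrete instant time instead of `earliest` would correspond to the streaming `upsert` case described in the comment.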





Re: [PR] [HUDI-7629] Safely rename HoodieFileStatus [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11055:
URL: https://github.com/apache/hudi/pull/11055#issuecomment-2066274445

   
   ## CI report:
   
   * 8957421b837bb5471701724e47ae908e0c0655fb Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23365)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7643] Fix test by using the right StreamSync constructor [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11056:
URL: https://github.com/apache/hudi/pull/11056#issuecomment-2066274503

   
   ## CI report:
   
   * 0dd750741922a580a0c8bee13996f7583f5b98c0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23366)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


danny0405 commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066274001

   The master build is broken and here is the fix: 
https://github.com/apache/hudi/pull/11056. You may need to wait for this patch 
to land and then rebase onto the latest master again.





Re: [PR] [HUDI-7643] Fix test by using the right StreamSync constructor [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11056:
URL: https://github.com/apache/hudi/pull/11056#issuecomment-2066263288

   
   ## CI report:
   
   * 0dd750741922a580a0c8bee13996f7583f5b98c0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]

2024-04-19 Thread via GitHub


danny0405 commented on PR #11018:
URL: https://github.com/apache/hudi/pull/11018#issuecomment-2066267766

   Hey, master fails to compile with this patch; can you take care of it?





Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11054:
URL: https://github.com/apache/hudi/pull/11054#issuecomment-2066251007

   
   ## CI report:
   
   * 0d5781211cbe9977838db3ee7134bc473b6110aa Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23363)
 
   * 0c8df7c7d066c2115e4a04dabb10fd14a54e40d2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23364)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066250342

   
   ## CI report:
   
   * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7629] Safely rename HoodieFileStatus [hudi]

2024-04-19 Thread via GitHub


wombatu-kun commented on PR #11055:
URL: https://github.com/apache/hudi/pull/11055#issuecomment-2066223372

   I thought the decision had already been made, since the task 
https://issues.apache.org/jira/browse/HUDI-7629 was created.  
   @yihua @vinothchandar @danny0405 could you please make a decision 
collectively?





[PR] [HUDI-7643] Fix test by using the right StreamSync constructor [hudi]

2024-04-19 Thread via GitHub


codope opened a new pull request, #11056:
URL: https://github.com/apache/hudi/pull/11056

   ### Change Logs
   
   The `StreamSync` constructor changed after the `HoodieStorage` abstraction was 
introduced, and the commit 
https://github.com/apache/hudi/commit/ca77fda51fe3036f86d4ddb8b0e58a2f160882dc 
was merged without rebasing, so master is broken.
   
   ### Impact
   
   Fix test and build on master.
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7643) Fix TestStreamSyncUnitTests

2024-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7643:
-
Labels: pull-request-available  (was: )

> Fix TestStreamSyncUnitTests
> ---
>
> Key: HUDI-7643
> URL: https://issues.apache.org/jira/browse/HUDI-7643
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available
>
> Use the right StreamSync constructor
>  





[jira] [Created] (HUDI-7643) Fix TestStreamSyncUnitTests

2024-04-19 Thread Sagar Sumit (Jira)
Sagar Sumit created HUDI-7643:
-

 Summary: Fix TestStreamSyncUnitTests
 Key: HUDI-7643
 URL: https://issues.apache.org/jira/browse/HUDI-7643
 Project: Apache Hudi
  Issue Type: Task
Reporter: Sagar Sumit


Use the right StreamSync constructor

 





Re: [PR] [HUDI-7629] Safely rename HoodieFileStatus [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11055:
URL: https://github.com/apache/hudi/pull/11055#issuecomment-2066178297

   
   ## CI report:
   
   * 8957421b837bb5471701724e47ae908e0c0655fb Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23365)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11054:
URL: https://github.com/apache/hudi/pull/11054#issuecomment-2066178227

   
   ## CI report:
   
   * 0d5781211cbe9977838db3ee7134bc473b6110aa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23363)
 
   * 0c8df7c7d066c2115e4a04dabb10fd14a54e40d2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23364)
 
   
   



Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11054:
URL: https://github.com/apache/hudi/pull/11054#issuecomment-2066165313

   
   ## CI report:
   
   * 0d5781211cbe9977838db3ee7134bc473b6110aa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23363)
 
   * 0c8df7c7d066c2115e4a04dabb10fd14a54e40d2 UNKNOWN
   
   



Re: [PR] [HUDI-7629] Safely rename HoodieFileStatus [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11055:
URL: https://github.com/apache/hudi/pull/11055#issuecomment-2066165390

   
   ## CI report:
   
   * 8957421b837bb5471701724e47ae908e0c0655fb UNKNOWN
   
   



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


danny0405 commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1572070120


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -94,36 +96,32 @@ public int getPartitionDepth() {
   /**
* Write the metadata safely into partition atomically.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave() throws HoodieIOException {
 String extension = getMetafileExtension();
-StoragePath tmpMetaPath =
-new StoragePath(partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + 
taskPartitionId + extension);
-StoragePath metaPath = new StoragePath(partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
-boolean metafileExists = false;
+StoragePath metaPath = new StoragePath(
+partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
 
-try {
-  metafileExists = storage.exists(metaPath);
-  if (!metafileExists) {
-// write to temporary file
-writeMetafile(tmpMetaPath);
-// move to actual path
-storage.rename(tmpMetaPath, metaPath);
-  }
-} catch (IOException ioe) {
-  LOG.warn("Error trying to save partition metadata (this is okay, as long 
as at least 1 of these succeeded), "
-  + partitionPath, ioe);
-} finally {
-  if (!metafileExists) {
-try {
-  // clean up tmp file, if still lying around
-  if (storage.exists(tmpMetaPath)) {
-storage.deleteFile(tmpMetaPath);
+// This retry mechanism enables an exit-fast in metaPath exists check, 
which avoid the
+// tasks failures when there are two or more tasks trying to create the 
same metaPath.
+RetryHelper  retryHelper = new RetryHelper(1000, 
3, 1000, HoodieIOException.class.getName())
+.tryWith(() -> {
+  if (!storage.exists(metaPath)) {
+if (format.isPresent()) {
+  StoragePath tmpMetaPath = new StoragePath(

Review Comment:
   We can move the `tmpMetaPath` into `writeMetafileInFormat`.
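
The pattern under review (check whether the metafile exists, write to a temporary file, rename it into place, and retry on I/O failure) can be sketched with plain `java.nio.file` calls. This is a minimal sketch under assumptions: `MetafileWriterSketch`, `trySave(Path, String, int)` and the UUID-suffixed temporary name are hypothetical and are not Hudi's `HoodiePartitionMetadata`, `RetryHelper`, or `writeMetafileInFormat`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper for illustration only; not part of Hudi. */
final class MetafileWriterSketch {
  private MetafileWriterSketch() {}

  /**
   * Creates metaPath unless it already exists, retrying on I/O errors.
   * Returns true if this caller created the file, false if it was already there.
   */
  static boolean trySave(Path metaPath, String content, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      // Exit fast: another task may already have created the file.
      if (Files.exists(metaPath)) {
        return false;
      }
      // Assumed naming scheme: a unique sibling so concurrent writers never share a temp file.
      Path tmp = metaPath.resolveSibling(metaPath.getFileName() + "." + UUID.randomUUID());
      try {
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        // Without REPLACE_EXISTING, move fails if the target already exists.
        Files.move(tmp, metaPath);
        return true;
      } catch (FileAlreadyExistsException e) {
        return false; // another writer won the race
      } catch (IOException e) {
        last = e; // remember the failure and retry
      } finally {
        try {
          Files.deleteIfExists(tmp); // clean up a leftover temp file, if any
        } catch (IOException ignored) {
          // best-effort cleanup
        }
      }
    }
    throw new UncheckedIOException("Could not write " + metaPath, last);
  }
}
```

Keeping the temporary path inside the write helper, as the review suggests, means callers only ever see the final path.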






[jira] [Updated] (HUDI-7629) Safely rename HoodieFileStatus

2024-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7629:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Safely rename HoodieFileStatus
> --
>
> Key: HUDI-7629
> URL: https://issues.apache.org/jira/browse/HUDI-7629
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 1.0.0
>
>
> [https://github.com/apache/hudi/pull/10591#discussion_r1484912753]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7629] Safely rename HoodieFileStatus [hudi]

2024-04-19 Thread via GitHub


wombatu-kun opened a new pull request, #11055:
URL: https://github.com/apache/hudi/pull/11055

   ### Change Logs
   
   Renamed `HoodieFileStatus` to `StorageLocationInfo`: 
https://github.com/apache/hudi/pull/10591#discussion_r1484912753
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-7642) Compact MOR tables with operation fields cause data errors

2024-04-19 Thread Zeyu Wang (Jira)
Zeyu Wang created HUDI-7642:
---

 Summary: Compact MOR tables with operation fields cause data errors
 Key: HUDI-7642
 URL: https://issues.apache.org/jira/browse/HUDI-7642
 Project: Apache Hudi
  Issue Type: Bug
Reporter: Zeyu Wang


When we compact a MOR table that has the _hoodie_operation field, the HoodieKey 
tagged with operation "-D" is not correctly removed.

According to the previous discussion 
(https://github.com/apache/hudi/pull/8721#issuecomment-1736629662), we should 
keep the delete record for the Flink engine. 
https://github.com/apache/hudi/pull/10219 also repaired the Spark engine 
problem when reading data, so the problem caused by compaction should be 
repaired now. Because compaction directly uses the 
HoodieMergedLogRecordScanner in the common module, I think we have to add an 
optional configuration to control whether the HoodieMergedLogRecordScanner 
directly deletes the keys that are tagged with the "-D" operation.
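
The proposed switch can be illustrated with a small merge loop. This is only a sketch under assumptions: `DeleteAwareMergeSketch`, `LogRecord`, and the `dropDeletedKeys` flag are hypothetical names, and the loop is a simplified stand-in, not the logic of HoodieMergedLogRecordScanner.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not Hudi's scanner. */
final class DeleteAwareMergeSketch {
  private DeleteAwareMergeSketch() {}

  /** Minimal stand-in for a log record carrying an operation tag such as "+I" or "-D". */
  static final class LogRecord {
    final String key;
    final String operation;
    final String value;

    LogRecord(String key, String operation, String value) {
      this.key = key;
      this.operation = operation;
      this.value = value;
    }
  }

  /**
   * Merges records by key, later records winning.
   * When dropDeletedKeys is true, a "-D" record removes its key from the result;
   * when false, the "-D" record is kept so a downstream reader can see the delete.
   */
  static Map<String, LogRecord> merge(List<LogRecord> records, boolean dropDeletedKeys) {
    Map<String, LogRecord> merged = new LinkedHashMap<>();
    for (LogRecord r : records) {
      if (dropDeletedKeys && "-D".equals(r.operation)) {
        merged.remove(r.key);
      } else {
        merged.put(r.key, r);
      }
    }
    return merged;
  }
}
```

With such a flag, compaction could ask for deleted keys to be dropped while an engine that needs the delete record could keep it.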



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11054:
URL: https://github.com/apache/hudi/pull/11054#issuecomment-2066077903

   
   ## CI report:
   
   * 0d5781211cbe9977838db3ee7134bc473b6110aa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23363)
 
   
   



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066077325

   
   ## CI report:
   
   * 7b04755aa308766f3b0f0d5292ed9476630da90d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23357)
 
   * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362)
 
   
   



Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #11054:
URL: https://github.com/apache/hudi/pull/11054#issuecomment-2066065953

   
   ## CI report:
   
   * 0d5781211cbe9977838db3ee7134bc473b6110aa UNKNOWN
   
   



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066065344

   
   ## CI report:
   
   * 7b04755aa308766f3b0f0d5292ed9476630da90d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23357)
 
   * af5d107b867fd97362710bc032a95743eb5d33a8 UNKNOWN
   
   



[jira] [Updated] (HUDI-7629) Safely rename HoodieFileStatus

2024-04-19 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov updated HUDI-7629:

Status: In Progress  (was: Open)

> Safely rename HoodieFileStatus
> --
>
> Key: HUDI-7629
> URL: https://issues.apache.org/jira/browse/HUDI-7629
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 1.0.0
>
>
> [https://github.com/apache/hudi/pull/10591#discussion_r1484912753]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-7628) Rename FSUtils.getPartitionPath to constructAbsolutePath

2024-04-19 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-7628:
-
Labels: hoodie-storage pull-request-available  (was: hoodie-storage)

> Rename FSUtils.getPartitionPath to constructAbsolutePath
> 
>
> Key: HUDI-7628
> URL: https://issues.apache.org/jira/browse/HUDI-7628
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hoodie-storage, pull-request-available
> Fix For: 1.0.0
>
>
> [https://github.com/apache/hudi/pull/10591#discussion_r1483632718]
> Rename FSUtils.getPartitionPath to constructAbsolutePath and partitionPath 
> argument to relativePartitionPath so that the naming reflects the 
> functionality.  This has to be merged after HUDI-6497 and the above PR to 
> reduce merging conflicts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]

2024-04-19 Thread via GitHub


wombatu-kun opened a new pull request, #11054:
URL: https://github.com/apache/hudi/pull/11054

   ### Change Logs
   
   https://github.com/apache/hudi/pull/10591#discussion_r1483632718
   
   Rename FSUtils.getPartitionPath to constructAbsolutePath and partitionPath 
argument to relativePartitionPath so that the naming reflects the functionality.
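
To show why the new names fit better, here is a minimal sketch of what such a method does: it joins a table base path with a partition path that is relative to it. The string-based signature and the class name `PathSketch` are assumptions for illustration; the real `FSUtils` method works on Hudi's own path types.

```java
/** Hypothetical string-based stand-in; not the FSUtils implementation. */
final class PathSketch {
  private PathSketch() {}

  /**
   * Builds an absolute path from a base path and a partition path relative to it.
   * An empty relative path (non-partitioned table) yields the base path itself.
   */
  static String constructAbsolutePath(String basePath, String relativePartitionPath) {
    if (relativePartitionPath == null || relativePartitionPath.isEmpty()) {
      return basePath;
    }
    // Normalize the separator at the join point so it appears exactly once.
    String base = basePath.endsWith("/")
        ? basePath.substring(0, basePath.length() - 1)
        : basePath;
    String relative = relativePartitionPath.startsWith("/")
        ? relativePartitionPath.substring(1)
        : relativePartitionPath;
    return base + "/" + relative;
  }
}
```

The method returns an absolute path rather than a partition path, which is the mismatch the rename addresses.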
   
   ### Impact
   
   none
   
   ### Risk level (write none, low medium or high below)
   
   none
   
   ### Documentation Update
   
   none
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-7628) Rename FSUtils.getPartitionPath to constructAbsolutePath

2024-04-19 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov updated HUDI-7628:

Status: In Progress  (was: Open)

> Rename FSUtils.getPartitionPath to constructAbsolutePath
> 
>
> Key: HUDI-7628
> URL: https://issues.apache.org/jira/browse/HUDI-7628
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 1.0.0
>
>
> [https://github.com/apache/hudi/pull/10591#discussion_r1483632718]
> Rename FSUtils.getPartitionPath to constructAbsolutePath and partitionPath 
> argument to relativePartitionPath so that the naming reflects the 
> functionality.  This has to be merged after HUDI-6497 and the above PR to 
> reduce merging conflicts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7632) Remove FileSystem usage in HoodieLogFormatWriter

2024-04-19 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-7632:
---

Assignee: Vova Kolmakov

> Remove FileSystem usage in HoodieLogFormatWriter
> 
>
> Key: HUDI-7632
> URL: https://issues.apache.org/jira/browse/HUDI-7632
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10591#discussion_r1569173014



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7631) Clean up usage of `CachingPath` outside hudi-common module

2024-04-19 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-7631:
---

Assignee: Vova Kolmakov

> Clean up usage of `CachingPath` outside hudi-common module
> --
>
> Key: HUDI-7631
> URL: https://issues.apache.org/jira/browse/HUDI-7631
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10591#discussion_r1484923458



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7628) Rename FSUtils.getPartitionPath to constructAbsolutePath

2024-04-19 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-7628:
---

Assignee: Vova Kolmakov

> Rename FSUtils.getPartitionPath to constructAbsolutePath
> 
>
> Key: HUDI-7628
> URL: https://issues.apache.org/jira/browse/HUDI-7628
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 1.0.0
>
>
> [https://github.com/apache/hudi/pull/10591#discussion_r1483632718]
> Rename FSUtils.getPartitionPath to constructAbsolutePath and partitionPath 
> argument to relativePartitionPath so that the naming reflects the 
> functionality.  This has to be merged after HUDI-6497 and the above PR to 
> reduce merging conflicts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7630) Create a separate StorageUtils for hadoop-free util method

2024-04-19 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-7630:
---

Assignee: Vova Kolmakov

> Create a separate StorageUtils for hadoop-free util method
> --
>
> Key: HUDI-7630
> URL: https://issues.apache.org/jira/browse/HUDI-7630
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 1.0.0
>
>
> https://github.com/apache/hudi/pull/10591#discussion_r1484920647



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (HUDI-7629) Safely rename HoodieFileStatus

2024-04-19 Thread Vova Kolmakov (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Kolmakov reassigned HUDI-7629:
---

Assignee: Vova Kolmakov

> Safely rename HoodieFileStatus
> --
>
> Key: HUDI-7629
> URL: https://issues.apache.org/jira/browse/HUDI-7629
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Ethan Guo
>Assignee: Vova Kolmakov
>Priority: Major
>  Labels: hoodie-storage
> Fix For: 1.0.0
>
>
> [https://github.com/apache/hudi/pull/10591#discussion_r1484912753]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


(hudi) branch master updated: [HUDI-7618] Add ability to ignore checkpoints in delta streamer (#11018)

2024-04-19 Thread sivabalan
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new ca77fda51fe [HUDI-7618] Add ability to ignore checkpoints in delta 
streamer (#11018)
ca77fda51fe is described below

commit ca77fda51fe3036f86d4ddb8b0e58a2f160882dc
Author: Sampan S Nayak 
AuthorDate: Fri Apr 19 11:55:43 2024 +0530

[HUDI-7618] Add ability to ignore checkpoints in delta streamer (#11018)
---
 .../hudi/utilities/streamer/HoodieStreamer.java|  7 +++
 .../apache/hudi/utilities/streamer/StreamSync.java | 13 -
 .../streamer/TestStreamSyncUnitTests.java  | 61 ++
 3 files changed, 79 insertions(+), 2 deletions(-)

diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
index 59c1bf3d164..0dd488bffcb 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
@@ -428,6 +428,13 @@ public class HoodieStreamer implements Serializable {
 @Parameter(names = {"--config-hot-update-strategy-class"}, description = 
"Configuration hot update in continuous mode")
 public String configHotUpdateStrategyClass = "";
 
+@Parameter(names = {"--ignore-checkpoint"}, description = "Set this config 
with a unique value, recommend using a timestamp value or UUID."
++ " Setting this config indicates that the subsequent sync should 
ignore the last committed checkpoint for the source. The config value is stored"
++ " in the commit history, so setting the config with same values 
would not have any affect. This config can be used in scenarios like kafka 
topic change,"
++ " where we would want to start ingesting from the latest or earliest 
offset after switching the topic (in this case we would want to ignore the 
previously"
++ " committed checkpoint, and rely on other configs to pick the 
starting offsets).")
+public String ignoreCheckpoint = null;
+
 public boolean isAsyncCompactionEnabled() {
   return continuousMode && !forceDisableCompaction
   && 
HoodieTableType.MERGE_ON_READ.equals(HoodieTableType.valueOf(tableType));
diff --git 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java
 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java
index c9521058b12..2f5bd1fd3ff 100644
--- 
a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java
+++ 
b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java
@@ -164,6 +164,7 @@ public class StreamSync implements Serializable, Closeable {
   private static final long serialVersionUID = 1L;
   private static final Logger LOG = LoggerFactory.getLogger(StreamSync.class);
   private static final String NULL_PLACEHOLDER = "[null]";
+  public static final String CHECKPOINT_IGNORE_KEY = 
"deltastreamer.checkpoint.ignore_key";
 
   /**
* Delta Sync Config.
@@ -733,7 +734,8 @@ public class StreamSync implements Serializable, Closeable {
* @return the checkpoint to resume from if applicable.
* @throws IOException
*/
-  private Option getCheckpointToResume(Option 
commitsTimelineOpt) throws IOException {
+  @VisibleForTesting
+  Option getCheckpointToResume(Option 
commitsTimelineOpt) throws IOException {
 Option resumeCheckpointStr = Option.empty();
 // try get checkpoint from commits(including commit and deltacommit)
 // in COW migrating to MOR case, the first batch of the deltastreamer will 
lost the checkpoint from COW table, cause the dataloss
@@ -750,7 +752,11 @@ public class StreamSync implements Serializable, Closeable 
{
   if (commitMetadataOption.isPresent()) {
 HoodieCommitMetadata commitMetadata = commitMetadataOption.get();
 LOG.debug("Checkpoint reset from metadata: " + 
commitMetadata.getMetadata(CHECKPOINT_RESET_KEY));
-if (cfg.checkpoint != null && 
(StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))
+if (cfg.ignoreCheckpoint != null && 
(StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_IGNORE_KEY))
+|| 
!cfg.ignoreCheckpoint.equals(commitMetadata.getMetadata(CHECKPOINT_IGNORE_KEY)))) {
+  // we ignore any existing checkpoint and start ingesting afresh
+  resumeCheckpointStr = Option.empty();
+} else if (cfg.checkpoint != null && 
(StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))
 || 
!cfg.checkpoint.equals(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY)))) {
   resumeCheckpointStr = Option.of(cfg.checkpoint);
 } else if 

Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]

2024-04-19 Thread via GitHub


nsivabalan merged PR #11018:
URL: https://github.com/apache/hudi/pull/11018
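
The checkpoint selection that this PR adds to `StreamSync.getCheckpointToResume` can be sketched as a pure function. This is a simplified sketch, not the merged code: `CheckpointSketch` and its parameter names are hypothetical, and the final fallback to the last committed checkpoint is an assumption about the remaining branches.

```java
import java.util.Optional;

/** Hypothetical, simplified decision logic; not the StreamSync implementation. */
final class CheckpointSketch {
  private CheckpointSketch() {}

  /** True when a configured value is set and differs from what the last commit recorded. */
  static boolean isNewValue(String configured, String committed) {
    return configured != null
        && (committed == null || committed.isEmpty() || !configured.equals(committed));
  }

  static Optional<String> checkpointToResume(String ignoreCheckpoint,
                                             String committedIgnoreKey,
                                             String checkpoint,
                                             String committedResetKey,
                                             String lastCommittedCheckpoint) {
    if (isNewValue(ignoreCheckpoint, committedIgnoreKey)) {
      // A new --ignore-checkpoint value: drop any existing checkpoint and start afresh.
      return Optional.empty();
    }
    if (isNewValue(checkpoint, committedResetKey)) {
      // A new explicit checkpoint overrides the committed one.
      return Optional.of(checkpoint);
    }
    // Assumed fallback: resume from whatever the last commit recorded, if anything.
    return Optional.ofNullable(lastCommittedCheckpoint);
  }
}
```

Because the ignore value is compared with the one stored in the last commit, passing the same value twice has no further effect, which is why the option description recommends a timestamp or UUID.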





Re: [PR] [HUDI-7640] Uses UUID as temporary file suffix for HoodieStorage.createImmutableFileInPath [hudi]

2024-04-19 Thread via GitHub


boneanxs merged PR #11052:
URL: https://github.com/apache/hudi/pull/11052





(hudi) branch master updated: [HUDI-7640] Uses UUID as temporary file suffix for HoodieStorage.createImmutableFileInPath (#11052)

2024-04-19 Thread rexan
This is an automated email from the ASF dual-hosted git repository.

rexan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/master by this push:
 new caa1bef75c3 [HUDI-7640] Uses UUID as temporary file suffix for 
HoodieStorage.createImmutableFileInPath (#11052)
caa1bef75c3 is described below

commit caa1bef75c3e21b7443e192375a068c328cd6f81
Author: Danny Chan 
AuthorDate: Fri Apr 19 14:07:47 2024 +0800

[HUDI-7640] Uses UUID as temporary file suffix for 
HoodieStorage.createImmutableFileInPath (#11052)
---
 .../src/main/java/org/apache/hudi/storage/HoodieStorage.java  | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hudi-io/src/main/java/org/apache/hudi/storage/HoodieStorage.java 
b/hudi-io/src/main/java/org/apache/hudi/storage/HoodieStorage.java
index adf9371c243..be160caba3b 100644
--- a/hudi-io/src/main/java/org/apache/hudi/storage/HoodieStorage.java
+++ b/hudi-io/src/main/java/org/apache/hudi/storage/HoodieStorage.java
@@ -37,6 +37,7 @@ import java.io.OutputStream;
 import java.net.URI;
 import java.util.ArrayList;
 import java.util.List;
+import java.util.UUID;
 
 /**
  * Provides I/O APIs on files and directories on storage.
@@ -45,7 +46,6 @@ import java.util.List;
 @PublicAPIClass(maturity = ApiMaturityLevel.EVOLVING)
 public abstract class HoodieStorage implements Closeable {
   public static final Logger LOG = 
LoggerFactory.getLogger(HoodieStorage.class);
-  public static final String TMP_PATH_POSTFIX = ".tmp";
 
   /**
* @return the scheme of the storage.
@@ -249,8 +249,11 @@ public abstract class HoodieStorage implements Closeable {
* empty, will first write the content to a temp file if 
{needCreateTempFile} is
* true, and then rename it back after the content is written.
*
-   * @param path    file path.
-   * @param content content to be stored.
+   * CAUTION: if this method is invoked in multi-threads for concurrent 
write of the same file,
+   * an existence check of the file is recommended.
+   *
+   * @param path    File path.
+   * @param content Content to be stored.
*/
   @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
   public final void createImmutableFileInPath(StoragePath path,
@@ -267,7 +270,7 @@ public abstract class HoodieStorage implements Closeable {
 
   if (content.isPresent() && needTempFile) {
 StoragePath parent = path.getParent();
-tmpPath = new StoragePath(parent, path.getName() + TMP_PATH_POSTFIX);
+tmpPath = new StoragePath(parent, path.getName() + "." + 
UUID.randomUUID());
 fsout = create(tmpPath, false);
 fsout.write(content.get());
   }



Re: [PR] [HUDI-7640] Uses UUID as temporary file suffix for HoodieStorage.createImmutableFileInPath [hudi]

2024-04-19 Thread via GitHub


boneanxs commented on code in PR #11052:
URL: https://github.com/apache/hudi/pull/11052#discussion_r1571861739


##
hudi-io/src/main/java/org/apache/hudi/storage/HoodieStorage.java:
##
@@ -267,7 +270,7 @@ public final void createImmutableFileInPath(StoragePath 
path,
 
   if (content.isPresent() && needTempFile) {
 StoragePath parent = path.getParent();
-tmpPath = new StoragePath(parent, path.getName() + TMP_PATH_POSTFIX);
+tmpPath = new StoragePath(parent, path.getName() + "." + 
UUID.randomUUID());

Review Comment:
   I see, that makes sense.
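
The difference between the two suffix schemes in this diff can be shown in isolation. In this sketch `TempNameSketch` and both method names are hypothetical; only the ".tmp" suffix and the `"." + UUID.randomUUID()` suffix come from the change itself.

```java
import java.util.UUID;

/** Hypothetical helper contrasting the old and new temporary-name schemes. */
final class TempNameSketch {
  private TempNameSketch() {}

  /** Old scheme: every writer of the same file derives the same temporary name. */
  static String fixedTempName(String fileName) {
    return fileName + ".tmp";
  }

  /** New scheme: each call gets its own temporary name via a random UUID suffix. */
  static String uniqueTempName(String fileName) {
    return fileName + "." + UUID.randomUUID();
  }
}
```

With the fixed suffix, two concurrent writers of the same file share one temporary path and can interfere with each other; with the UUID suffix each writer has its own.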


