Re: [I] [SUPPORT] Some spark writer job failed caused by UserGroupInformation lost in the new thread of timeline service threadpool [hudi]
beyond1920 closed issue #11030: [SUPPORT] Some spark writer job failed caused by UserGroupInformation lost in the new thread of timeline service threadpool URL: https://github.com/apache/hudi/issues/11030 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [I] [SUPPORT] Some spark writer job failed caused by UserGroupInformation lost in the new thread of timeline service threadpool [hudi]
beyond1920 commented on issue #11030: URL: https://github.com/apache/hudi/issues/11030#issuecomment-2067525049 Resolved by [pr#11039](https://github.com/apache/hudi/pull/11039).
(hudi) branch master updated: [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath (#11054)
This is an automated email from the ASF dual-hosted git repository.

yihua pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 683c4998d6d [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath (#11054)
683c4998d6d is described below

commit 683c4998d6de28605f9e94c05972258a22f2e5b9
Author: Vova Kolmakov
AuthorDate: Sat Apr 20 08:11:07 2024 +0700

    [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath (#11054)

    Co-authored-by: Vova Kolmakov
---
 .../hudi/aws/sync/AWSGlueCatalogSyncClient.java | 4 +--
 .../apache/hudi/cli/commands/RepairsCommand.java | 4 +--
 .../apache/hudi/client/CompactionAdminClient.java | 4 +--
 .../index/bucket/ConsistentBucketIndexUtils.java | 8 +++---
 .../org/apache/hudi/io/HoodieAppendHandle.java | 2 +-
 .../org/apache/hudi/io/HoodieCreateHandle.java | 2 +-
 .../java/org/apache/hudi/io/HoodieMergeHandle.java | 2 +-
 .../java/org/apache/hudi/io/HoodieWriteHandle.java | 4 +--
 .../metadata/HoodieBackedTableMetadataWriter.java | 2 +-
 .../hudi/table/action/compact/HoodieCompactor.java | 2 +-
 .../table/action/rollback/BaseRollbackHelper.java | 2 +-
 .../rollback/ListingBasedRollbackStrategy.java | 6 ++--
 .../ttl/strategy/KeepByCreationTimeStrategy.java | 2 +-
 .../marker/TimelineServerBasedWriteMarkers.java | 4 +--
 .../org/apache/hudi/table/marker/WriteMarkers.java | 2 +-
 .../io/storage/row/HoodieRowDataCreateHandle.java | 4 +--
 .../hudi/io/storage/row/HoodieRowCreateHandle.java | 4 +--
 .../TestSavepointRestoreMergeOnRead.java | 8 +++---
 .../java/org/apache/hudi/table/TestCleaner.java | 4 +--
 ...dieSparkMergeOnReadTableInsertUpdateDelete.java | 2 +-
 .../hudi/table/marker/TestWriteMarkersBase.java | 2 +-
 .../java/org/apache/hudi/common/fs/FSUtils.java | 32 +++---
 .../hudi/common/model/CompactionOperation.java | 2 +-
 .../hudi/common/model/HoodieCommitMetadata.java | 8 +++---
 .../hudi/common/table/cdc/HoodieCDCExtractor.java | 4 +--
 .../clean/CleanMetadataV1MigrationHandler.java | 2 +-
 .../clean/CleanPlanV2MigrationHandler.java | 2 +-
 .../compaction/CompactionV1MigrationHandler.java | 2 +-
 .../table/view/AbstractTableFileSystemView.java | 4 +--
 .../IncrementalTimelineSyncFileSystemView.java | 2 +-
 .../sink/compact/ITTestHoodieFlinkCompactor.java | 2 +-
 .../org/apache/hudi/IncrementalRelation.scala | 2 +-
 .../AlterHoodieTableAddPartitionCommand.scala | 2 +-
 .../RepairAddpartitionmetaProcedure.scala | 2 +-
 .../RepairMigratePartitionMetaProcedure.scala | 2 +-
 .../procedures/ShowInvalidParquetProcedure.scala | 2 +-
 .../TestSparkConsistentBucketClustering.java | 2 +-
 .../apache/hudi/sync/adb/HoodieAdbJdbcClient.java | 10 +++
 .../org/apache/hudi/hive/ddl/HMSDDLExecutor.java | 4 +--
 .../hudi/hive/ddl/QueryBasedDDLExecutor.java | 4 +--
 .../org/apache/hudi/hive/TestHiveSyncTool.java | 2 +-
 .../apache/hudi/sync/common/HoodieSyncClient.java | 4 +--
 .../hudi/utilities/HoodieDataTableUtils.java | 2 +-
 .../utilities/HoodieMetadataTableValidator.java | 8 +++---
 .../hudi/utilities/HoodieSnapshotCopier.java | 4 +--
 .../hudi/utilities/HoodieSnapshotExporter.java | 4 +--
 46 files changed, 94 insertions(+), 94 deletions(-)

diff --git a/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java b/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java
index e06db9f2ba4..6dda51fd134 100644
--- a/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java
+++ b/hudi-aws/src/main/java/org/apache/hudi/aws/sync/AWSGlueCatalogSyncClient.java
@@ -303,7 +303,7 @@ public class AWSGlueCatalogSyncClient extends HoodieSyncClient {
     try {
       StorageDescriptor sd = table.storageDescriptor();
       List partitionInputList = partitionsToAdd.stream().map(partition -> {
-        String fullPartitionPath = FSUtils.getPartitionPathInHadoopPath(s3aToS3(getBasePath()), partition).toString();
+        String fullPartitionPath = FSUtils.constructAbsolutePathInHadoopPath(s3aToS3(getBasePath()), partition).toString();
         List partitionValues = partitionValueExtractor.extractPartitionValuesInPath(partition);
         StorageDescriptor partitionSD = sd.copy(copySd -> copySd.location(fullPartitionPath));
         return PartitionInput.builder().values(partitionValues).storageDescriptor(partitionSD).build();
@@ -347,7 +347,7 @@ public class AWSGlueCatalogSyncClient extends HoodieSyncClient {
     try {
       StorageDescriptor sd = table.storageDescriptor();
       List updatePartitionEntries = changedPartitions.stream().map(partition -> {
-        String fullPartitionPath = FSUtils.getPartitionPathInHadoopPath(s3aToS3(getBasePath()),
Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]
yihua merged PR #11054: URL: https://github.com/apache/hudi/pull/11054
(hudi) branch asf-site updated: [DOCS] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing (#11058)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new e3894931de4 [DOCS] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing (#11058)
e3894931de4 is described below

commit e3894931de489f222730972d76f783ffd67cccac
Author: Geser Dugarov
AuthorDate: Sat Apr 20 07:44:31 2024 +0700

    [DOCS] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing (#11058)
---
 website/docs/basic_configurations.md | 91 -
 website/docs/configurations.md | 125 ++-
 2 files changed, 214 insertions(+), 2 deletions(-)

diff --git a/website/docs/basic_configurations.md b/website/docs/basic_configurations.md
index 2f18ad3e885..1fc301521e1 100644
--- a/website/docs/basic_configurations.md
+++ b/website/docs/basic_configurations.md
@@ -1,12 +1,13 @@
 ---
 title: Basic Configurations
 summary: This page covers the basic configurations you may use to write/read Hudi tables. This page only features a subset of the most frequently used configurations. For a full list of all configs, please visit the [All Configurations](/docs/configurations) page.
-last_modified_at: 2024-04-15T09:56:05.413
+last_modified_at: 2024-04-19T18:21:42.88
 ---

 This page covers the basic configurations you may use to write/read Hudi tables. This page only features a subset of the most frequently used configurations. For a full list of all configs, please visit the [All Configurations](/docs/configurations) page.

+- [**Hudi Table Config**](#TABLE_CONFIG): Basic Hudi Table configuration parameters.
 - [**Spark Datasource Configs**](#SPARK_DATASOURCE): These configs control the Hudi Spark Datasource, providing ability to define keys/partitioning, pick out the write operation, specify how to merge records or choosing query type to read.
 - [**Flink Sql Configs**](#FLINK_SQL): These configs control the Hudi Flink SQL source/sink connectors, providing ability to define record keys, pick out the write operation, specify how to merge records, enable/disable asynchronous compaction or choosing query type to read.
 - [**Write Client Configs**](#WRITE_CLIENT): Internally, the Hudi datasource uses a RDD based HoodieWriteClient API to actually perform writes to storage. These configs provide deep control over lower level aspects like file sizing, compression, parallelism, compaction, write schema, cleaning etc. Although Hudi provides sane defaults, from time-time these configs may need to be tweaked to optimize for specific workloads.
@@ -20,6 +21,56 @@ This page covers the basic configurations you may use to write/read Hudi tables.
 In the tables below **(N/A)** means there is no default value set
 :::

+## Hudi Table Config {#TABLE_CONFIG}
+Basic Hudi Table configuration parameters.
+
+
+### Hudi Table Basic Configs {#Hudi-Table-Basic-Configs}
+Configurations of the Hudi Table like type of ingestion, storage formats, hive table name etc. Configurations are loaded from hoodie.properties, these properties are usually set during initializing a path as hoodie base path and never changes during the lifetime of a hoodie table.
+
+
+
+
+[**Basic Configs**](#Hudi-Table-Basic-Configs-basic-configs)
+
+
+| Config Name | Default | Description [...]
+| | --- | - [...]
+| [hoodie.bootstrap.base.path](#hoodiebootstrapbasepath) | (N/A) | Base path of the dataset that needs to be bootstrapped as a Hudi table`Config Param: BOOTSTRAP_BASE_PATH`
Re: [PR] [DOCS] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]
danny0405 merged PR #11058: URL: https://github.com/apache/hudi/pull/11058
(hudi) branch master updated: [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing (#11057)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new 60424c5f998 [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing (#11057)
60424c5f998 is described below

commit 60424c5f9987f35cc21b0288ac11bd87602ae1c1
Author: Geser Dugarov
AuthorDate: Sat Apr 20 07:43:37 2024 +0700

    [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing (#11057)
---
 .../org/apache/hudi/config/HoodieErrorTableConfig.java | 3 ++-
 .../org/apache/hudi/common/config/ConfigGroups.java | 4
 .../hudi/common/config/TimestampKeyGeneratorConfig.java | 2 +-
 .../org/apache/hudi/common/table/HoodieTableConfig.java | 17 ++---
 .../hudi/utilities/config/SqlFileBasedSourceConfig.java | 3 ++-
 5 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieErrorTableConfig.java b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieErrorTableConfig.java
index 8ba013b00ee..1db8f2c4b5f 100644
--- a/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieErrorTableConfig.java
+++ b/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieErrorTableConfig.java
@@ -21,6 +21,7 @@ package org.apache.hudi.config;
 import org.apache.hudi.common.config.ConfigClassProperty;
 import org.apache.hudi.common.config.ConfigGroups;
 import org.apache.hudi.common.config.ConfigProperty;
+import org.apache.hudi.common.config.HoodieConfig;

 import javax.annotation.concurrent.Immutable;

@@ -30,7 +31,7 @@ import java.util.Arrays;
 @ConfigClassProperty(name = "Error table Configs",
     groupName = ConfigGroups.Names.WRITE_CLIENT,
     description = "Configurations that are required for Error table configs")
-public class HoodieErrorTableConfig {
+public class HoodieErrorTableConfig extends HoodieConfig {
   public static final ConfigProperty ERROR_TABLE_ENABLED = ConfigProperty
       .key("hoodie.errortable.enable")
       .defaultValue(false)
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/config/ConfigGroups.java b/hudi-common/src/main/java/org/apache/hudi/common/config/ConfigGroups.java
index 18d28ab6275..5bab6f9aeb3 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/config/ConfigGroups.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/config/ConfigGroups.java
@@ -30,6 +30,7 @@ public class ConfigGroups {
    * {@link ConfigGroups#getDescription}.
    */
   public enum Names {
+    TABLE_CONFIG("Hudi Table Config"),
     ENVIRONMENT_CONFIG("Environment Config"),
     SPARK_DATASOURCE("Spark Datasource Configs"),
     FLINK_SQL("Flink Sql Configs"),
@@ -98,6 +99,9 @@ public class ConfigGroups {
   public static String getDescription(Names names) {
     String description;
     switch (names) {
+      case TABLE_CONFIG:
+        description = "Basic Hudi Table configuration parameters.";
+        break;
       case ENVIRONMENT_CONFIG:
         description = "Hudi supports passing configurations via a configuration file "
             + "`hudi-default.conf` in which each line consists of a key and a value "
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/config/TimestampKeyGeneratorConfig.java b/hudi-common/src/main/java/org/apache/hudi/common/config/TimestampKeyGeneratorConfig.java
index 7098c076279..46b66371b31 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/config/TimestampKeyGeneratorConfig.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/config/TimestampKeyGeneratorConfig.java
@@ -31,7 +31,7 @@ import java.util.concurrent.TimeUnit;
         + "the partition field. The field values are interpreted as timestamps and not just "
         + "converted to string while generating partition path value for records. Record key is "
         + "same as before where it is chosen by field name.")
-public class TimestampKeyGeneratorConfig {
+public class TimestampKeyGeneratorConfig extends HoodieConfig {
   private static final String TIMESTAMP_KEYGEN_CONFIG_PREFIX = "hoodie.keygen.timebased.";
   @Deprecated
   private static final String OLD_TIMESTAMP_KEYGEN_CONFIG_PREFIX = "hoodie.deltastreamer.keygen.timebased.";
diff --git a/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java b/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
index 78ef425a1d6..9cf3e538fd6 100644
--- a/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
+++ b/hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java
@@ -19,6 +19,8 @@ package org.apache.hudi.common.table;
 import
Re: [PR] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]
danny0405 merged PR #11057: URL: https://github.com/apache/hudi/pull/11057
[jira] [Closed] (HUDI-7515) Fix partition metadata write failure
[ https://issues.apache.org/jira/browse/HUDI-7515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Danny Chen closed HUDI-7515.
Resolution: Fixed

Fixed via master branch: 7a44b1ebc41ce66621e958df22195524373434c1

> Fix partition metadata write failure
>
> Key: HUDI-7515
> URL: https://issues.apache.org/jira/browse/HUDI-7515
> Project: Apache Hudi
> Issue Type: Bug
> Reporter: Wechar
> Assignee: Danny Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Attachments: screenshot-1.png
>
>
> Avoid failing to write partition metadata. When spark.speculation is enabled,
> if the write metadata operation becomes slow for some reason, a speculative
> task will be started to write the same metadata file concurrently.
> In HDFS, two tasks (for example, when one of them is a speculative task) writing
> to the same file could both throw an exception like this:
> {code:bash}
> File does not exist:
> /path/to/table/a=3519/b=3520/c=3521/.hoodie_partition_metadata_112 (inode
> 48415575374) Holder DFSClient_NONMAPREDUCE_-2108606624_29 does not have any
> open files.
> {code}
>

--
This message was sent by Atlassian Jira (v8.20.10#820010)
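The failure mode described in HUDI-7515 (two attempts of the same task writing one file) can be illustrated with a small sketch. This is not the change made in PR #10886, whose details are not quoted in this message; it only shows one common way to make a duplicate write harmless, by giving each attempt its own temporary file and then renaming it into place. The class `PartitionMetadataWriterSketch`, its methods, and the file contents are hypothetical, and the code uses the local filesystem via `java.nio.file` rather than HDFS, whose rename semantics may differ.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical helper, not a Hudi class: write-to-temp-then-rename on a local filesystem.
class PartitionMetadataWriterSketch {

  /**
   * Writes content to dir/fileName at most once.
   * Returns true if this caller created the file, false if it already existed.
   */
  static boolean writeOnce(Path dir, String fileName, String content) {
    Path target = dir.resolve(fileName);
    if (Files.exists(target)) {
      return false; // another attempt already finished the write
    }
    // Each attempt gets its own temp file, so two attempts never share an open file.
    Path tmp = dir.resolve(fileName + "." + UUID.randomUUID() + ".tmp");
    try {
      Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
      try {
        // Without REPLACE_EXISTING, move fails if the target already exists.
        Files.move(tmp, target);
        return true;
      } catch (FileAlreadyExistsException e) {
        // Lost the race: discard our copy and treat the write as already done.
        Files.deleteIfExists(tmp);
        return false;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Runs an original attempt and then a duplicate attempt, one after the other. */
  static String demo() {
    try {
      Path dir = Files.createTempDirectory("partition-metadata-sketch");
      boolean first = writeOnce(dir, ".hoodie_partition_metadata", "attempt-1");
      boolean second = writeOnce(dir, ".hoodie_partition_metadata", "attempt-2");
      byte[] stored = Files.readAllBytes(dir.resolve(".hoodie_partition_metadata"));
      return first + "," + second + "," + new String(stored, StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In `demo()` the second attempt is a no-op and the file keeps the first attempt's content. The attempts run sequentially here; the sketch does not reproduce the HDFS lease error quoted above.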
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 merged PR #10886: URL: https://github.com/apache/hudi/pull/10886
(hudi) branch master updated (7ac26bce3f3 -> 7a44b1ebc41)
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

    from 7ac26bce3f3 [HUDI-7643] Fix test by using the right StreamSync constructor (#11056)
     add 7a44b1ebc41 [HUDI-7515] Fix partition metadata write failure (#10886)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/cli/commands/RepairsCommand.java | 4 +-
 .../org/apache/hudi/io/HoodieAppendHandle.java | 2 +-
 .../org/apache/hudi/io/HoodieCreateHandle.java | 2 +-
 .../java/org/apache/hudi/io/HoodieMergeHandle.java | 2 +-
 .../io/storage/row/HoodieRowDataCreateHandle.java | 2 +-
 .../hudi/io/storage/row/HoodieRowCreateHandle.java | 2 +-
 .../hudi/common/model/HoodiePartitionMetadata.java | 80 --
 .../table/timeline/HoodieActiveTimeline.java | 12 +---
 .../common/model/TestHoodiePartitionMetadata.java | 2 +-
 .../common/testutils/HoodieTestDataGenerator.java | 5 +-
 .../hudi/common/util/TestTablePathUtils.java | 4 +-
 .../hudi/hadoop/testutils/InputFormatTestUtil.java | 2 +-
 .../org/apache/hudi/storage/HoodieStorage.java | 2 +-
 .../AlterHoodieTableAddPartitionCommand.scala | 2 +-
 .../RepairAddpartitionmetaProcedure.scala | 2 +-
 .../RepairMigratePartitionMetaProcedure.scala | 2 +-
 16 files changed, 62 insertions(+), 65 deletions(-)
[jira] [Updated] (HUDI-7507) ongoing concurrent writers with smaller timestamp can cause issues with table services
[ https://issues.apache.org/jira/browse/HUDI-7507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Krishen Bhan updated HUDI-7507:
---
    Description:
*Scenarios:*

Although HUDI operations hold a table lock when creating a .requested instant, HUDI writers do not generate a timestamp and create a .requested plan in the same transaction, so the following scenario is possible:
 # Job 1 starts and chooses timestamp (x); Job 2 starts and chooses timestamp (x - 1)
 # Job 1 schedules and creates a requested file with instant timestamp (x)
 # Job 2 schedules and creates a requested file with instant timestamp (x - 1)
 # Both jobs continue running

If one job is writing a commit and the other is a table service, this can cause issues:
 * If Job 2 is an ingestion commit and Job 1 is a compaction/log compaction, then Job 1 can run before Job 2 and create a compaction plan for all instant times (up to (x)) that doesn’t include instant time (x - 1). Later Job 2 will create instant time (x - 1), but the timeline will be in a corrupted state since the compaction plan was supposed to include (x - 1).
 * There is a similar issue with clean. If Job 2 is a long-running commit (that was stuck/delayed for a while before creating its .requested plan) and Job 1 is a clean, then Job 1 can perform a clean that updates the earliest-commit-to-retain without waiting for the inflight instant by Job 2 at (x - 1) to complete. This causes Job 2 to be "skipped" by clean.

[Edit] I added a diagram to visualize the issue, specifically the second scenario with clean

!Flowchart (1).png!

*Proposed approach:*

One way this can be resolved is by combining the operations of generating the instant time and creating a requested file in the same HUDI table transaction. Specifically, execute the following steps whenever any instant (commit, table service, etc.) is scheduled:

Approach A
 # Acquire the table lock
 # Look at the latest instant C on the active timeline (completed or not). Generate a timestamp after C
 # Create the plan and requested file using this new timestamp (that is greater than C)
 # Release the table lock

Unfortunately (A) has the following drawbacks:
 * Every operation must now hold the table lock when computing its plan, even if it's an expensive operation that will take a while
 * Users of HUDI cannot easily set their own instant time for an operation. This restriction would break any public APIs that allow this and would require deprecating those APIs.

An alternate approach is to have every operation abort creating a .requested file unless it has the latest timestamp. Specifically, for any instant type, whenever an operation is about to create a .requested plan on the timeline, it should take the table lock and assert that there are no other instants on the timeline that are greater than it and could cause a conflict. If that assertion fails, it should throw a retryable conflict resolution exception. Specifically, the following steps should be followed whenever any instant (commit, table service, etc.) is scheduled:

Approach B
 # Acquire the table lock. Assume that the desired instant time C and the requested file plan metadata have already been created, regardless of whether that happened before this step or right after acquiring the table lock.
 # Get the set of all instants on the timeline that are greater than C (regardless of their action or state).
 ## If the current operation is an ingestion type (commit/deltacommit/insert_overwrite replace), then assert the set is empty
 ## If the current operation is a table service, then assert that the set doesn't contain any table service instant types
 # Create the requested plan on the timeline (as usual)
 # Release the table lock

Unlike (A), approach (B) allows users to continue to use HUDI APIs where the caller can specify the instant time (avoiding the need to deprecate any public API). It also allows table service operations to compute their plan without holding a lock. Despite this, (B) has the following drawbacks:
 * It is not immediately clear how MDT vs base table operations should be handled here. At first glance it seems that at step (2) both the base table and MDT timeline should be checked, but that might need more investigation to confirm.
 * This error will still be thrown even for combinations of concurrent operations where it would be safe to continue. For example, assume two ingestion writers begin executing on a dataset, with each only performing an insert commit on the dataset (with no table service being scheduled). If the writer that started scheduling later ends up having an earlier timestamp, it would still be safe for it to continue. Despite that, because of step (2.1) it would still have to abort and throw an error. This means that on datasets with many frequent concurrent ingestion commits and very infrequent table service operations, there would be a lot of transient failures/noise
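The check in step 2 of Approach B can be sketched as follows. This is a minimal illustration of the rule as described in the issue, not Hudi code: `RequestedInstantGuard`, `Instant`, and `RetryableConflictException` are hypothetical names, the timeline is a plain list, and instant times are assumed to be fixed-width strings that order lexicographically. The caller is assumed to already hold the table lock (step 1) and to create the requested file and release the lock afterwards (steps 3 and 4).

```java
import java.util.List;

// Hypothetical types for illustration only; none of these are Hudi APIs.
class RequestedInstantGuard {

  /** Simplified instant: a fixed-width timestamp string and whether it is a table service. */
  static final class Instant {
    final String timestamp;
    final boolean tableService;

    Instant(String timestamp, boolean tableService) {
      this.timestamp = timestamp;
      this.tableService = tableService;
    }
  }

  /** Signals that scheduling should be aborted and retried with a newer instant time. */
  static final class RetryableConflictException extends RuntimeException {
    RetryableConflictException(String message) {
      super(message);
    }
  }

  /**
   * Step 2 of Approach B; assumes the caller holds the table lock.
   * Ingestion (2.1): no instant later than the candidate may exist.
   * Table service (2.2): no later table service instant may exist.
   */
  static void assertSafeToSchedule(List<Instant> timeline, Instant candidate) {
    for (Instant other : timeline) {
      boolean later = other.timestamp.compareTo(candidate.timestamp) > 0;
      if (!later) {
        continue; // earlier or equal instants are outside this check
      }
      if (!candidate.tableService || other.tableService) {
        throw new RetryableConflictException(
            "Instant " + other.timestamp + " is later than requested instant " + candidate.timestamp);
      }
    }
  }
}
```

With a timeline holding one ingestion instant at (x), an ingestion candidate at (x - 1) is rejected, while a table service candidate at (x - 1) is allowed; this matches rules 2.1 and 2.2 above, including the drawback that two plain ingestion writers can still hit the exception.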
[jira] [Updated] (HUDI-7641) Add metrics to track what partitions are enabled in MDT
[ https://issues.apache.org/jira/browse/HUDI-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-7641:
--
    Fix Version/s: 0.15.0

> Add metrics to track what partitions are enabled in MDT
>
> Key: HUDI-7641
> URL: https://issues.apache.org/jira/browse/HUDI-7641
> Project: Apache Hudi
> Issue Type: Improvement
> Components: metadata
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0
[jira] [Assigned] (HUDI-7641) Add metrics to track what partitions are enabled in MDT
[ https://issues.apache.org/jira/browse/HUDI-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan reassigned HUDI-7641:
-
    Assignee: sivabalan narayanan

> Add metrics to track what partitions are enabled in MDT
>
> Key: HUDI-7641
> URL: https://issues.apache.org/jira/browse/HUDI-7641
> Project: Apache Hudi
> Issue Type: Improvement
> Components: metadata
> Reporter: sivabalan narayanan
> Assignee: sivabalan narayanan
> Priority: Major
> Labels: pull-request-available
Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]
hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067283377

## CI report:

* 59d53ba7cc510038dcaea707f434ebc5529b29dc UNKNOWN
* ab39be7b9f1d7bf9de4b69640dce50105b6d9147 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23371)

Bot commands
@hudi-bot supports the following commands:
- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]
hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067215979

## CI report:

* 7d5d983e224a82cbb8b4253f8d9b374cd9e4b9aa Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23370)
* 59d53ba7cc510038dcaea707f434ebc5529b29dc UNKNOWN
* ab39be7b9f1d7bf9de4b69640dce50105b6d9147 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23371)
Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]
hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067161180

## CI report:

* 3f7d727e83f05cb5ce7f9a3da2bfffca72686345 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23359)
* 7d5d983e224a82cbb8b4253f8d9b374cd9e4b9aa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23370)
* 59d53ba7cc510038dcaea707f434ebc5529b29dc UNKNOWN
* ab39be7b9f1d7bf9de4b69640dce50105b6d9147 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23371)
Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]
hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067152702

## CI report:

* 3f7d727e83f05cb5ce7f9a3da2bfffca72686345 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23359)
* 7d5d983e224a82cbb8b4253f8d9b374cd9e4b9aa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23370)
* 59d53ba7cc510038dcaea707f434ebc5529b29dc UNKNOWN
* ab39be7b9f1d7bf9de4b69640dce50105b6d9147 UNKNOWN
Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]
hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067143554

## CI report:

* 3f7d727e83f05cb5ce7f9a3da2bfffca72686345 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23359)
* 7d5d983e224a82cbb8b4253f8d9b374cd9e4b9aa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23370)
* 59d53ba7cc510038dcaea707f434ebc5529b29dc UNKNOWN
Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]
hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067081207

## CI report:

* 3f7d727e83f05cb5ce7f9a3da2bfffca72686345 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23359)
* 7d5d983e224a82cbb8b4253f8d9b374cd9e4b9aa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23370)
Re: [PR] [HUDI-7641] Adding metadata enablement metrics [hudi]
hudi-bot commented on PR #11053:
URL: https://github.com/apache/hudi/pull/11053#issuecomment-2067071879

## CI report:

* 3f7d727e83f05cb5ce7f9a3da2bfffca72686345 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23359)
* 7d5d983e224a82cbb8b4253f8d9b374cd9e4b9aa UNKNOWN
Re: [I] [SUPPORT] HudiDeltaStreaming consuming from Kafka - can't see the Kafka Consumer Group in Kafka [hudi]
mattssll closed issue #11051: [SUPPORT] HudiDeltaStreaming consuming from Kafka - can't see the Kafka Consumer Group in Kafka URL: https://github.com/apache/hudi/issues/11051
Re: [PR] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]
hudi-bot commented on PR #11057: URL: https://github.com/apache/hudi/pull/11057#issuecomment-2066570835 ## CI report: * d8ab7259a5c4825d6634eebb8610ca072abb4a05 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23369)
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066536128 ## CI report: * cc4c48076b11d9a97fdb7fd0f6f0a5253d530ff1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23368)
Re: [PR] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]
hudi-bot commented on PR #11057: URL: https://github.com/apache/hudi/pull/11057#issuecomment-2066453700 ## CI report: * d8ab7259a5c4825d6634eebb8610ca072abb4a05 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23369)
Re: [PR] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]
hudi-bot commented on PR #11057: URL: https://github.com/apache/hudi/pull/11057#issuecomment-2066442267 ## CI report: * d8ab7259a5c4825d6634eebb8610ca072abb4a05 UNKNOWN
Re: [I] If Sanitastiion Enabled In HudiStreamer It is taking too much time [hudi]
Amar1404 commented on issue #10466: URL: https://github.com/apache/hudi/issues/10466#issuecomment-2066439568 Hi @ad1happy2go - any updates on this?
Re: [I] [SUPPORT] HudiDeltaStreaming consuming from Kafka - can't see the Kafka Consumer Group in Kafka [hudi]
Amar1404 commented on issue #11051: URL: https://github.com/apache/hudi/issues/11051#issuecomment-2066436408 Hi @mattssll - Hudi DeltaStreamer uses checkpoints here instead of a Kafka consumer group. The hoodie commit file stores the last offset it has read up to for each partition, something like topic.0:26228942,1:26231665,2:26218546,3:26229200,4:26220648,5:26226357 So on the next run, the Kafka source reads each partition starting after these offsets and matches them against the existing partitions.
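To make the checkpoint string in the comment above concrete, here is a minimal sketch of pulling the per-partition offsets out of it. It is not Hudi code: the class and method names are hypothetical, and it assumes only that the checkpoint contains comma-separated `partition:offset` pairs after a topic prefix, as in the example quoted in the comment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: a hypothetical helper, not a class from Hudi. */
final class KafkaCheckpointSketch {
    // One "partition:offset" pair, e.g. "0:26228942".
    private static final Pattern PAIR = Pattern.compile("(\\d+):(\\d+)");

    /** Extracts partition -> checkpointed offset from a checkpoint string. */
    static Map<Integer, Long> parseOffsets(String checkpoint) {
        Map<Integer, Long> offsets = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(checkpoint);
        while (m.find()) {
            offsets.put(Integer.parseInt(m.group(1)), Long.parseLong(m.group(2)));
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Shortened version of the example string from the comment.
        String checkpoint = "topic.0:26228942,1:26231665,2:26218546";
        parseOffsets(checkpoint).forEach((partition, offset) ->
            System.out.println("partition " + partition + " -> checkpointed offset " + offset));
    }
}
```

Because the position is tracked in the table's own commit metadata in this way, it is plausible that no consumer group shows up on the Kafka side, which is what the original question was about.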
[PR] [DOCS] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]
geserdugarov opened a new pull request, #11058: URL: https://github.com/apache/hudi/pull/11058 ### Change Logs Updates to the docs. Should be merged only after [MR 11057](https://github.com/apache/hudi/pull/11057), which carries the corresponding code changes, is merged. Not all configurations are listed on the [All configurations](https://hudi.apache.org/docs/configurations) page. This MR adds a list of basic Hudi table configurations, as well as some missing configurations for the file-based SQL source, Hudi error table, and timestamp key generator. ### Impact Last updates to the `current` version of the docs. ### Risk level (write none, low medium or high below) None ### Contributor's checklist - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [x] Change Logs and Impact were stated clearly - [x] Adequate tests were added if applicable
[PR] [MINOR] Added configurations of Hudi table, file-based SQL source, Hudi error table, and timestamp key generator to configuration listing [hudi]
geserdugarov opened a new pull request, #11057: URL: https://github.com/apache/hudi/pull/11057 ### Change Logs Not all configurations are listed on the [All configurations](https://hudi.apache.org/docs/configurations) page. This MR adds a list of basic Hudi table configurations, as well as some missing configurations for the file-based SQL source, Hudi error table, and timestamp key generator. ### Impact Changes only the list of all configurations on the Hudi site. ### Risk level (write none, low medium or high below) None ### Documentation Update I will open a corresponding MR against the `asf-site` branch. ### Contributor's checklist - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [x] Change Logs and Impact were stated clearly - [x] Adequate tests were added if applicable - [ ] CI passed
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066365873 ## CI report: * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362) * cc4c48076b11d9a97fdb7fd0f6f0a5253d530ff1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23368)
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066355763 ## CI report: * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362) * cc4c48076b11d9a97fdb7fd0f6f0a5253d530ff1 UNKNOWN
Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]
hudi-bot commented on PR #11054: URL: https://github.com/apache/hudi/pull/11054#issuecomment-2066344280 ## CI report: * 0c8df7c7d066c2115e4a04dabb10fd14a54e40d2 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23364)
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066283997 You can rebase with the latest master now to resolve the compile error.
[jira] [Closed] (HUDI-7643) Fix TestStreamSyncUnitTests
[ https://issues.apache.org/jira/browse/HUDI-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen closed HUDI-7643.
Resolution: Fixed
Fixed via master branch: 7ac26bce3f3aad2c9aebeb55febc4375c4f7bd1d
> Fix TestStreamSyncUnitTests
> ---
>
> Key: HUDI-7643
> URL: https://issues.apache.org/jira/browse/HUDI-7643
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Sagar Sumit
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Use the right StreamSync constructor
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (HUDI-7643) Fix TestStreamSyncUnitTests
[ https://issues.apache.org/jira/browse/HUDI-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen updated HUDI-7643:
Fix Version/s: 0.15.0, 1.0.0
> Fix TestStreamSyncUnitTests
> ---
>
> Key: HUDI-7643
> URL: https://issues.apache.org/jira/browse/HUDI-7643
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Sagar Sumit
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.15.0, 1.0.0
>
> Use the right StreamSync constructor
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
wecharyu commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1572163511
## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
@@ -94,36 +96,32 @@ public int getPartitionDepth() {
   /**
    * Write the metadata safely into partition atomically.
    */
-  public void trySave(int taskPartitionId) {
+  public void trySave() throws HoodieIOException {
     String extension = getMetafileExtension();
-    StoragePath tmpMetaPath =
-        new StoragePath(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + taskPartitionId + extension);
-    StoragePath metaPath = new StoragePath(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
-    boolean metafileExists = false;
+    StoragePath metaPath = new StoragePath(
+        partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
-    try {
-      metafileExists = storage.exists(metaPath);
-      if (!metafileExists) {
-        // write to temporary file
-        writeMetafile(tmpMetaPath);
-        // move to actual path
-        storage.rename(tmpMetaPath, metaPath);
-      }
-    } catch (IOException ioe) {
-      LOG.warn("Error trying to save partition metadata (this is okay, as long as at least 1 of these succeeded), "
-          + partitionPath, ioe);
-    } finally {
-      if (!metafileExists) {
-        try {
-          // clean up tmp file, if still lying around
-          if (storage.exists(tmpMetaPath)) {
-            storage.deleteFile(tmpMetaPath);
+    // This retry mechanism enables an exit-fast in metaPath exists check, which avoid the
+    // tasks failures when there are two or more tasks trying to create the same metaPath.
+    RetryHelper retryHelper = new RetryHelper(1000, 3, 1000, HoodieIOException.class.getName())
+        .tryWith(() -> {
+          if (!storage.exists(metaPath)) {
+            if (format.isPresent()) {
+              StoragePath tmpMetaPath = new StoragePath(
Review Comment: Done.
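The exit-fast retry described in the code comment of the diff above can be illustrated with plain `java.nio`. This is only a sketch of the general pattern (check whether the target exists, otherwise write a temp file and move it into place, retrying a bounded number of times). It does not use Hudi's `RetryHelper` or `HoodieStorage`, and the class name, method name, and parameters are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative only: a hypothetical stand-in, not the code from the PR. */
final class MetafileRetrySketch {
    /**
     * Creates target with the given content unless it already exists.
     * Returns true if this call created it, false if it was already there.
     * The target must have a parent directory, which hosts the temp file.
     */
    static boolean trySave(Path target, String content, int maxAttempts, long sleepMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Exit fast: another task may have created the file in the meantime.
            if (Files.exists(target)) {
                return false;
            }
            Path tmp = Files.createTempFile(target.getParent(), ".meta_", ".tmp");
            try {
                Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
                // Atomic rename; on POSIX file systems this may silently replace a
                // file written by a concurrent task, which is harmless if the
                // content is identical.
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
                return true;
            } catch (IOException e) {
                last = e;
                Thread.sleep(sleepMs);
            } finally {
                // Clean up the temp file if the move did not consume it.
                Files.deleteIfExists(tmp);
            }
        }
        // All attempts failed; succeed anyway if some other task won the race.
        if (Files.exists(target)) {
            return false;
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("partition");
        Path meta = dir.resolve(".hoodie_partition_metadata");
        System.out.println("created: " + trySave(meta, "partitionDepth=1", 3, 1000));
        System.out.println("created: " + trySave(meta, "partitionDepth=1", 3, 1000));
    }
}
```

The point of re-checking existence at the top of every attempt is that a task which loses the race does not need to fail: once any task has produced the metafile, the rest can return without writing.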
(hudi) branch master updated: [HUDI-7643] Fix test by using the right StreamSync constructor (#11056)
This is an automated email from the ASF dual-hosted git repository.
danny0405 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/master by this push:
     new 7ac26bce3f3 [HUDI-7643] Fix test by using the right StreamSync constructor (#11056)
7ac26bce3f3 is described below
commit 7ac26bce3f3aad2c9aebeb55febc4375c4f7bd1d
Author: Sagar Sumit
AuthorDate: Fri Apr 19 15:55:46 2024 +0530
    [HUDI-7643] Fix test by using the right StreamSync constructor (#11056)
---
 .../org/apache/hudi/utilities/streamer/TestStreamSyncUnitTests.java | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/hudi-utilities/src/test/java/org/apache/hudi/utilities/streamer/TestStreamSyncUnitTests.java b/hudi-utilities/src/test/java/org/apache/hudi/utilities/streamer/TestStreamSyncUnitTests.java
index 8ff5b6ee933..fe775f95a36 100644
--- a/hudi-utilities/src/test/java/org/apache/hudi/utilities/streamer/TestStreamSyncUnitTests.java
+++ b/hudi-utilities/src/test/java/org/apache/hudi/utilities/streamer/TestStreamSyncUnitTests.java
@@ -141,7 +141,7 @@ public class TestStreamSyncUnitTests {
   @MethodSource("getCheckpointToResumeCases")
   void testGetCheckpointToResume(HoodieStreamer.Config cfg, HoodieCommitMetadata commitMetadata, Option expectedResumeCheckpoint) throws IOException {
     HoodieSparkEngineContext hoodieSparkEngineContext = mock(HoodieSparkEngineContext.class);
-    FileSystem fs = mock(FileSystem.class);
+    HoodieStorage storage = HoodieStorageUtils.getStorage(mock(FileSystem.class));
     TypedProperties props = new TypedProperties();
     SparkSession sparkSession = mock(SparkSession.class);
     Configuration configuration = mock(Configuration.class);
@@ -152,7 +152,7 @@ public class TestStreamSyncUnitTests {
     when(commitsTimeline.lastInstant()).thenReturn(Option.of(hoodieInstant));
     StreamSync streamSync = new StreamSync(cfg, sparkSession, props, hoodieSparkEngineContext,
-        fs, configuration, client -> true, null,Option.empty(),null,Option.empty(),true,true);
+        storage, configuration, client -> true, null,Option.empty(),null,Option.empty(),true,true);
     StreamSync spy = spy(streamSync);
     doReturn(Option.of(commitMetadata)).when(spy).getLatestCommitMetadataWithValidCheckpointInfo(any());
Re: [PR] [HUDI-7643] Fix test by using the right StreamSync constructor [hudi]
danny0405 merged PR #11056: URL: https://github.com/apache/hudi/pull/11056
Re: [I] [SUPPORT]Data Loss Issue with Hudi Table After 3 Days of Continuous Writes [hudi]
danny0405 commented on issue #11016: URL: https://github.com/apache/hudi/issues/11016#issuecomment-2066279351 > but the issue is that we can't access older data. If your table is ingested with streaming `upsert`, you can just set `read.start-commit` to the first commit instant time on the timeline and skip compaction. Only instants that have not been cleaned can be consumed. It actually depends on how you wrote the historical dataset: `bulk_insert` does not guarantee the payload sequence for a given key, so if the table was bootstrapped with `bulk_insert`, the only way is to consume from `earliest`.
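As a rough illustration of the advice above, the following sketch assembles reader options and renders them as a SQL `WITH` clause, assuming the reader is a Flink SQL streaming job. Only `read.start-commit` and the value `earliest` come from the comment. The keys `read.streaming.enabled` and `read.streaming.skip_compaction` are assumptions about the Hudi Flink connector and should be checked against the documentation for the Hudi version in use; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Illustrative only: builds option strings, does not talk to Flink or Hudi. */
final class StreamingReadOptionsSketch {
    /** Reader options for consuming a table from a given start commit. */
    static Map<String, String> options(String startCommit) {
        Map<String, String> opts = new LinkedHashMap<>();
        // Assumed key: switches the source to streaming (incremental) reads.
        opts.put("read.streaming.enabled", "true");
        // From the comment: an instant time on the timeline, or "earliest".
        opts.put("read.start-commit", startCommit);
        // Assumed key for the "skip the compaction" part of the advice.
        opts.put("read.streaming.skip_compaction", "true");
        return opts;
    }

    /** Renders the options as the body of a SQL WITH (...) clause. */
    static String toWithClause(Map<String, String> opts) {
        return opts.entrySet().stream()
            .map(e -> "'" + e.getKey() + "' = '" + e.getValue() + "'")
            .collect(Collectors.joining(",\n  ", "WITH (\n  ", "\n)"));
    }

    public static void main(String[] args) {
        // Table bootstrapped with bulk_insert: consume from the beginning.
        System.out.println(toWithClause(options("earliest")));
    }
}
```

For a table written only with streaming `upsert`, the same helper would be called with the first commit instant time that is still present on the timeline instead of `earliest`.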
Re: [PR] [HUDI-7629] Safely rename HoodieFileStatus [hudi]
hudi-bot commented on PR #11055: URL: https://github.com/apache/hudi/pull/11055#issuecomment-2066274445 ## CI report: * 8957421b837bb5471701724e47ae908e0c0655fb Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23365)
Re: [PR] [HUDI-7643] Fix test by using the right StreamSync constructor [hudi]
hudi-bot commented on PR #11056: URL: https://github.com/apache/hudi/pull/11056#issuecomment-2066274503 ## CI report: * 0dd750741922a580a0c8bee13996f7583f5b98c0 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23366)
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066274001 The master build is broken and here is the fix: https://github.com/apache/hudi/pull/11056. You may need to wait for this patch and then rebase onto the latest master again~
Re: [PR] [HUDI-7643] Fix test by using the right StreamSync constructor [hudi]
hudi-bot commented on PR #11056: URL: https://github.com/apache/hudi/pull/11056#issuecomment-2066263288 ## CI report: * 0dd750741922a580a0c8bee13996f7583f5b98c0 UNKNOWN
Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]
danny0405 commented on PR #11018: URL: https://github.com/apache/hudi/pull/11018#issuecomment-2066267766 Hey, the master build broke with this patch, can you take care of it?
Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]
hudi-bot commented on PR #11054: URL: https://github.com/apache/hudi/pull/11054#issuecomment-2066251007 ## CI report: * 0d5781211cbe9977838db3ee7134bc473b6110aa Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23363) * 0c8df7c7d066c2115e4a04dabb10fd14a54e40d2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23364)
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066250342 ## CI report: * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362)
Re: [PR] [HUDI-7629] Safely rename HoodieFileStatus [hudi]
wombatu-kun commented on PR #11055: URL: https://github.com/apache/hudi/pull/11055#issuecomment-2066223372 I thought the decision had already been made when the task https://issues.apache.org/jira/browse/HUDI-7629 was created. @yihua @vinothchandar @danny0405 could you please make a decision collectively?
[PR] [HUDI-7643] Fix test by using the right StreamSync constructor [hudi]
codope opened a new pull request, #11056: URL: https://github.com/apache/hudi/pull/11056 ### Change Logs `StreamSync` constructor changed after `HoodieStorage` abstraction was introduced and the commit https://github.com/apache/hudi/commit/ca77fda51fe3036f86d4ddb8b0e58a2f160882dc was merged without rebasing. So, the master is broken. ### Impact Fix test and build on master. ### Risk level (write none, low medium or high below) none ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Updated] (HUDI-7643) Fix TestStreamSyncUnitTests
[ https://issues.apache.org/jira/browse/HUDI-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HUDI-7643:
Labels: pull-request-available (was: )
> Fix TestStreamSyncUnitTests
> ---
>
> Key: HUDI-7643
> URL: https://issues.apache.org/jira/browse/HUDI-7643
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Sagar Sumit
> Priority: Major
> Labels: pull-request-available
>
> Use the right StreamSync constructor
[jira] [Created] (HUDI-7643) Fix TestStreamSyncUnitTests
Sagar Sumit created HUDI-7643:
Summary: Fix TestStreamSyncUnitTests
Key: HUDI-7643
URL: https://issues.apache.org/jira/browse/HUDI-7643
Project: Apache Hudi
Issue Type: Task
Reporter: Sagar Sumit
Use the right StreamSync constructor
Re: [PR] [HUDI-7629] Safely rename HoodieFileStatus [hudi]
hudi-bot commented on PR #11055: URL: https://github.com/apache/hudi/pull/11055#issuecomment-2066178297 ## CI report: * 8957421b837bb5471701724e47ae908e0c0655fb Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23365)
Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]
hudi-bot commented on PR #11054: URL: https://github.com/apache/hudi/pull/11054#issuecomment-2066178227 ## CI report: * 0d5781211cbe9977838db3ee7134bc473b6110aa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23363) * 0c8df7c7d066c2115e4a04dabb10fd14a54e40d2 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23364)
Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]
hudi-bot commented on PR #11054: URL: https://github.com/apache/hudi/pull/11054#issuecomment-2066165313 ## CI report: * 0d5781211cbe9977838db3ee7134bc473b6110aa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23363) * 0c8df7c7d066c2115e4a04dabb10fd14a54e40d2 UNKNOWN
Re: [PR] [HUDI-7629] Safely rename HoodieFileStatus [hudi]
hudi-bot commented on PR #11055: URL: https://github.com/apache/hudi/pull/11055#issuecomment-2066165390 ## CI report: * 8957421b837bb5471701724e47ae908e0c0655fb UNKNOWN
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1572070120
## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
@@ -94,36 +96,32 @@ public int getPartitionDepth() {
   /**
    * Write the metadata safely into partition atomically.
    */
-  public void trySave(int taskPartitionId) {
+  public void trySave() throws HoodieIOException {
     String extension = getMetafileExtension();
-    StoragePath tmpMetaPath =
-        new StoragePath(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + taskPartitionId + extension);
-    StoragePath metaPath = new StoragePath(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
-    boolean metafileExists = false;
+    StoragePath metaPath = new StoragePath(
+        partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
-    try {
-      metafileExists = storage.exists(metaPath);
-      if (!metafileExists) {
-        // write to temporary file
-        writeMetafile(tmpMetaPath);
-        // move to actual path
-        storage.rename(tmpMetaPath, metaPath);
-      }
-    } catch (IOException ioe) {
-      LOG.warn("Error trying to save partition metadata (this is okay, as long as at least 1 of these succeeded), "
-          + partitionPath, ioe);
-    } finally {
-      if (!metafileExists) {
-        try {
-          // clean up tmp file, if still lying around
-          if (storage.exists(tmpMetaPath)) {
-            storage.deleteFile(tmpMetaPath);
+    // This retry mechanism enables an exit-fast in metaPath exists check, which avoid the
+    // tasks failures when there are two or more tasks trying to create the same metaPath.
+    RetryHelper retryHelper = new RetryHelper(1000, 3, 1000, HoodieIOException.class.getName())
+        .tryWith(() -> {
+          if (!storage.exists(metaPath)) {
+            if (format.isPresent()) {
+              StoragePath tmpMetaPath = new StoragePath(
Review Comment: We can move the `tmpMetaPath` into `writeMetafileInFormat`.
[jira] [Updated] (HUDI-7629) Safely rename HoodieFileStatus
[ https://issues.apache.org/jira/browse/HUDI-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7629: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Safely rename HoodieFileStatus > -- > > Key: HUDI-7629 > URL: https://issues.apache.org/jira/browse/HUDI-7629 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Vova Kolmakov >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 1.0.0 > > > [https://github.com/apache/hudi/pull/10591#discussion_r1484912753] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] [HUDI-7629] Safely rename HoodieFileStatus [hudi]
wombatu-kun opened a new pull request, #11055: URL: https://github.com/apache/hudi/pull/11055 ### Change Logs Renamed `HoodieFileStatus` to `StorageLocationInfo`: https://github.com/apache/hudi/pull/10591#discussion_r1484912753 ### Impact none ### Risk level (write none, low medium or high below) none ### Documentation Update none - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
[jira] [Created] (HUDI-7642) Compact MOR tables with operation fields cause data errors
Zeyu Wang created HUDI-7642: --- Summary: Compact MOR tables with operation fields cause data errors Key: HUDI-7642 URL: https://issues.apache.org/jira/browse/HUDI-7642 Project: Apache Hudi Issue Type: Bug Reporter: Zeyu Wang When we compact a MOR table that has the _hoodie_operation field, Hoodie keys tagged with the "-D" operation are not removed correctly. Per the previous discussion (https://github.com/apache/hudi/pull/8721#issuecomment-1736629662), the Flink engine should keep the delete records. https://github.com/apache/hudi/pull/10219 repaired the Spark engine's problems when reading data; the problem caused by compaction should be repaired now as well. Because compaction uses the HoodieMergedLogRecordScanner from the common module directly, I think we have to add an optional configuration that controls whether the HoodieMergedLogRecordScanner directly deletes keys tagged with the "-D" operation.
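The optional switch proposed in the issue can be illustrated with a small model. This is a minimal sketch under stated assumptions: `DeleteAwareMerge`, `LogRecord`, and the `dropDeletes` flag are hypothetical and do not mirror the `HoodieMergedLogRecordScanner` API; only the "-D" marker comes from the issue text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of merging log records during compaction; not Hudi code.
final class DeleteAwareMerge {

  // Delete marker as written in the issue text.
  static final String DELETE_OP = "-D";

  static final class LogRecord {
    final String key;
    final String operation;
    final String payload;

    LogRecord(String key, String operation, String payload) {
      this.key = key;
      this.operation = operation;
      this.payload = payload;
    }
  }

  /**
   * Keeps the latest record per key (later entries win). When dropDeletes is
   * true, keys whose latest record is tagged DELETE_OP are left out of the
   * result; when false, the delete records are passed through to the caller.
   */
  static List<LogRecord> merge(List<LogRecord> records, boolean dropDeletes) {
    Map<String, LogRecord> latest = new LinkedHashMap<>();
    for (LogRecord r : records) {
      latest.put(r.key, r);
    }
    List<LogRecord> out = new ArrayList<>();
    for (LogRecord r : latest.values()) {
      if (dropDeletes && DELETE_OP.equals(r.operation)) {
        continue;
      }
      out.add(r);
    }
    return out;
  }
}
```

With the flag off, a reader that wants to see delete records still receives them; with it on, compaction output omits the deleted keys.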
Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]
hudi-bot commented on PR #11054: URL: https://github.com/apache/hudi/pull/11054#issuecomment-2066077903 ## CI report: * 0d5781211cbe9977838db3ee7134bc473b6110aa Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23363) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066077325 ## CI report: * 7b04755aa308766f3b0f0d5292ed9476630da90d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23357) * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362)
Re: [PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]
hudi-bot commented on PR #11054: URL: https://github.com/apache/hudi/pull/11054#issuecomment-2066065953 ## CI report: * 0d5781211cbe9977838db3ee7134bc473b6110aa UNKNOWN
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066065344 ## CI report: * 7b04755aa308766f3b0f0d5292ed9476630da90d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23357) * af5d107b867fd97362710bc032a95743eb5d33a8 UNKNOWN
[jira] [Updated] (HUDI-7629) Safely rename HoodieFileStatus
[ https://issues.apache.org/jira/browse/HUDI-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov updated HUDI-7629: Status: In Progress (was: Open) > Safely rename HoodieFileStatus > -- > > Key: HUDI-7629 > URL: https://issues.apache.org/jira/browse/HUDI-7629 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Vova Kolmakov >Priority: Major > Labels: hoodie-storage > Fix For: 1.0.0 > > > [https://github.com/apache/hudi/pull/10591#discussion_r1484912753]
[jira] [Updated] (HUDI-7628) Rename FSUtils.getPartitionPath to constructAbsolutePath
[ https://issues.apache.org/jira/browse/HUDI-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HUDI-7628: - Labels: hoodie-storage pull-request-available (was: hoodie-storage) > Rename FSUtils.getPartitionPath to constructAbsolutePath > > > Key: HUDI-7628 > URL: https://issues.apache.org/jira/browse/HUDI-7628 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Vova Kolmakov >Priority: Major > Labels: hoodie-storage, pull-request-available > Fix For: 1.0.0 > > > [https://github.com/apache/hudi/pull/10591#discussion_r1483632718] > Rename FSUtils.getPartitionPath to constructAbsolutePath and partitionPath > argument to relativePartitionPath so that the naming reflects the > functionality. This has to be merged after HUDI-6497 and the above PR to > reduce merging conflicts.
[PR] [HUDI-7628] Rename FSUtils.getPartitionPath to constructAbsolutePath [hudi]
wombatu-kun opened a new pull request, #11054: URL: https://github.com/apache/hudi/pull/11054 ### Change Logs https://github.com/apache/hudi/pull/10591#discussion_r1483632718 Rename FSUtils.getPartitionPath to constructAbsolutePath and partitionPath argument to relativePartitionPath so that the naming reflects the functionality. ### Impact none ### Risk level (write none, low medium or high below) none ### Documentation Update none - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed
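What the new name is meant to convey can be shown with a small stand-in. This is a minimal sketch, assuming `java.nio.file.Path` in place of Hudi's path types; `PartitionPaths` is a hypothetical class, and returning the base path for an empty relative path is an assumption about non-partitioned tables, not something stated in the PR.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the renamed utility; not the FSUtils implementation.
final class PartitionPaths {

  /**
   * Joins the table base path with a partition path that is relative to it.
   * An empty relative path is treated as "no partition" (assumption).
   */
  static Path constructAbsolutePath(Path basePath, String relativePartitionPath) {
    if (relativePartitionPath == null || relativePartitionPath.isEmpty()) {
      return basePath;
    }
    return basePath.resolve(relativePartitionPath);
  }

  public static void main(String[] args) {
    Path base = Paths.get("/tmp/hudi_table");
    System.out.println(constructAbsolutePath(base, "2024/04/20"));
    System.out.println(constructAbsolutePath(base, ""));
  }
}
```

The argument name makes the contract explicit: callers pass a path relative to the table base, and the method, not the caller, produces the absolute location.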
[jira] [Updated] (HUDI-7628) Rename FSUtils.getPartitionPath to constructAbsolutePath
[ https://issues.apache.org/jira/browse/HUDI-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov updated HUDI-7628: Status: In Progress (was: Open) > Rename FSUtils.getPartitionPath to constructAbsolutePath > > > Key: HUDI-7628 > URL: https://issues.apache.org/jira/browse/HUDI-7628 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Vova Kolmakov >Priority: Major > Labels: hoodie-storage > Fix For: 1.0.0 > > > [https://github.com/apache/hudi/pull/10591#discussion_r1483632718] > Rename FSUtils.getPartitionPath to constructAbsolutePath and partitionPath > argument to relativePartitionPath so that the naming reflects the > functionality. This has to be merged after HUDI-6497 and the above PR to > reduce merging conflicts.
[jira] [Assigned] (HUDI-7632) Remove FileSystem usage in HoodieLogFormatWriter
[ https://issues.apache.org/jira/browse/HUDI-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov reassigned HUDI-7632: --- Assignee: Vova Kolmakov > Remove FileSystem usage in HoodieLogFormatWriter > > > Key: HUDI-7632 > URL: https://issues.apache.org/jira/browse/HUDI-7632 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Vova Kolmakov >Priority: Major > Labels: hoodie-storage > Fix For: 1.0.0 > > > https://github.com/apache/hudi/pull/10591#discussion_r1569173014
[jira] [Assigned] (HUDI-7631) Clean up usage of `CachingPath` outside hudi-common module
[ https://issues.apache.org/jira/browse/HUDI-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov reassigned HUDI-7631: --- Assignee: Vova Kolmakov > Clean up usage of `CachingPath` outside hudi-common module > -- > > Key: HUDI-7631 > URL: https://issues.apache.org/jira/browse/HUDI-7631 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Vova Kolmakov >Priority: Major > Labels: hoodie-storage > Fix For: 1.0.0 > > > https://github.com/apache/hudi/pull/10591#discussion_r1484923458
[jira] [Assigned] (HUDI-7628) Rename FSUtils.getPartitionPath to constructAbsolutePath
[ https://issues.apache.org/jira/browse/HUDI-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov reassigned HUDI-7628: --- Assignee: Vova Kolmakov > Rename FSUtils.getPartitionPath to constructAbsolutePath > > > Key: HUDI-7628 > URL: https://issues.apache.org/jira/browse/HUDI-7628 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Vova Kolmakov >Priority: Major > Labels: hoodie-storage > Fix For: 1.0.0 > > > [https://github.com/apache/hudi/pull/10591#discussion_r1483632718] > Rename FSUtils.getPartitionPath to constructAbsolutePath and partitionPath > argument to relativePartitionPath so that the naming reflects the > functionality. This has to be merged after HUDI-6497 and the above PR to > reduce merging conflicts.
[jira] [Assigned] (HUDI-7630) Create a separate StorageUtils for hadoop-free util method
[ https://issues.apache.org/jira/browse/HUDI-7630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov reassigned HUDI-7630: --- Assignee: Vova Kolmakov > Create a separate StorageUtils for hadoop-free util method > -- > > Key: HUDI-7630 > URL: https://issues.apache.org/jira/browse/HUDI-7630 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Vova Kolmakov >Priority: Major > Labels: hoodie-storage > Fix For: 1.0.0 > > > https://github.com/apache/hudi/pull/10591#discussion_r1484920647
[jira] [Assigned] (HUDI-7629) Safely rename HoodieFileStatus
[ https://issues.apache.org/jira/browse/HUDI-7629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vova Kolmakov reassigned HUDI-7629: --- Assignee: Vova Kolmakov > Safely rename HoodieFileStatus > -- > > Key: HUDI-7629 > URL: https://issues.apache.org/jira/browse/HUDI-7629 > Project: Apache Hudi > Issue Type: Improvement >Reporter: Ethan Guo >Assignee: Vova Kolmakov >Priority: Major > Labels: hoodie-storage > Fix For: 1.0.0 > > > [https://github.com/apache/hudi/pull/10591#discussion_r1484912753]
(hudi) branch master updated: [HUDI-7618] Add ability to ignore checkpoints in delta streamer (#11018)
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new ca77fda51fe [HUDI-7618] Add ability to ignore checkpoints in delta streamer (#11018)
ca77fda51fe is described below

commit ca77fda51fe3036f86d4ddb8b0e58a2f160882dc
Author: Sampan S Nayak
AuthorDate: Fri Apr 19 11:55:43 2024 +0530

    [HUDI-7618] Add ability to ignore checkpoints in delta streamer (#11018)
---
 .../hudi/utilities/streamer/HoodieStreamer.java    |  7 +++
 .../apache/hudi/utilities/streamer/StreamSync.java | 13 -
 .../streamer/TestStreamSyncUnitTests.java          | 61 ++
 3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
index 59c1bf3d164..0dd488bffcb 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/HoodieStreamer.java
@@ -428,6 +428,13 @@ public class HoodieStreamer implements Serializable {
     @Parameter(names = {"--config-hot-update-strategy-class"}, description = "Configuration hot update in continuous mode")
     public String configHotUpdateStrategyClass = "";

+    @Parameter(names = {"--ignore-checkpoint"}, description = "Set this config with a unique value, recommend using a timestamp value or UUID."
+        + " Setting this config indicates that the subsequent sync should ignore the last committed checkpoint for the source. The config value is stored"
+        + " in the commit history, so setting the config with same values would not have any affect. This config can be used in scenarios like kafka topic change,"
+        + " where we would want to start ingesting from the latest or earliest offset after switching the topic (in this case we would want to ignore the previously"
+        + " committed checkpoint, and rely on other configs to pick the starting offsets).")
+    public String ignoreCheckpoint = null;
+
     public boolean isAsyncCompactionEnabled() {
       return continuousMode && !forceDisableCompaction
           && HoodieTableType.MERGE_ON_READ.equals(HoodieTableType.valueOf(tableType));
diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java
index c9521058b12..2f5bd1fd3ff 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/streamer/StreamSync.java
@@ -164,6 +164,7 @@ public class StreamSync implements Serializable, Closeable {
   private static final long serialVersionUID = 1L;
   private static final Logger LOG = LoggerFactory.getLogger(StreamSync.class);
   private static final String NULL_PLACEHOLDER = "[null]";
+  public static final String CHECKPOINT_IGNORE_KEY = "deltastreamer.checkpoint.ignore_key";

   /**
    * Delta Sync Config.
@@ -733,7 +734,8 @@ public class StreamSync implements Serializable, Closeable {
    * @return the checkpoint to resume from if applicable.
    * @throws IOException
    */
-  private Option getCheckpointToResume(Option commitsTimelineOpt) throws IOException {
+  @VisibleForTesting
+  Option getCheckpointToResume(Option commitsTimelineOpt) throws IOException {
     Option resumeCheckpointStr = Option.empty();
     // try get checkpoint from commits(including commit and deltacommit)
     // in COW migrating to MOR case, the first batch of the deltastreamer will lost the checkpoint from COW table, cause the dataloss
@@ -750,7 +752,11 @@ public class StreamSync implements Serializable, Closeable {
       if (commitMetadataOption.isPresent()) {
         HoodieCommitMetadata commitMetadata = commitMetadataOption.get();
         LOG.debug("Checkpoint reset from metadata: " + commitMetadata.getMetadata(CHECKPOINT_RESET_KEY));
-        if (cfg.checkpoint != null && (StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))
+        if (cfg.ignoreCheckpoint != null && (StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_IGNORE_KEY))
+            || !cfg.ignoreCheckpoint.equals(commitMetadata.getMetadata(CHECKPOINT_IGNORE_KEY)))) {
+          // we ignore any existing checkpoint and start ingesting afresh
+          resumeCheckpointStr = Option.empty();
+        } else if (cfg.checkpoint != null && (StringUtils.isNullOrEmpty(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY))
             || !cfg.checkpoint.equals(commitMetadata.getMetadata(CHECKPOINT_RESET_KEY)))) {
           resumeCheckpointStr = Option.of(cfg.checkpoint);
         } else if
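The branch order added to `getCheckpointToResume` in the commit above can be restated as a small pure function. This is a minimal sketch: `CheckpointResume` and `resolve` are hypothetical names, `java.util.Optional` stands in for Hudi's `Option`, and the final fallback to the last committed checkpoint is an assumption, because the diff is cut off after the second branch.

```java
import java.util.Optional;

// Hypothetical restatement of the decision order; not the StreamSync code.
final class CheckpointResume {

  private static boolean isNullOrEmpty(String s) {
    return s == null || s.isEmpty();
  }

  /**
   * @param ignoreCheckpoint  value passed via --ignore-checkpoint, or null
   * @param lastIgnoreKey     ignore key stored in the last commit metadata, or null
   * @param resetCheckpoint   explicitly configured checkpoint, or null
   * @param lastResetKey      reset key stored in the last commit metadata, or null
   * @param lastCommitted     checkpoint of the last commit, or null
   */
  static Optional<String> resolve(String ignoreCheckpoint, String lastIgnoreKey,
                                  String resetCheckpoint, String lastResetKey,
                                  String lastCommitted) {
    if (ignoreCheckpoint != null
        && (isNullOrEmpty(lastIgnoreKey) || !ignoreCheckpoint.equals(lastIgnoreKey))) {
      // A new ignore value: drop any existing checkpoint and start afresh.
      return Optional.empty();
    } else if (resetCheckpoint != null
        && (isNullOrEmpty(lastResetKey) || !resetCheckpoint.equals(lastResetKey))) {
      // A new explicit checkpoint overrides the committed one.
      return Optional.of(resetCheckpoint);
    }
    // Assumed fallback: resume from whatever was committed last.
    return Optional.ofNullable(lastCommitted);
  }
}
```

Because the ignore value is compared with the one stored in the last commit, passing the same value twice has no effect on the second run, which matches the option's description.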
Re: [PR] [HUDI-7618] Add ability to ignore checkpoints in delta streamer [hudi]
nsivabalan merged PR #11018: URL: https://github.com/apache/hudi/pull/11018
Re: [PR] [HUDI-7640] Uses UUID as temporary file suffix for HoodieStorage.createImmutableFileInPath [hudi]
boneanxs merged PR #11052: URL: https://github.com/apache/hudi/pull/11052
(hudi) branch master updated: [HUDI-7640] Uses UUID as temporary file suffix for HoodieStorage.createImmutableFileInPath (#11052)
This is an automated email from the ASF dual-hosted git repository.

rexan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git

The following commit(s) were added to refs/heads/master by this push:
     new caa1bef75c3 [HUDI-7640] Uses UUID as temporary file suffix for HoodieStorage.createImmutableFileInPath (#11052)
caa1bef75c3 is described below

commit caa1bef75c3e21b7443e192375a068c328cd6f81
Author: Danny Chan
AuthorDate: Fri Apr 19 14:07:47 2024 +0800

    [HUDI-7640] Uses UUID as temporary file suffix for HoodieStorage.createImmutableFileInPath (#11052)
---
 .../src/main/java/org/apache/hudi/storage/HoodieStorage.java | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/hudi-io/src/main/java/org/apache/hudi/storage/HoodieStorage.java b/hudi-io/src/main/java/org/apache/hudi/storage/HoodieStorage.java
index adf9371c243..be160caba3b 100644
--- a/hudi-io/src/main/java/org/apache/hudi/storage/HoodieStorage.java
+++ b/hudi-io/src/main/java/org/apache/hudi/storage/HoodieStorage.java
@@ -37,6 +37,7 @@ import java.io.OutputStream;
 import java.net.URI;
 import java.util.ArrayList;
 import java.util.List;
+import java.util.UUID;

 /**
  * Provides I/O APIs on files and directories on storage.
@@ -45,7 +46,6 @@ import java.util.List;
 @PublicAPIClass(maturity = ApiMaturityLevel.EVOLVING)
 public abstract class HoodieStorage implements Closeable {
   public static final Logger LOG = LoggerFactory.getLogger(HoodieStorage.class);
-  public static final String TMP_PATH_POSTFIX = ".tmp";

   /**
    * @return the scheme of the storage.
@@ -249,8 +249,11 @@ public abstract class HoodieStorage implements Closeable {
    * empty, will first write the content to a temp file if {needCreateTempFile} is
    * true, and then rename it back after the content is written.
    *
-   * @param path    file path.
-   * @param content content to be stored.
+   * CAUTION: if this method is invoked in multi-threads for concurrent write of the same file,
+   * an existence check of the file is recommended.
+   *
+   * @param path    File path.
+   * @param content Content to be stored.
    */
   @PublicAPIMethod(maturity = ApiMaturityLevel.EVOLVING)
   public final void createImmutableFileInPath(StoragePath path,
@@ -267,7 +270,7 @@ public abstract class HoodieStorage implements Closeable {

     if (content.isPresent() && needTempFile) {
       StoragePath parent = path.getParent();
-      tmpPath = new StoragePath(parent, path.getName() + TMP_PATH_POSTFIX);
+      tmpPath = new StoragePath(parent, path.getName() + "." + UUID.randomUUID());
       fsout = create(tmpPath, false);
       fsout.write(content.get());
     }
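Why a UUID suffix helps can be seen in a stand-alone version of the write-then-rename pattern. This is a minimal sketch, assuming `java.nio.file` instead of `HoodieStorage`; `ImmutableFileWriter`, `tempPathFor`, and `createImmutableFile` are hypothetical names. With a fixed `.tmp` suffix, two concurrent writers of the same target would share one temp path; with a random suffix, each writer gets its own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

// Hypothetical illustration only; not Hudi code.
final class ImmutableFileWriter {

  /** Builds a temp path next to the target with a per-call random suffix. */
  static Path tempPathFor(Path target) {
    return target.resolveSibling(target.getFileName() + "." + UUID.randomUUID());
  }

  /**
   * Writes content to a unique temp file, then renames it to the target.
   * Files.move without REPLACE_EXISTING throws FileAlreadyExistsException
   * when the target is already there, so an existing file is never replaced.
   */
  static void createImmutableFile(Path target, byte[] content) throws IOException {
    Path tmp = tempPathFor(target);
    try {
      Files.write(tmp, content, StandardOpenOption.CREATE_NEW);
      Files.move(tmp, target);
    } finally {
      // No-op after a successful move; removes the temp file after a failure.
      Files.deleteIfExists(tmp);
    }
  }
}
```

As the CAUTION note in the Javadoc suggests, callers that expect concurrent writers can check for the target first and skip the write when it already exists.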
Re: [PR] [HUDI-7640] Uses UUID as temporary file suffix for HoodieStorage.createImmutableFileInPath [hudi]
boneanxs commented on code in PR #11052:
URL: https://github.com/apache/hudi/pull/11052#discussion_r1571861739

##
hudi-io/src/main/java/org/apache/hudi/storage/HoodieStorage.java:
##
@@ -267,7 +270,7 @@ public final void createImmutableFileInPath(StoragePath path,
     if (content.isPresent() && needTempFile) {
       StoragePath parent = path.getParent();
-      tmpPath = new StoragePath(parent, path.getName() + TMP_PATH_POSTFIX);
+      tmpPath = new StoragePath(parent, path.getName() + "." + UUID.randomUUID());

Review Comment:
   I see, make sense