Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


danny0405 merged PR #10886:
URL: https://github.com/apache/hudi/pull/10886


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066536128

   
   ## CI report:
   
   * cc4c48076b11d9a97fdb7fd0f6f0a5253d530ff1 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23368)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066365873

   
   ## CI report:
   
   * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362)
 
   * cc4c48076b11d9a97fdb7fd0f6f0a5253d530ff1 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23368)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066355763

   
   ## CI report:
   
   * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362)
 
   * cc4c48076b11d9a97fdb7fd0f6f0a5253d530ff1 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


danny0405 commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066283997

   You can rebase with the latest master now to resolve the compile error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


wecharyu commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1572163511


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -94,36 +96,32 @@ public int getPartitionDepth() {
   /**
* Write the metadata safely into partition atomically.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave() throws HoodieIOException {
 String extension = getMetafileExtension();
-StoragePath tmpMetaPath =
-new StoragePath(partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + 
taskPartitionId + extension);
-StoragePath metaPath = new StoragePath(partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
-boolean metafileExists = false;
+StoragePath metaPath = new StoragePath(
+partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
 
-try {
-  metafileExists = storage.exists(metaPath);
-  if (!metafileExists) {
-// write to temporary file
-writeMetafile(tmpMetaPath);
-// move to actual path
-storage.rename(tmpMetaPath, metaPath);
-  }
-} catch (IOException ioe) {
-  LOG.warn("Error trying to save partition metadata (this is okay, as long 
as at least 1 of these succeeded), "
-  + partitionPath, ioe);
-} finally {
-  if (!metafileExists) {
-try {
-  // clean up tmp file, if still lying around
-  if (storage.exists(tmpMetaPath)) {
-storage.deleteFile(tmpMetaPath);
+// This retry mechanism enables an exit-fast in metaPath exists check, 
which avoid the
+// tasks failures when there are two or more tasks trying to create the 
same metaPath.
+RetryHelper  retryHelper = new RetryHelper(1000, 
3, 1000, HoodieIOException.class.getName())
+.tryWith(() -> {
+  if (!storage.exists(metaPath)) {
+if (format.isPresent()) {
+  StoragePath tmpMetaPath = new StoragePath(

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


danny0405 commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066274001

   The master build is broken and here is the fix: 
https://github.com/apache/hudi/pull/11056, you may need to await for this patch 
and rebase with the latest master again~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066250342

   
   ## CI report:
   
   * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


danny0405 commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1572070120


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -94,36 +96,32 @@ public int getPartitionDepth() {
   /**
* Write the metadata safely into partition atomically.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave() throws HoodieIOException {
 String extension = getMetafileExtension();
-StoragePath tmpMetaPath =
-new StoragePath(partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + 
taskPartitionId + extension);
-StoragePath metaPath = new StoragePath(partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
-boolean metafileExists = false;
+StoragePath metaPath = new StoragePath(
+partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
 
-try {
-  metafileExists = storage.exists(metaPath);
-  if (!metafileExists) {
-// write to temporary file
-writeMetafile(tmpMetaPath);
-// move to actual path
-storage.rename(tmpMetaPath, metaPath);
-  }
-} catch (IOException ioe) {
-  LOG.warn("Error trying to save partition metadata (this is okay, as long 
as at least 1 of these succeeded), "
-  + partitionPath, ioe);
-} finally {
-  if (!metafileExists) {
-try {
-  // clean up tmp file, if still lying around
-  if (storage.exists(tmpMetaPath)) {
-storage.deleteFile(tmpMetaPath);
+// This retry mechanism enables an exit-fast in metaPath exists check, 
which avoid the
+// tasks failures when there are two or more tasks trying to create the 
same metaPath.
+RetryHelper  retryHelper = new RetryHelper(1000, 
3, 1000, HoodieIOException.class.getName())
+.tryWith(() -> {
+  if (!storage.exists(metaPath)) {
+if (format.isPresent()) {
+  StoragePath tmpMetaPath = new StoragePath(

Review Comment:
   We can move the `tmpMetaPath` into `writeMetafileInFormat`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066077325

   
   ## CI report:
   
   * 7b04755aa308766f3b0f0d5292ed9476630da90d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23357)
 
   * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066065344

   
   ## CI report:
   
   * 7b04755aa308766f3b0f0d5292ed9476630da90d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23357)
 
   * af5d107b867fd97362710bc032a95743eb5d33a8 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2065710966

   
   ## CI report:
   
   * 7b04755aa308766f3b0f0d5292ed9476630da90d Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23357)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2065675635

   
   ## CI report:
   
   * 522a68cb3ea8dc725418eb9b811a03b5c86c694b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23343)
 
   * 7b04755aa308766f3b0f0d5292ed9476630da90d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23357)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2065670991

   
   ## CI report:
   
   * 522a68cb3ea8dc725418eb9b811a03b5c86c694b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23343)
 
   * 7b04755aa308766f3b0f0d5292ed9476630da90d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


wecharyu commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1571697843


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,37 +94,33 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {
 String extension = getMetafileExtension();
-Path tmpMetaPath =
-new Path(partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + 
taskPartitionId + extension);
 Path metaPath = new Path(partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
-boolean metafileExists = false;
 
-try {
-  metafileExists = fs.exists(metaPath);
-  if (!metafileExists) {
-// write to temporary file
-writeMetafile(tmpMetaPath);
-// move to actual path
-fs.rename(tmpMetaPath, metaPath);
-  }
-} catch (IOException ioe) {
-  LOG.warn("Error trying to save partition metadata (this is okay, as long 
as at least 1 of these succeeded), "
-  + partitionPath, ioe);
-} finally {
-  if (!metafileExists) {
-try {
-  // clean up tmp file, if still lying around
-  if (fs.exists(tmpMetaPath)) {
-fs.delete(tmpMetaPath, false);
+// This retry mechanism enables an exit-fast in metaPath exists check, 
which avoid the
+// tasks failures when there are two or more tasks trying to create the 
same metaPath.
+RetryHelper  retryHelper = new RetryHelper(1000, 3, 
1000, IOException.class.getName())
+.tryWith(() -> {
+  if (!fs.exists(metaPath)) {
+if (format.isPresent()) {
+  Path tmpMetaPath = new Path(partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + writeToken + 
extension);
+  writeMetafileInFormat(metaPath, tmpMetaPath, format.get());
+} else {
+  // Backwards compatible properties file format
+  try (ByteArrayOutputStream os = new ByteArrayOutputStream()) {
+props.store(os, "partition metadata");
+Option content = Option.of(os.toByteArray());
+HadoopFSUtils.createImmutableFileInPath(fs, metaPath, content, 
true, "_" + writeToken);

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


danny0405 commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1571623276


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   I have fied a fix for the random suffix, 
https://github.com/apache/hudi/pull/11052, with this change, I think we can get 
rid of the `writeToken` param that is passing around. We can use the 
`RetryHelper` for the file creation like this:
   
   ```java
   RetryHelper.doRetry {
 if (file does not exist) {
  storage. createImmutableFileInPath(file, content).
 }
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


danny0405 commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1571623276


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   I have fied a fix for the random suffix, 
https://github.com/apache/hudi/pull/11052



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


danny0405 commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1571564604


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,37 +94,33 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {
 String extension = getMetafileExtension();
-Path tmpMetaPath =
-new Path(partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + 
taskPartitionId + extension);
 Path metaPath = new Path(partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension);
-boolean metafileExists = false;
 
-try {
-  metafileExists = fs.exists(metaPath);
-  if (!metafileExists) {
-// write to temporary file
-writeMetafile(tmpMetaPath);
-// move to actual path
-fs.rename(tmpMetaPath, metaPath);
-  }
-} catch (IOException ioe) {
-  LOG.warn("Error trying to save partition metadata (this is okay, as long 
as at least 1 of these succeeded), "
-  + partitionPath, ioe);
-} finally {
-  if (!metafileExists) {
-try {
-  // clean up tmp file, if still lying around
-  if (fs.exists(tmpMetaPath)) {
-fs.delete(tmpMetaPath, false);
+// This retry mechanism enables an exit-fast in metaPath exists check, 
which avoid the
+// tasks failures when there are two or more tasks trying to create the 
same metaPath.
+RetryHelper  retryHelper = new RetryHelper(1000, 3, 
1000, IOException.class.getName())
+.tryWith(() -> {
+  if (!fs.exists(metaPath)) {
+if (format.isPresent()) {
+  Path tmpMetaPath = new Path(partitionPath, 
HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + writeToken + 
extension);
+  writeMetafileInFormat(metaPath, tmpMetaPath, format.get());
+} else {
+  // Backwards compatible properties file format
+  try (ByteArrayOutputStream os = new ByteArrayOutputStream()) {
+props.store(os, "partition metadata");
+Option content = Option.of(os.toByteArray());
+HadoopFSUtils.createImmutableFileInPath(fs, metaPath, content, 
true, "_" + writeToken);

Review Comment:
   Can we eliminate the writeToken and just use the 
`HadoopFSUtils.createImmutableFileInPath` to write the files directly? You may 
need to refactorer the method `HadoopFSUtils.createImmutableFileInPath` a 
little to use UUID as the temporal suffix too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2063823692

   
   ## CI report:
   
   * 522a68cb3ea8dc725418eb9b811a03b5c86c694b Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23343)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2063621252

   
   ## CI report:
   
   * aadcb616ac338ef60c5799414bef660a19135c06 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22957)
 
   * 522a68cb3ea8dc725418eb9b811a03b5c86c694b Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23343)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2063608614

   
   ## CI report:
   
   * aadcb616ac338ef60c5799414bef660a19135c06 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22957)
 
   * 522a68cb3ea8dc725418eb9b811a03b5c86c694b UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


wecharyu commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1570475426


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   So sorry that I have pushed the patch to another repository and did not 
realize it :'(. 
   I have added the retry logic, please help take a look again. @beyond1920  
@boneanxs @danny0405 @Tartarus0zm 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


danny0405 commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1570413829


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   ` RetryHelper` should be suitable to use here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


Tartarus0zm commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1570153651


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   @danny0405 @boneanxs  Using `RetryHelper` means using 
`HoodieRetryWrapperFileSystem` directly in HoodiePartitionMetadata ?
   Or only this code need using `RetryHelper `?
   ```
   metafileExists = fs.exists(metaPath);
   if (!metafileExists) {
 // write to temporary file
 writeMetafile(tmpMetaPath);
 // move to actual path
 fs.rename(tmpMetaPath, metaPath);
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-18 Thread via GitHub


Tartarus0zm commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1570153651


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   @danny0405 @boneanxs  Using `RetryHelper` means using 
`HoodieRetryWrapperFileSystem` directly in HoodiePartitionMetadata ?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-17 Thread via GitHub


danny0405 commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1568425096


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   Yeah, the `HoodieWrapperFileSystem#createImmutableFileInPath` can actually 
ensure atomicity. Can you make the changes and have a real production test 
though, so that we are convinced the behavior is expected.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-17 Thread via GitHub


danny0405 commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1568425096


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   Yeah, the `HoodieWrapperFileSystem#createImmutableFileInPath` can actually 
ensure atomicity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-17 Thread via GitHub


boneanxs commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1568323048


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   I see, we don't need UUID, after using `RetryHelper`, this issue should be 
fixed:
   
   ```java
   metafileExists = fs.exists(metaPath);
 if (!metafileExists) {
   // write to temporary file
   writeMetafile(tmpMetaPath);
   // move to actual path
   fs.rename(tmpMetaPath, metaPath);
 }
   ```
   If 2 threads concurrently creating tmpMetaPath, after one thread renamed 
succeed, and another one failed because of tmpMetaPath missing, it will retry 
and return directly since `metafileExists` becomes true



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-16 Thread via GitHub


danny0405 commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1568071623


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   Let's switch to UUID to simplify the code, there is no possibility for 
conflicts any more, so the gains for debugging is little, we just need to 
handle the tmp file deletion in good manner, which should be ensured by 
`HoodieWrapperFileSystem#createImmutableFileInPath`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-16 Thread via GitHub


Tartarus0zm commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1567119140


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   > Hey, @Tartarus0zm what is parallel write? 2 jobs writing to the same path?
   
   Sorry, I need to explain here.
   `parallel write` is our internal optimization strategy that uses 
multi-threading to write out hdfs files to reduce the time consuming flink 
Checkpoint.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-15 Thread via GitHub


boneanxs commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1566657358


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   Hey, @Tartarus0zm what is parallel write? 2 jobs writing to the same path?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-15 Thread via GitHub


Tartarus0zm commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1565699528


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   hi everyone, I has cherry pick this patch to our internal, then if we enable 
parallel write, there is a high probability that `tmpMetaPath` write will fail 
because it was deleted by another thread.
   Maybe UUID is a good choice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-09 Thread via GitHub


danny0405 commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2044649201

   > > Before throw exception finally, we could retry for several times.
   > 
   > @beyond1920 It sounds a good idea here. Added a retry logic. cc: @boneanxs 
@danny0405
   
   There is a `RetryHelper` in Hudi if you wanna do that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-04-09 Thread via GitHub


wecharyu commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2044635984

   > Before throw exception finally, we could retry for several times.
   
   @beyond1920  It sounds a good idea here. Added a retry logic. cc: @boneanxs 
@danny0405 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-27 Thread via GitHub


beyond1920 commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2023032742

   @wecharyu 
   > We did not notice this possible failure, it may not happen frequently 
because we have a pre-checked if this metadata file exists. OTOH, I think it's 
also OK to remove this throw exception.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-26 Thread via GitHub


wecharyu commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2020686744

   > Would the failure probability increasing?
   
   We did not notice this possible failure, it may not happen frequently 
because we have a pre-checked if this metadata file exists. OTOH, I think it's 
also OK to remove this throw exception.
   
   >  could we move created hoodie_partition_metadata to driver or jobmaster 
instead of each task?
   
   This may cause the performance issue if there are too many partitions need 
to create in driver. We need think more on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-21 Thread via GitHub


beyond1920 commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2012131723

   @wecharyu Thanks for your contribution.
   The pull request could resolve the problem, of course.
   But I am not sure whether that the solution would increase the probability 
of job failure or not.
   After this PR,  any task encountered an `IOException` would lead the task 
failure, it might cause the job failure.
   And before this PR, each task would ignore `IOException`.
   Would the failure probability increasing?
   
   And is there any other solutions?
   
   For example, could we move created `hoodie_partition_metadata` to driver or 
jobmaster instead of each task?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-21 Thread via GitHub


danny0405 commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1533289294


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   >  it's not easy to reuse createImmutableFileInPath() method here
   
   Needs some refactoring to the method, we can move it to FSUtils as a static 
method and make the suffix configurable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-20 Thread via GitHub


beyond1920 commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1532422677


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   Both UUID and writeToken could make the metadata temp file name is unique 
for different task.
   +1 on writeToken because it contains more information to help us find the 
tasks.



##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   Both UUID and writeToken could make the metadata temp file name is unique 
for different tasks.
   +1 on writeToken because it contains more information to help us find the 
tasks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-20 Thread via GitHub


wecharyu commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1532367467


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   It's good to reuse such write file code, but `fs` is not 
`HoodieWrapperFileSystem` in this class, it's not easy to reuse 
`createImmutableFileInPath()` method here, do we need move this method into 
`FSUtils`?
   
https://github.com/apache/hudi/blob/7c55ac35ba11ed00151bf9d536aecdb5d83af33f/hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java#L61
   
   As for writeToken and UUID, we prefer writeToken because it includes the 
task partition info which is useful in debug, pls let me know your idea.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-19 Thread via GitHub


danny0405 commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1531326627


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   Maybe we can utililize thee method 
`HoodieWrapperFileSystem#createImmutableFileInPath` and make the suffix a UUID.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-19 Thread via GitHub


wecharyu commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1530312027


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   Here we want a unique suffix for temp metadata file, and `writeToken` just 
fullfills.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2007036084

   
   ## CI report:
   
   * aadcb616ac338ef60c5799414bef660a19135c06 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22957)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-19 Thread via GitHub


danny0405 commented on code in PR #10886:
URL: https://github.com/apache/hudi/pull/10886#discussion_r1530176270


##
hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java:
##
@@ -92,11 +92,12 @@ public int getPartitionDepth() {
 
   /**
* Write the metadata safely into partition atomically.
+   * To avoid concurrent write into the same partition (for example in 
speculative case),
+   * please make sure writeToken is unique.
*/
-  public void trySave(int taskPartitionId) {
+  public void trySave(String writeToken) throws IOException {

Review Comment:
   Can it be a UUID, why we need to pass around the writeToken ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2006901528

   
   ## CI report:
   
   * aadcb616ac338ef60c5799414bef660a19135c06 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22957)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-19 Thread via GitHub


hudi-bot commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2006875566

   
   ## CI report:
   
   * aadcb616ac338ef60c5799414bef660a19135c06 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-19 Thread via GitHub


wecharyu commented on PR #10886:
URL: https://github.com/apache/hudi/pull/10886#issuecomment-2006795897

   For https://github.com/apache/hudi/issues/10885, cc: @beyond1920 @boneanxs 
@danny0405 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[PR] [HUDI-7515] Fix partition metadata write failure [hudi]

2024-03-19 Thread via GitHub


wecharyu opened a new pull request, #10886:
URL: https://github.com/apache/hudi/pull/10886

   ### Change Logs
   When `spark.speculation` is enabled, if the write metadata operation become 
slow for some reason, a speculative will be started to write the same metadata 
file concurrently.
   
   In HDFS, two threads working on same piece may cause the issue:
   ```bash
   Caused by: 
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does 
not exist: /path/to/table/a=3519/b=3520/c=3521/.hoodie_partition_metadata_112 
(inode 48415575374) Holder DFSClient_NONMAPREDUCE_-2108606624_29 does not have 
any open files.
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3239)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:617)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:176)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3116)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:1075)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:619)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:623)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:591)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:575)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1137)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1230)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1128)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2005)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3354)
   ```
   Then the partition metadata file can not be created successfully.
   
   We try to fix this issue by: 
   1. Use a unique token in `HoodiePartitionMetadata#trySave()` to support 
concurrent write
   2. Throw exception when failed to write partition metadata file.
   
   ### Impact
   
   Fix the concurrently write same partition metadata issue.
   
   ### Risk level (write none, low medium or high below)
   
   Low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
 ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
 changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org