Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 merged PR #10886: URL: https://github.com/apache/hudi/pull/10886 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066536128 ## CI report: * cc4c48076b11d9a97fdb7fd0f6f0a5253d530ff1 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23368) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066365873 ## CI report: * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362) * cc4c48076b11d9a97fdb7fd0f6f0a5253d530ff1 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23368) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066355763 ## CI report: * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362) * cc4c48076b11d9a97fdb7fd0f6f0a5253d530ff1 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066283997 You can rebase with the latest master now to resolve the compile error. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
wecharyu commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1572163511 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -94,36 +96,32 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. */ - public void trySave(int taskPartitionId) { + public void trySave() throws HoodieIOException { String extension = getMetafileExtension(); -StoragePath tmpMetaPath = -new StoragePath(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + taskPartitionId + extension); -StoragePath metaPath = new StoragePath(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension); -boolean metafileExists = false; +StoragePath metaPath = new StoragePath( +partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension); -try { - metafileExists = storage.exists(metaPath); - if (!metafileExists) { -// write to temporary file -writeMetafile(tmpMetaPath); -// move to actual path -storage.rename(tmpMetaPath, metaPath); - } -} catch (IOException ioe) { - LOG.warn("Error trying to save partition metadata (this is okay, as long as at least 1 of these succeeded), " - + partitionPath, ioe); -} finally { - if (!metafileExists) { -try { - // clean up tmp file, if still lying around - if (storage.exists(tmpMetaPath)) { -storage.deleteFile(tmpMetaPath); +// This retry mechanism enables an exit-fast in metaPath exists check, which avoid the +// tasks failures when there are two or more tasks trying to create the same metaPath. +RetryHelper retryHelper = new RetryHelper(1000, 3, 1000, HoodieIOException.class.getName()) +.tryWith(() -> { + if (!storage.exists(metaPath)) { +if (format.isPresent()) { + StoragePath tmpMetaPath = new StoragePath( Review Comment: Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066274001 The master build is broken and here is the fix: https://github.com/apache/hudi/pull/11056, you may need to await for this patch and rebase with the latest master again~ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066250342 ## CI report: * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1572070120 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -94,36 +96,32 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. */ - public void trySave(int taskPartitionId) { + public void trySave() throws HoodieIOException { String extension = getMetafileExtension(); -StoragePath tmpMetaPath = -new StoragePath(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + taskPartitionId + extension); -StoragePath metaPath = new StoragePath(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension); -boolean metafileExists = false; +StoragePath metaPath = new StoragePath( +partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension); -try { - metafileExists = storage.exists(metaPath); - if (!metafileExists) { -// write to temporary file -writeMetafile(tmpMetaPath); -// move to actual path -storage.rename(tmpMetaPath, metaPath); - } -} catch (IOException ioe) { - LOG.warn("Error trying to save partition metadata (this is okay, as long as at least 1 of these succeeded), " - + partitionPath, ioe); -} finally { - if (!metafileExists) { -try { - // clean up tmp file, if still lying around - if (storage.exists(tmpMetaPath)) { -storage.deleteFile(tmpMetaPath); +// This retry mechanism enables an exit-fast in metaPath exists check, which avoid the +// tasks failures when there are two or more tasks trying to create the same metaPath. +RetryHelper retryHelper = new RetryHelper(1000, 3, 1000, HoodieIOException.class.getName()) +.tryWith(() -> { + if (!storage.exists(metaPath)) { +if (format.isPresent()) { + StoragePath tmpMetaPath = new StoragePath( Review Comment: We can move the `tmpMetaPath` into `writeMetafileInFormat`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066077325 ## CI report: * 7b04755aa308766f3b0f0d5292ed9476630da90d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23357) * af5d107b867fd97362710bc032a95743eb5d33a8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23362) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2066065344 ## CI report: * 7b04755aa308766f3b0f0d5292ed9476630da90d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23357) * af5d107b867fd97362710bc032a95743eb5d33a8 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2065710966 ## CI report: * 7b04755aa308766f3b0f0d5292ed9476630da90d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23357) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2065675635 ## CI report: * 522a68cb3ea8dc725418eb9b811a03b5c86c694b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23343) * 7b04755aa308766f3b0f0d5292ed9476630da90d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23357) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2065670991 ## CI report: * 522a68cb3ea8dc725418eb9b811a03b5c86c694b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23343) * 7b04755aa308766f3b0f0d5292ed9476630da90d UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
wecharyu commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1571697843 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,37 +94,33 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { String extension = getMetafileExtension(); -Path tmpMetaPath = -new Path(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + taskPartitionId + extension); Path metaPath = new Path(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension); -boolean metafileExists = false; -try { - metafileExists = fs.exists(metaPath); - if (!metafileExists) { -// write to temporary file -writeMetafile(tmpMetaPath); -// move to actual path -fs.rename(tmpMetaPath, metaPath); - } -} catch (IOException ioe) { - LOG.warn("Error trying to save partition metadata (this is okay, as long as at least 1 of these succeeded), " - + partitionPath, ioe); -} finally { - if (!metafileExists) { -try { - // clean up tmp file, if still lying around - if (fs.exists(tmpMetaPath)) { -fs.delete(tmpMetaPath, false); +// This retry mechanism enables an exit-fast in metaPath exists check, which avoid the +// tasks failures when there are two or more tasks trying to create the same metaPath. +RetryHelper retryHelper = new RetryHelper(1000, 3, 1000, IOException.class.getName()) +.tryWith(() -> { + if (!fs.exists(metaPath)) { +if (format.isPresent()) { + Path tmpMetaPath = new Path(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + writeToken + extension); + writeMetafileInFormat(metaPath, tmpMetaPath, format.get()); +} else { + // Backwards compatible properties file format + try (ByteArrayOutputStream os = new ByteArrayOutputStream()) { +props.store(os, "partition metadata"); +Option content = Option.of(os.toByteArray()); +HadoopFSUtils.createImmutableFileInPath(fs, metaPath, content, true, "_" + writeToken); Review Comment: Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1571623276 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: I have fied a fix for the random suffix, https://github.com/apache/hudi/pull/11052, with this change, I think we can get rid of the `writeToken` param that is passing around. We can use the `RetryHelper` for the file creation like this: ```java RetryHelper.doRetry { if (file does not exist) { storage. createImmutableFileInPath(file, content). } } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1571623276 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: I have fied a fix for the random suffix, https://github.com/apache/hudi/pull/11052 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1571564604 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,37 +94,33 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { String extension = getMetafileExtension(); -Path tmpMetaPath = -new Path(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + taskPartitionId + extension); Path metaPath = new Path(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + extension); -boolean metafileExists = false; -try { - metafileExists = fs.exists(metaPath); - if (!metafileExists) { -// write to temporary file -writeMetafile(tmpMetaPath); -// move to actual path -fs.rename(tmpMetaPath, metaPath); - } -} catch (IOException ioe) { - LOG.warn("Error trying to save partition metadata (this is okay, as long as at least 1 of these succeeded), " - + partitionPath, ioe); -} finally { - if (!metafileExists) { -try { - // clean up tmp file, if still lying around - if (fs.exists(tmpMetaPath)) { -fs.delete(tmpMetaPath, false); +// This retry mechanism enables an exit-fast in metaPath exists check, which avoid the +// tasks failures when there are two or more tasks trying to create the same metaPath. +RetryHelper retryHelper = new RetryHelper(1000, 3, 1000, IOException.class.getName()) +.tryWith(() -> { + if (!fs.exists(metaPath)) { +if (format.isPresent()) { + Path tmpMetaPath = new Path(partitionPath, HoodiePartitionMetadata.HOODIE_PARTITION_METAFILE_PREFIX + "_" + writeToken + extension); + writeMetafileInFormat(metaPath, tmpMetaPath, format.get()); +} else { + // Backwards compatible properties file format + try (ByteArrayOutputStream os = new ByteArrayOutputStream()) { +props.store(os, "partition metadata"); +Option content = Option.of(os.toByteArray()); +HadoopFSUtils.createImmutableFileInPath(fs, metaPath, content, true, "_" + writeToken); Review Comment: Can we eliminate the writeToken and just use the `HadoopFSUtils.createImmutableFileInPath` to write the files directly? You may need to refactorer the method `HadoopFSUtils.createImmutableFileInPath` a little to use UUID as the temporal suffix too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2063823692 ## CI report: * 522a68cb3ea8dc725418eb9b811a03b5c86c694b Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23343) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2063621252 ## CI report: * aadcb616ac338ef60c5799414bef660a19135c06 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22957) * 522a68cb3ea8dc725418eb9b811a03b5c86c694b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=23343) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2063608614 ## CI report: * aadcb616ac338ef60c5799414bef660a19135c06 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22957) * 522a68cb3ea8dc725418eb9b811a03b5c86c694b UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
wecharyu commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1570475426 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: So sorry that I have pushed the patch to another repository and did not realize it :'(. I have added the retry logic, please help take a look again. @beyond1920 @boneanxs @danny0405 @Tartarus0zm -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1570413829 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: ` RetryHelper` should be suitable to use here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
Tartarus0zm commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1570153651 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: @danny0405 @boneanxs Using `RetryHelper` means using `HoodieRetryWrapperFileSystem` directly in HoodiePartitionMetadata ? Or only this code need using `RetryHelper `? ``` metafileExists = fs.exists(metaPath); if (!metafileExists) { // write to temporary file writeMetafile(tmpMetaPath); // move to actual path fs.rename(tmpMetaPath, metaPath); } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
Tartarus0zm commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1570153651 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: @danny0405 @boneanxs Using `RetryHelper` means using `HoodieRetryWrapperFileSystem` directly in HoodiePartitionMetadata ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1568425096 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: Yeah, the `HoodieWrapperFileSystem#createImmutableFileInPath` can actually ensure atomicity. Can you make the changes and have a real production test though, so that we are convinced the behavior is expected. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1568425096 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: Yeah, the `HoodieWrapperFileSystem#createImmutableFileInPath` can actually ensure atomicity. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
boneanxs commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1568323048 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: I see, we don't need UUID, after using `RetryHelper`, this issue should be fixed: ```java metafileExists = fs.exists(metaPath); if (!metafileExists) { // write to temporary file writeMetafile(tmpMetaPath); // move to actual path fs.rename(tmpMetaPath, metaPath); } ``` If 2 threads concurrently creating tmpMetaPath, after one thread renamed succeed, and another one failed because of tmpMetaPath missing, it will retry and return directly since `metafileExists` becomes true -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1568071623 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: Let's switch to UUID to simplify the code, there is no possibility for conflicts any more, so the gains for debugging is little, we just need to handle the tmp file deletion in good manner, which should be ensured by `HoodieWrapperFileSystem#createImmutableFileInPath`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
Tartarus0zm commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1567119140 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: > Hey, @Tartarus0zm what is parallel write? 2 jobs writing to the same path? Sorry, I need to explain here. `parallel write` is our internal optimization strategy that uses multi-threading to write out hdfs files to reduce the time consuming flink Checkpoint. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
boneanxs commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1566657358 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: Hey, @Tartarus0zm what is parallel write? 2 jobs writing to the same path? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
Tartarus0zm commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1565699528 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: hi everyone, I has cherry pick this patch to our internal, then if we enable parallel write, there is a high probability that `tmpMetaPath` write will fail because it was deleted by another thread. Maybe UUID is a good choice. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2044649201 > > Before throw exception finally, we could retry for several times. > > @beyond1920 It sounds a good idea here. Added a retry logic. cc: @boneanxs @danny0405 There is a `RetryHelper` in Hudi if you wanna do that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
wecharyu commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2044635984 > Before throw exception finally, we could retry for several times. @beyond1920 It sounds a good idea here. Added a retry logic. cc: @boneanxs @danny0405 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
beyond1920 commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2023032742 @wecharyu > We did not notice this possible failure, it may not happen frequently because we have a pre-checked if this metadata file exists. OTOH, I think it's also OK to remove this throw exception. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
wecharyu commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2020686744 > Would the failure probability increasing? We did not notice this possible failure, it may not happen frequently because we have a pre-checked if this metadata file exists. OTOH, I think it's also OK to remove this throw exception. > could we move created hoodie_partition_metadata to driver or jobmaster instead of each task? This may cause the performance issue if there are too many partitions need to create in driver. We need think more on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
beyond1920 commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2012131723 @wecharyu Thanks for your contribution. The pull request could resolve the problem, of course. But I am not sure whether that the solution would increase the probability of job failure or not. After this PR, any task encountered an `IOException` would lead the task failure, it might cause the job failure. And before this PR, each task would ignore `IOException`. Would the failure probability increasing? And is there any other solutions? For example, could we move created `hoodie_partition_metadata` to driver or jobmaster instead of each task? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1533289294 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: > it's not easy to reuse createImmutableFileInPath() method here Needs some refactoring to the method, we can move it to FSUtils as a static method and make the suffix configurable. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
beyond1920 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1532422677 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: Both UUID and writeToken could make the metadata temp file name is unique for different task. +1 on writeToken because it contains more information to help us find the tasks. ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: Both UUID and writeToken could make the metadata temp file name is unique for different tasks. +1 on writeToken because it contains more information to help us find the tasks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
wecharyu commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1532367467 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: It's good to reuse such write file code, but `fs` is not `HoodieWrapperFileSystem` in this class, it's not easy to reuse `createImmutableFileInPath()` method here, do we need move this method into `FSUtils`? https://github.com/apache/hudi/blob/7c55ac35ba11ed00151bf9d536aecdb5d83af33f/hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java#L61 As for writeToken and UUID, we prefer writeToken because it includes the task partition info which is useful in debug, pls let me know your idea. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1531326627 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: Maybe we can utililize thee method `HoodieWrapperFileSystem#createImmutableFileInPath` and make the suffix a UUID. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
wecharyu commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1530312027 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: Here we want a unique suffix for temp metadata file, and `writeToken` just fullfills. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2007036084 ## CI report: * aadcb616ac338ef60c5799414bef660a19135c06 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22957) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
danny0405 commented on code in PR #10886: URL: https://github.com/apache/hudi/pull/10886#discussion_r1530176270 ## hudi-common/src/main/java/org/apache/hudi/common/model/HoodiePartitionMetadata.java: ## @@ -92,11 +92,12 @@ public int getPartitionDepth() { /** * Write the metadata safely into partition atomically. + * To avoid concurrent write into the same partition (for example in speculative case), + * please make sure writeToken is unique. */ - public void trySave(int taskPartitionId) { + public void trySave(String writeToken) throws IOException { Review Comment: Can it be a UUID, why we need to pass around the writeToken ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2006901528 ## CI report: * aadcb616ac338ef60c5799414bef660a19135c06 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=22957) Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
hudi-bot commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2006875566 ## CI report: * aadcb616ac338ef60c5799414bef660a19135c06 UNKNOWN Bot commands @hudi-bot supports the following commands: - `@hudi-bot run azure` re-run the last Azure build -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] [HUDI-7515] Fix partition metadata write failure [hudi]
wecharyu commented on PR #10886: URL: https://github.com/apache/hudi/pull/10886#issuecomment-2006795897 For https://github.com/apache/hudi/issues/10885, cc: @beyond1920 @boneanxs @danny0405 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] [HUDI-7515] Fix partition metadata write failure [hudi]
wecharyu opened a new pull request, #10886: URL: https://github.com/apache/hudi/pull/10886 ### Change Logs When `spark.speculation` is enabled, if the write metadata operation become slow for some reason, a speculative will be started to write the same metadata file concurrently. In HDFS, two threads working on same piece may cause the issue: ```bash Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /path/to/table/a=3519/b=3520/c=3521/.hoodie_partition_metadata_112 (inode 48415575374) Holder DFSClient_NONMAPREDUCE_-2108606624_29 does not have any open files. at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3239) at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:617) at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:176) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3116) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:1075) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:619) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:623) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:591) at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:575) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1137) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1230) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1128) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2005) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3354) ``` Then the partition metadata file can not be created successfully. We try to fix this issue by: 1. Use a unique token in `HoodiePartitionMetadata#trySave()` to support concurrent write 2. Throw exception when failed to write partition metadata file. ### Impact Fix the concurrently write same partition metadata issue. ### Risk level (write none, low medium or high below) Low ### Documentation Update _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._ - _The config description must be updated if new configs are added or the default value of the configs are changed_ - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._ ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Change Logs and Impact were stated clearly - [ ] Adequate tests were added if applicable - [ ] CI passed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org