[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=543019=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-543019 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------

    Author: ASF GitHub Bot
    Created on: 27/Jan/21 19:18
    Start Date: 27/Jan/21 19:18
    Worklog Time Spent: 10m
    Work Description: HeartSaVioR commented on pull request #2624:
    URL: https://github.com/apache/hadoop/pull/2624#issuecomment-768515963

    Thanks for reviewing and merging!

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id: (was: 543019)
    Time Spent: 2.5h (was: 2h 20m)

> Add latency information in FileOutputCommitter.mergePaths
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-7317
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7317
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 3.3.1
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> We have observed some occurrences of huge delays from file output
> committer V1, where file output committer V2 is not an option.
> While the root cause should be investigated on our side, there is another
> issue: there is insufficient information to debug. Most likely the huge
> delay comes from mergePaths, but the class only provides a "debug" log
> message that logs the call itself with its parameters, nothing else.
> mergePaths is called recursively, which makes it harder to trace how much
> latency a specific directory takes to merge.
> It would be nice, and not intrusive, to add latency info in mergePaths, so
> that we can see how much latency a specific directory takes to merge, only
> when debug logging is enabled.
> (Ideally it would be nice to log a warn message when the call takes a huge
> amount of time to process, but I don't have a proper threshold for "huge
> time", so I'd avoid dealing with it altogether here.)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
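The request in the issue description (per-directory latency for a recursive call, paid for only when debug logging is on) can be sketched roughly as follows. This is a minimal illustration, not the FileOutputCommitter code: `RecursiveTimingSketch`, `visit`, the `debugEnabled` flag (standing in for `LOG.isDebugEnabled()`), and the in-memory `LOG_LINES` list are all hypothetical names, and the method only walks a local directory tree rather than merging anything.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: time each level of a recursive directory walk. */
final class RecursiveTimingSketch {
  // Stand-in for LOG.isDebugEnabled(); an assumption for this sketch.
  static boolean debugEnabled = true;
  // Stand-in for the debug log; collects one line per visited path.
  static final List<String> LOG_LINES = new ArrayList<>();

  /** Recursively visits 'from' and returns the number of regular files seen. */
  static int visit(Path from) throws IOException {
    long startNs = -1L;
    if (debugEnabled) {
      // Read the clock only when the result will actually be logged.
      startNs = System.nanoTime();
    }
    int files = 0;
    if (Files.isDirectory(from)) {
      try (DirectoryStream<Path> children = Files.newDirectoryStream(from)) {
        for (Path child : children) {
          files += visit(child);
        }
      }
    } else {
      files = 1;
    }
    if (debugEnabled) {
      long elapsedMs = (System.nanoTime() - startNs) / 1_000_000L;
      // Logged after the children, so a directory's time includes its subtree.
      LOG_LINES.add("Visited " + from + " in " + elapsedMs + " ms");
    }
    return files;
  }
}
```

Because each level logs after its children return, the line for a directory reports the cumulative time of its whole subtree, which is the figure one would want when hunting for a slow directory.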
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=543012=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-543012 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------

    Author: ASF GitHub Bot
    Created on: 27/Jan/21 19:08
    Start Date: 27/Jan/21 19:08
    Worklog Time Spent: 10m
    Work Description: steveloughran merged pull request #2624:
    URL: https://github.com/apache/hadoop/pull/2624

Issue Time Tracking
-------------------

    Worklog Id: (was: 543012)
    Time Spent: 2h 20m (was: 2h 10m)
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=542838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542838 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------

    Author: ASF GitHub Bot
    Created on: 27/Jan/21 13:46
    Start Date: 27/Jan/21 13:46
    Worklog Time Spent: 10m
    Work Description: hadoop-yetus commented on pull request #2624:
    URL: https://github.com/apache/hadoop/pull/2624#issuecomment-768295570

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 23s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 16s | | trunk passed |
| +1 :green_heart: | compile | 0m 41s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | compile | 0m 35s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| +1 :green_heart: | checkstyle | 0m 34s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 42s | | trunk passed |
| +1 :green_heart: | shadedclient | 14m 38s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 26s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 0m 24s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| +0 :ok: | spotbugs | 1m 17s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 1m 14s | | trunk passed |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 32s | | the patch passed |
| +1 :green_heart: | compile | 0m 32s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javac | 0m 32s | | the patch passed |
| +1 :green_heart: | compile | 0m 28s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| +1 :green_heart: | javac | 0m 28s | | the patch passed |
| +1 :green_heart: | checkstyle | 0m 26s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 32s | | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 :green_heart: | shadedclient | 12m 56s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 20s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 0m 20s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| +1 :green_heart: | findbugs | 1m 16s | | the patch passed |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 7m 0s | | hadoop-mapreduce-client-core in the patch passed. |
| +1 :green_heart: | asflicense | 0m 31s | | The patch does not generate ASF License warnings. |
| | | 80m 43s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2624/4/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2624 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux dd8707cc185a 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 80c7404b519 |
| Default Java | Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2624/4/testReport/ |
| Max. process+thread count | 1559 (vs. ulimit of 5500) |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U:
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=542824=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542824 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------

    Author: ASF GitHub Bot
    Created on: 27/Jan/21 13:26
    Start Date: 27/Jan/21 13:26
    Worklog Time Spent: 10m
    Work Description: steveloughran commented on pull request #2624:
    URL: https://github.com/apache/hadoop/pull/2624#issuecomment-768283949

    Yeah, that's good. We don't build the string unless it's being logged (and it's only done at the start, not re-evaluated later), so keeping it efficient is nice.

    LGTM. +1 pending Yetus being happy (and ignoring its complaints about tests).

Issue Time Tracking
-------------------

    Worklog Id: (was: 542824)
    Time Spent: 2h (was: 1h 50m)
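The efficiency point in the comment above (the message string is built only if it will be logged, and only once) can be illustrated with a small sketch. Everything here is hypothetical and written for this note: `OnceFormattedMessage` and `CountingArg` are not Hadoop classes, and the "Starting:" and "duration" wording is an assumption rather than the real log format.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: formats its message at most once, and only if logging is enabled. */
final class OnceFormattedMessage {
  private final String text; // null when logging is disabled

  OnceFormattedMessage(boolean enabled, String format, Object... args) {
    // String.format (and so each argument's toString) runs only here, if at all.
    this.text = enabled ? String.format(format, args) : null;
  }

  /** Line for the start of the operation, or null if logging is disabled. */
  String startLine() {
    return text == null ? null : "Starting: " + text;
  }

  /** Line for the end of the operation; reuses the cached text. */
  String endLine(long elapsedMs) {
    return text == null ? null : text + ": duration " + elapsedMs + " ms";
  }
}

/** Argument whose toString counts its own invocations, to make the formatting cost visible. */
final class CountingArg {
  static final AtomicInteger CALLS = new AtomicInteger();
  private final String value;

  CountingArg(String value) {
    this.value = value;
  }

  @Override
  public String toString() {
    CALLS.incrementAndGet();
    return value;
  }
}
```

With logging disabled, constructing the message never calls `toString` on its arguments; with logging enabled, `toString` is called once in the constructor and both the start and end lines reuse the cached text.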
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=542791=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542791 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------

    Author: ASF GitHub Bot
    Created on: 27/Jan/21 12:19
    Start Date: 27/Jan/21 12:19
    Worklog Time Spent: 10m
    Work Description: HeartSaVioR commented on pull request #2624:
    URL: https://github.com/apache/hadoop/pull/2624#issuecomment-768248565

    Ah OK. Good to know. I didn't realize it prints two times for entrance and exit. Will fix. Probably I'll have to go back to use `from` instead of `from.getPath()` then.

Issue Time Tracking
-------------------

    Worklog Id: (was: 542791)
    Time Spent: 1h 50m (was: 1h 40m)
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=542751=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542751 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------

    Author: ASF GitHub Bot
    Created on: 27/Jan/21 10:23
    Start Date: 27/Jan/21 10:23
    Worklog Time Spent: 10m
    Work Description: steveloughran commented on a change in pull request #2624:
    URL: https://github.com/apache/hadoop/pull/2624#discussion_r565187998

##########
File path: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
##########
@@ -455,53 +455,50 @@ protected void commitJobInternal(JobContext context) throws IOException {
    */
   private void mergePaths(FileSystem fs, final FileStatus from, final Path to, JobContext context) throws IOException {
-    long timeStartNs = -1L;
-    if (LOG.isDebugEnabled()) {
-      timeStartNs = System.nanoTime();
-      LOG.debug("Merging data from " + from + " to " + to);
-    }
-    reportProgress(context);
-    FileStatus toStat;
-    try {
-      toStat = fs.getFileStatus(to);
-    } catch (FileNotFoundException fnfe) {
-      toStat = null;
-    }
-
-    if (from.isFile()) {
-      if (toStat != null) {
-        if (!fs.delete(to, true)) {
-          throw new IOException("Failed to delete " + to);
-        }
+    try (DurationInfo d = new DurationInfo(LOG,
+        false,
+        "Merged data from %s to %s", from.getPath(), to)) {

Review comment:
    we actually print this *at start* as well as end; end has timings. So you don't need lines 461 & 462

##########
File path: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
##########
@@ -455,53 +455,50 @@ protected void commitJobInternal(JobContext context) throws IOException {
    */
   private void mergePaths(FileSystem fs, final FileStatus from, final Path to, JobContext context) throws IOException {
-    long timeStartNs = -1L;
-    if (LOG.isDebugEnabled()) {
-      timeStartNs = System.nanoTime();
-      LOG.debug("Merging data from " + from + " to " + to);
-    }
-    reportProgress(context);
-    FileStatus toStat;
-    try {
-      toStat = fs.getFileStatus(to);
-    } catch (FileNotFoundException fnfe) {
-      toStat = null;
-    }
-
-    if (from.isFile()) {
-      if (toStat != null) {
-        if (!fs.delete(to, true)) {
-          throw new IOException("Failed to delete " + to);
-        }
+    try (DurationInfo d = new DurationInfo(LOG,
+        false,
+        "Merged data from %s to %s", from.getPath(), to)) {
+      if (LOG.isDebugEnabled()) {

Review comment:
    cut these; duplicate now

Issue Time Tracking
-------------------

    Worklog Id: (was: 542751)
    Time Spent: 1h 40m (was: 1.5h)
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=542670=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542670 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------

    Author: ASF GitHub Bot
    Created on: 27/Jan/21 06:17
    Start Date: 27/Jan/21 06:17
    Worklog Time Spent: 10m
    Work Description: hadoop-yetus commented on pull request #2624:
    URL: https://github.com/apache/hadoop/pull/2624#issuecomment-768060327

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 56m 43s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 34m 46s | | trunk passed |
| +1 :green_heart: | compile | 0m 46s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | compile | 0m 37s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| +1 :green_heart: | checkstyle | 0m 38s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 45s | | trunk passed |
| +1 :green_heart: | shadedclient | 15m 1s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 25s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 0m 23s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| +0 :ok: | spotbugs | 1m 20s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 1m 18s | | trunk passed |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 31s | | the patch passed |
| +1 :green_heart: | compile | 0m 32s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javac | 0m 32s | | the patch passed |
| +1 :green_heart: | compile | 0m 28s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| +1 :green_heart: | javac | 0m 28s | | the patch passed |
| +1 :green_heart: | checkstyle | 0m 26s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 31s | | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 :green_heart: | shadedclient | 13m 45s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 19s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 0m 19s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| +1 :green_heart: | findbugs | 1m 20s | | the patch passed |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 7m 2s | | hadoop-mapreduce-client-core in the patch passed. |
| +1 :green_heart: | asflicense | 0m 37s | | The patch does not generate ASF License warnings. |
| | | 139m 4s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2624/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2624 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 70605f252ffa 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 80c7404b519 |
| Default Java | Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2624/3/testReport/ |
| Max. process+thread count | 1676 (vs. ulimit of 5500) |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U:
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=542620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542620 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------

    Author: ASF GitHub Bot
    Created on: 27/Jan/21 04:00
    Start Date: 27/Jan/21 04:00
    Worklog Time Spent: 10m
    Work Description: HeartSaVioR commented on pull request #2624:
    URL: https://github.com/apache/hadoop/pull/2624#issuecomment-768006743

    Thanks for the suggestion. I've addressed the review comment. Most of the added/deleted lines are simply indentation changes.

Issue Time Tracking
-------------------

    Worklog Id: (was: 542620)
    Time Spent: 1h 20m (was: 1h 10m)
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=542494=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-542494 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------

    Author: ASF GitHub Bot
    Created on: 26/Jan/21 22:58
    Start Date: 26/Jan/21 22:58
    Worklog Time Spent: 10m
    Work Description: steveloughran commented on pull request #2624:
    URL: https://github.com/apache/hadoop/pull/2624#issuecomment-767882952

Use DurationInfo in a try-with-resources; it will log start and end with timings @ debug

```java
try (DurationInfo d = new DurationInfo(LOG,
    "Aborting commit ID %s to path %s", uploadId, destKey)) {
  writeOperations.abortMultipartCommit(destKey, uploadId);
}
```

Issue Time Tracking
-------------------

    Worklog Id: (was: 542494)
    Time Spent: 1h 10m (was: 1h)
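To show why a try-with-resources timer removes the need for explicit start and end log calls, here is a minimal self-contained sketch of the pattern. It deliberately does not use Hadoop's DurationInfo: `TimedScope`, its `Consumer<String>` sink (standing in for a logger), the `enabled` flag, and the "Starting:" and "duration" wording are all assumptions made for illustration.

```java
import java.util.function.Consumer;

/** Hypothetical stand-in for a DurationInfo-style helper; this is NOT Hadoop's class. */
final class TimedScope implements AutoCloseable {
  private final Consumer<String> sink; // where log lines go; a real logger in practice
  private final boolean enabled;       // stands in for "is this log level enabled?"
  private final String text;
  private final long startNs;

  TimedScope(Consumer<String> sink, boolean enabled, String format, Object... args) {
    this.sink = sink;
    this.enabled = enabled;
    // Format the message once, and only if it will actually be logged.
    this.text = enabled ? String.format(format, args) : "";
    this.startNs = System.nanoTime();
    if (enabled) {
      sink.accept("Starting: " + text);
    }
  }

  /** Called automatically at the end of the try block, even on exceptions. */
  @Override
  public void close() {
    if (enabled) {
      long elapsedMs = (System.nanoTime() - startNs) / 1_000_000L;
      sink.accept(text + ": duration " + elapsedMs + " ms");
    }
  }
}

final class TimedScopeDemo {
  public static void main(String[] args) {
    try (TimedScope t = new TimedScope(System.out::println, true,
        "Merged data from %s to %s", "/tmp/src", "/tmp/dst")) {
      // The timed work goes here.
    }
  }
}
```

Since `close()` runs whether the body returns normally or throws, the end-of-operation line with its timing is emitted on every path, which is hard to guarantee with hand-written `System.nanoTime()` bookkeeping.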
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=537360=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-537360 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------

    Author: ASF GitHub Bot
    Created on: 18/Jan/21 12:28
    Start Date: 18/Jan/21 12:28
    Worklog Time Spent: 10m
    Work Description: hadoop-yetus commented on pull request #2624:
    URL: https://github.com/apache/hadoop/pull/2624#issuecomment-762219366

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 16s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 44s | | trunk passed |
| +1 :green_heart: | compile | 0m 38s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | compile | 0m 36s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | checkstyle | 0m 31s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 43s | | trunk passed |
| +1 :green_heart: | shadedclient | 17m 12s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 25s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javadoc | 0m 23s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +0 :ok: | spotbugs | 1m 21s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 1m 19s | | trunk passed |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 34s | | the patch passed |
| +1 :green_heart: | compile | 0m 33s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javac | 0m 33s | | the patch passed |
| +1 :green_heart: | compile | 0m 27s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | javac | 0m 27s | | the patch passed |
| +1 :green_heart: | checkstyle | 0m 20s | | the patch passed |
| +1 :green_heart: | mvnsite | 0m 34s | | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 :green_heart: | shadedclient | 14m 38s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 22s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javadoc | 0m 20s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | findbugs | 1m 22s | | the patch passed |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 7m 16s | | hadoop-mapreduce-client-core in the patch passed. |
| +1 :green_heart: | asflicense | 0m 34s | | The patch does not generate ASF License warnings. |
| | | 85m 34s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2624/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2624 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 9462477aca03 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 97f843de3a9 |
| Default Java | Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2624/2/testReport/ |
| Max. process+thread count | 1108 (vs. ulimit of 5500) |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U:
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=537337=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-537337 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------
Author: ASF GitHub Bot
Created on: 18/Jan/21 11:09
Start Date: 18/Jan/21 11:09
Worklog Time Spent: 10m
Work Description: HeartSaVioR commented on pull request #2624:
URL: https://github.com/apache/hadoop/pull/2624#issuecomment-762177025

-1 from test4tests is expected. Just fixed -1 from checkstyle.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

Issue Time Tracking
-------------------
Worklog Id: (was: 537337)
Time Spent: 50m (was: 40m)

> Add latency information in FileOutputCommitter.mergePaths
> ---------------------------------------------------------
>
> Key: MAPREDUCE-7317
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7317
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Reporter: Jungtaek Lim
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> We have been observed some occurrences of huge delay from file output
> committer V1, where file output committer V2 is not an option.
> While the root cause should have investigated on our side, there's another
> issue that there's insufficient information to debug. Most likely the huge
> delay comes from mergePaths, but the class only provides the "debug" log
> message to log the call itself with parameters, nothing else. mergePaths has
> been called recursively which is harder to trace how much latency specific
> directory takes to merge.
> It would be nice and not intrusive to add latency info in mergePath, so that
> we can see how much latency specific directory takes to merge, only when
> debug log is enabled.
> (Ideally it'd be nice if we can log warn message when the call takes huge
> time to process, but I don't have the proper threshold for the "huge time",
> so I'd avoid dealing with it altogether here.)

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=537296=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-537296 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------
Author: ASF GitHub Bot
Created on: 18/Jan/21 09:22
Start Date: 18/Jan/21 09:22
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on pull request #2624:
URL: https://github.com/apache/hadoop/pull/2624#issuecomment-762110273

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 32s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 33m 58s | | trunk passed |
| +1 :green_heart: | compile | 0m 40s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | compile | 0m 36s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | checkstyle | 0m 30s | | trunk passed |
| +1 :green_heart: | mvnsite | 0m 43s | | trunk passed |
| +1 :green_heart: | shadedclient | 16m 36s | | branch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 27s | | trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javadoc | 0m 24s | | trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +0 :ok: | spotbugs | 1m 16s | | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 :green_heart: | findbugs | 1m 14s | | trunk passed |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 0m 33s | | the patch passed |
| +1 :green_heart: | compile | 0m 31s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javac | 0m 31s | | the patch passed |
| +1 :green_heart: | compile | 0m 28s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | javac | 0m 28s | | the patch passed |
| -0 :warning: | checkstyle | 0m 21s | [/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2624/1/artifact/out/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt) | hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: The patch generated 2 new + 24 unchanged - 0 fixed = 26 total (was 24) |
| +1 :green_heart: | mvnsite | 0m 32s | | the patch passed |
| +1 :green_heart: | whitespace | 0m 0s | | The patch has no whitespace issues. |
| +1 :green_heart: | shadedclient | 15m 0s | | patch has no errors when building and testing our client artifacts. |
| +1 :green_heart: | javadoc | 0m 21s | | the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04 |
| +1 :green_heart: | javadoc | 0m 21s | | the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| +1 :green_heart: | findbugs | 1m 16s | | the patch passed |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 6m 55s | | hadoop-mapreduce-client-core in the patch passed. |
| +1 :green_heart: | asflicense | 0m 34s | | The patch does not generate ASF License warnings. |
| | | 84m 38s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2624/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/2624 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle |
| uname | Linux 1618abb73623 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 97f843de3a9 |
| Default Java | Private Build-1.8.0_275-8u275-b01-0ubuntu1~18.04-b01 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.18.04
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=537262=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-537262 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------
Author: ASF GitHub Bot
Created on: 18/Jan/21 07:59
Start Date: 18/Jan/21 07:59
Worklog Time Spent: 10m
Work Description: HeartSaVioR commented on pull request #2624:
URL: https://github.com/apache/hadoop/pull/2624#issuecomment-762060417

DISCLOSURE: We encountered such issues with Spark, and we had to dump a stack trace to confirm the issue. I'd prefer logging this at INFO, but I agree that would be a bit verbose and would affect existing users, so I'm just trying to add more information when the DEBUG log level is enabled.

Issue Time Tracking
-------------------
Worklog Id: (was: 537262)
Time Spent: 20m (was: 10m)

> Add latency information in FileOutputCommitter.mergePaths
> ---------------------------------------------------------
>
> Key: MAPREDUCE-7317
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7317
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Reporter: Jungtaek Lim
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We have been observed some occurrences of huge delay from file output
> committer V1, where file output committer V2 is not an option.
> While the root cause should have investigated on our side, there's another
> issue that there's insufficient information to debug. Most likely the huge
> delay comes from mergePaths, but the class only provides the "debug" log
> message to log the call itself with parameters, nothing else. mergePaths has
> been called recursively which is harder to trace how much latency specific
> directory takes to merge.
> It would be nice and not intrusive to add latency info in mergePath, so that
> we can see how much latency specific directory takes to merge, only when
> debug log is enabled.
> (Ideally it'd be nice if we can log warn message when the call takes huge
> time to process, but I don't have the proper threshold for the "huge time",
> so I'd avoid dealing with it altogether here.)
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=537263=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-537263 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------
Author: ASF GitHub Bot
Created on: 18/Jan/21 07:59
Start Date: 18/Jan/21 07:59
Worklog Time Spent: 10m
Work Description: HeartSaVioR edited a comment on pull request #2624:
URL: https://github.com/apache/hadoop/pull/2624#issuecomment-762060417

DISCLOSURE: We encountered such issues with Spark, and every time we had to dump a stack trace to confirm the issue. I'd prefer logging this at INFO, but I agree that would be a bit verbose and would affect existing users, so I'm just trying to add more information when the DEBUG log level is enabled.

Issue Time Tracking
-------------------
Worklog Id: (was: 537263)
Time Spent: 0.5h (was: 20m)

> Add latency information in FileOutputCommitter.mergePaths
> ---------------------------------------------------------
>
> Key: MAPREDUCE-7317
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7317
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Reporter: Jungtaek Lim
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> We have been observed some occurrences of huge delay from file output
> committer V1, where file output committer V2 is not an option.
> While the root cause should have investigated on our side, there's another
> issue that there's insufficient information to debug. Most likely the huge
> delay comes from mergePaths, but the class only provides the "debug" log
> message to log the call itself with parameters, nothing else. mergePaths has
> been called recursively which is harder to trace how much latency specific
> directory takes to merge.
> It would be nice and not intrusive to add latency info in mergePath, so that
> we can see how much latency specific directory takes to merge, only when
> debug log is enabled.
> (Ideally it'd be nice if we can log warn message when the call takes huge
> time to process, but I don't have the proper threshold for the "huge time",
> so I'd avoid dealing with it altogether here.)
[jira] [Work logged] (MAPREDUCE-7317) Add latency information in FileOutputCommitter.mergePaths
[ https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=537261=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-537261 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------
Author: ASF GitHub Bot
Created on: 18/Jan/21 07:56
Start Date: 18/Jan/21 07:56
Worklog Time Spent: 10m
Work Description: HeartSaVioR opened a new pull request #2624:
URL: https://github.com/apache/hadoop/pull/2624

This PR proposes to add latency information to FileOutputCommitter.mergePaths, so that we can trace how long each specific directory takes to merge. This information is useful when investigating why a commit in FileOutputCommitter takes much longer than expected. Until now the class only logged the call itself, with its from/to parameters, at debug level. That is insufficient for tracing the latency of a specific directory, because the method is called recursively.

No test is added, as there is nothing to test. Manual testing was done by adding the following to log4j.properties

```
log4j.logger.org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter=DEBUG
```

and running the tests in TestFileOutputCommitter.

```
2021-01-18 16:14:03,475 DEBUG [main] output.FileOutputCommitter (FileOutputCommitter.java:mergePaths(461)) - Merging data from DeprecatedRawLocalFileStatus{path=file:/Users/jlim/WorkArea/JavaProjects/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/test-dir/org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter/_temporary/0/task_200707121733_0001_m_00; isDirectory=true; modification_time=1610954043000; access_time=1610954043000; owner=; group=; permission=rwxrwxrwx; isSymlink=false; hasAcl=false; isEncrypted=false; isErasureCoded=false} to file:/Users/jlim/WorkArea/JavaProjects/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/test-dir/org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter
...
2021-01-18 16:14:03,476 DEBUG [main] output.FileOutputCommitter (FileOutputCommitter.java:mergePaths(502)) - Merged data from file:/.../hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/test-dir/org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter/_temporary/0/task_200707121733_0001_m_00 to file:/.../hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/test-dir/org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter in 1 ms
```

Issue Time Tracking
-------------------
Worklog Id: (was: 537261)
Remaining Estimate: 0h
Time Spent: 10m

> Add latency information in FileOutputCommitter.mergePaths
> ---------------------------------------------------------
>
> Key: MAPREDUCE-7317
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7317
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Reporter: Jungtaek Lim
> Priority: Minor
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We have been observed some occurrences of huge delay from file output
> committer V1, where file output committer V2 is not an option.
> While the root cause should have investigated on our side, there's another
> issue that there's insufficient information to debug. Most likely the huge
> delay comes from mergePaths, but the class only provides the "debug" log
> message to log the call itself with parameters, nothing else. mergePaths has
> been called recursively which is harder to trace how much latency specific
> directory takes to merge.
> It would be nice and not intrusive to add latency info in mergePath, so that
> we can see how much latency specific directory takes to merge, only when
> debug log is enabled.
> (Ideally it'd be nice if we can log warn message when the call takes huge
> time to process, but I don't have the proper threshold for the "huge time",
> so I'd avoid dealing with it altogether here.)
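
For readers who want to see the general idea, here is a minimal sketch of per-directory latency logging in a recursive merge. It is not the code from the pull request and does not use Hadoop's FileSystem API: the class name `MergeLatencySketch`, the use of `java.nio.file` on a local file system, and `java.util.logging` at FINE level as a stand-in for a DEBUG logger are all assumptions made for illustration. The timing work is skipped entirely unless the debug-equivalent level is enabled, which matches the "only when debug log is enabled" intent described in the issue.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical illustration class; not part of Hadoop.
class MergeLatencySketch {
  private static final Logger LOG =
      Logger.getLogger(MergeLatencySketch.class.getName());

  /**
   * Recursively moves everything under {@code from} into {@code to} and
   * returns the number of files moved. Each call, including every recursive
   * call for a subdirectory, logs its own elapsed time at FINE level.
   * Conflict handling is deliberately simplified compared with a real committer.
   */
  static int mergePaths(Path from, Path to) throws IOException {
    // Only pay for timing and message building when FINE is enabled.
    boolean debug = LOG.isLoggable(Level.FINE);
    long startNanos = debug ? System.nanoTime() : 0L;
    if (debug) {
      LOG.fine("Merging data from " + from + " to " + to);
    }

    int moved = 0;
    if (Files.isDirectory(from)) {
      Files.createDirectories(to);
      try (DirectoryStream<Path> children = Files.newDirectoryStream(from)) {
        for (Path child : children) {
          // Recursive call: its latency is logged separately, so a slow
          // subdirectory can be identified from the log.
          moved += mergePaths(child, to.resolve(child.getFileName().toString()));
        }
      }
      // The source directory is empty once all children have been merged.
      Files.delete(from);
    } else {
      Files.move(from, to, StandardCopyOption.REPLACE_EXISTING);
      moved = 1;
    }

    if (debug) {
      long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
      LOG.fine("Merged data from " + from + " to " + to + " in " + elapsedMs + " ms");
    }
    return moved;
  }
}
```

To see the messages with `java.util.logging`, both the logger and its handler need to be set to `Level.FINE`. Because the elapsed time of a parent directory includes the time of its children, the innermost slow entry in the log is usually the one to look at first.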