[jira] [Commented] (MAPREDUCE-5907) Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17469445#comment-17469445 ]

Hadoop QA commented on MAPREDUCE-5907:
--------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 9s{color} | {color:red}{color} | {color:red} MAPREDUCE-5907 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | MAPREDUCE-5907 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12648040/MAPREDUCE-5907-3.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-MAPREDUCE-Build/87/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |

This message was automatically generated.

> Improve getSplits() performance for fs implementations that can utilize
> performance gains from recursive listing
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5907
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5907
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 2.4.0
>            Reporter: Sumit Kumar
>            Assignee: Sumit Kumar
>            Priority: Major
>         Attachments: MAPREDUCE-5907-2.patch, MAPREDUCE-5907-3.patch,
> MAPREDUCE-5907.patch
>
>
> FileInputFormat (both the mapreduce and mapred implementations) uses
> recursive listing while calculating splits. However, it does this level by
> level: to discover files in /foo/bar, it first lists /foo/bar to get the
> immediate children, then makes the same call on each of those children to
> discover their immediate children, and so on.
> This doesn't scale well for object-store-based fs implementations like s3
> and swift, because every listStatus call ends up being a web service call to
> the backend. When a large number of files are considered for input, this
> makes the getSplits() call slow.
> This patch adds a new set of recursive list APIs that give fs
> implementations the opportunity to optimize. The behavior remains the same
> for other implementations (a default implementation is provided, so they
> don't have to implement anything new). For object-store-based fs
> implementations, however, it takes only a simple change, passing the
> recursive flag as true (as shown in the patch), to improve listing
> performance.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-5907) Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17469434#comment-17469434 ]

Steve Loughran commented on MAPREDUCE-5907:
-------------------------------------------

abfs now does incremental listing, but not deep ones
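To make the cost difference in the issue description concrete, here is a minimal, self-contained Java sketch. It is not the patch and uses no Hadoop classes: `ListingSketch`, its in-memory key set, and the `listStatus`, `listRecursive`, and `walkLevelByLevel` methods are all hypothetical stand-ins. Each list call increments a counter in place of one web service request, and paging of large listings is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy in-memory object store; every list call stands in for one web service request. */
public class ListingSketch {
    // Hypothetical flat, sorted key space, as in s3-style stores (no real directories).
    private final TreeSet<String> keys = new TreeSet<>();

    /** Number of simulated requests made so far. */
    public int calls = 0;

    public ListingSketch(List<String> objectKeys) {
        keys.addAll(objectKeys);
    }

    /** One request: immediate children of dir. Subdirectories end with '/'. */
    public List<String> listStatus(String dir) {
        calls++;
        TreeSet<String> children = new TreeSet<>();
        for (String k : keys.tailSet(dir)) {
            if (!k.startsWith(dir)) {
                break; // keys are sorted, so the prefix range has ended
            }
            int slash = k.indexOf('/', dir.length());
            children.add(slash < 0 ? k : k.substring(0, slash + 1));
        }
        return new ArrayList<>(children);
    }

    /** One request: every key under the prefix, however deep. */
    public List<String> listRecursive(String prefix) {
        calls++;
        List<String> out = new ArrayList<>();
        for (String k : keys.tailSet(prefix)) {
            if (!k.startsWith(prefix)) {
                break;
            }
            out.add(k);
        }
        return out;
    }

    /** Level-by-level discovery: one listStatus request per directory visited. */
    public List<String> walkLevelByLevel(String dir) {
        List<String> files = new ArrayList<>();
        for (String child : listStatus(dir)) {
            if (child.endsWith("/")) {
                files.addAll(walkLevelByLevel(child));
            } else {
                files.add(child);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "foo/bar/a/1.txt", "foo/bar/a/2.txt", "foo/bar/b/c/3.txt", "foo/bar/4.txt");

        ListingSketch levelStore = new ListingSketch(sample);
        List<String> walked = levelStore.walkLevelByLevel("foo/bar/");
        System.out.println("level-by-level: " + walked.size() + " files, "
            + levelStore.calls + " calls");

        ListingSketch flatStore = new ListingSketch(sample);
        List<String> flat = flatStore.listRecursive("foo/bar/");
        System.out.println("recursive: " + flat.size() + " files, "
            + flatStore.calls + " calls");
    }
}
```

For the four sample keys, the level-by-level walk issues one call per directory visited (four here), while the prefix listing issues a single call; both return the same files. The gap grows with the number of directories, which is the scaling problem the issue describes for object stores.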
[jira] [Work logged] (MAPREDUCE-7371) DistributedCache alternative APIs should not use DistributedCache APIs internally
[ https://issues.apache.org/jira/browse/MAPREDUCE-7371?focusedWorklogId=703896=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703896 ]

ASF GitHub Bot logged work on MAPREDUCE-7371:
---------------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Jan/22 12:09
            Start Date: 05/Jan/22 12:09
    Worklog Time Spent: 10m
      Work Description: hadoop-yetus commented on pull request #3855:
URL: https://github.com/apache/hadoop/pull/3855#issuecomment-1005630630

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 2s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 5 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 11m 44s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 29m 1s | | trunk passed |
| +1 :green_heart: | compile | 24m 57s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | compile | 20m 48s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 3m 54s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 58s | | trunk passed |
| +1 :green_heart: | javadoc | 2m 23s | | trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 2m 11s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 4m 22s | | trunk passed |
| +1 :green_heart: | shadedclient | 23m 38s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 25s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 54s | | the patch passed |
| +1 :green_heart: | compile | 23m 46s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javac | 23m 46s | | root-jdkUbuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 generated 0 new + 1863 unchanged - 67 fixed = 1863 total (was 1930) |
| +1 :green_heart: | compile | 26m 8s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 26m 8s | | root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 generated 0 new + 1740 unchanged - 67 fixed = 1740 total (was 1807) |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 4m 3s | | root: The patch generated 0 new + 362 unchanged - 13 fixed = 362 total (was 375) |
| +1 :green_heart: | mvnsite | 2m 51s | | the patch passed |
| +1 :green_heart: | javadoc | 2m 16s | | the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 |
| +1 :green_heart: | javadoc | 2m 10s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 6m 19s | | the patch passed |
| +1 :green_heart: | shadedclient | 28m 35s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 6m 41s | | hadoop-mapreduce-client-core in the patch passed. |
| +1 :green_heart: | unit | 1m 22s | | hadoop-mapreduce-client-common in the patch passed. |
| +1 :green_heart: | unit | 141m 33s | | hadoop-mapreduce-client-jobclient in the patch passed. |
| +1 :green_heart: | unit | 7m 55s | | hadoop-streaming in the patch passed. |
| +1 :green_heart: | asflicense | 1m 5s | | The patch does not generate ASF License warnings. |
| | | 387m 42s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3855/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3855 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 8871ebdcb38b 4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 11:55:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
[jira] [Work logged] (MAPREDUCE-7371) DistributedCache alternative APIs should not use DistributedCache APIs internally
[ https://issues.apache.org/jira/browse/MAPREDUCE-7371?focusedWorklogId=703895=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703895 ]

ASF GitHub Bot logged work on MAPREDUCE-7371:
---------------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Jan/22 12:07
            Start Date: 05/Jan/22 12:07
    Worklog Time Spent: 10m
      Work Description: hadoop-yetus commented on pull request #3855:
URL: https://github.com/apache/hadoop/pull/3855#issuecomment-1005629624

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 5s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 5 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 20m 46s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 28m 56s | | trunk passed |
| +1 :green_heart: | compile | 31m 23s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | compile | 26m 37s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | checkstyle | 4m 33s | | trunk passed |
| +1 :green_heart: | mvnsite | 3m 25s | | trunk passed |
| +1 :green_heart: | javadoc | 2m 40s | | trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 2m 24s | | trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 5m 36s | | trunk passed |
| +1 :green_heart: | shadedclient | 26m 29s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 25s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 1m 58s | | the patch passed |
| +1 :green_heart: | compile | 23m 51s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javac | 23m 51s | | root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 0 new + 1863 unchanged - 67 fixed = 1863 total (was 1930) |
| +1 :green_heart: | compile | 20m 40s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | javac | 20m 40s | | root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 0 new + 1740 unchanged - 67 fixed = 1740 total (was 1807) |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 3m 50s | | root: The patch generated 0 new + 363 unchanged - 13 fixed = 363 total (was 376) |
| +1 :green_heart: | mvnsite | 2m 51s | | the patch passed |
| +1 :green_heart: | javadoc | 2m 17s | | the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 |
| +1 :green_heart: | javadoc | 2m 12s | | the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
| +1 :green_heart: | spotbugs | 5m 4s | | the patch passed |
| +1 :green_heart: | shadedclient | 23m 33s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 :green_heart: | unit | 5m 45s | | hadoop-mapreduce-client-core in the patch passed. |
| +1 :green_heart: | unit | 1m 10s | | hadoop-mapreduce-client-common in the patch passed. |
| +1 :green_heart: | unit | 127m 24s | | hadoop-mapreduce-client-jobclient in the patch passed. |
| +1 :green_heart: | unit | 6m 49s | | hadoop-streaming in the patch passed. |
| +1 :green_heart: | asflicense | 0m 51s | | The patch does not generate ASF License warnings. |
| | | 386m 29s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3855/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/3855 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux ae16b3032651 4.15.0-162-generic #170-Ubuntu SMP Mon Oct 18 11:38:05 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |