[jira] [Commented] (MAPREDUCE-5907) Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing

2022-01-05 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469445#comment-17469445
 ] 

Hadoop QA commented on MAPREDUCE-5907:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  9s{color} | {color:red}{color} | {color:red} MAPREDUCE-5907 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | MAPREDUCE-5907 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12648040/MAPREDUCE-5907-3.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-MAPREDUCE-Build/87/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Improve getSplits() performance for fs implementations that can utilize 
> performance gains from recursive listing
> 
>
> Key: MAPREDUCE-5907
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5907
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Affects Versions: 2.4.0
> Reporter: Sumit Kumar
> Assignee: Sumit Kumar
> Priority: Major
> Attachments: MAPREDUCE-5907-2.patch, MAPREDUCE-5907-3.patch, MAPREDUCE-5907.patch
>
>
> FileInputFormat (both the mapreduce and mapred implementations) lists 
> directories recursively while calculating splits, but it does so level by 
> level. To discover the files under /foo/bar, it first lists /foo/bar to get 
> the immediate children, then makes the same call on each of those children 
> to discover their immediate children, and so on. This doesn't scale well for 
> object-store-based fs implementations like s3 and swift, because every 
> listStatus call ends up being a web service call to the backend. When a 
> large number of files is considered for input, this makes the getSplits() 
> call slow. 
> This patch adds a new set of recursive list APIs that give fs 
> implementations the opportunity to optimize. The behavior remains the same 
> for other implementations: a default implementation is provided, so they 
> don't have to implement anything new. Object-store-based fs implementations 
> only need a simple change, setting the recursive flag to true (as shown in 
> the patch), to improve listing performance.
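
To make the cost difference in the description above concrete, here is a minimal sketch in plain Java. It is a toy model, not Hadoop code and not the API from the patch: the class ListingSketch, its methods, and the sample keys are all hypothetical, and the "store" is just an in-memory sorted set of keys where each list request bumps a counter that stands in for one web service round trip.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical toy object store: flat sorted keys, every list request counted. */
public class ListingSketch {
    private final TreeSet<String> keys = new TreeSet<>();
    int listCalls = 0; // each increment models one round trip to the backend

    void put(String key) {
        keys.add(key);
    }

    private static String asPrefix(String dir) {
        return dir.endsWith("/") ? dir : dir + "/";
    }

    /** One request: immediate children of dir; sub-directories end with '/'. */
    List<String> listStatus(String dir) {
        listCalls++;
        String prefix = asPrefix(dir);
        Set<String> children = new TreeSet<>();
        for (String key : keys.tailSet(prefix)) {
            if (!key.startsWith(prefix)) {
                break; // sorted order: no later key can match the prefix
            }
            String rest = key.substring(prefix.length());
            int slash = rest.indexOf('/');
            children.add(slash < 0 ? key : prefix + rest.substring(0, slash + 1));
        }
        return new ArrayList<>(children);
    }

    /** One request: every key under dir, however deep. */
    List<String> listRecursive(String dir) {
        listCalls++;
        String prefix = asPrefix(dir);
        List<String> files = new ArrayList<>();
        for (String key : keys.tailSet(prefix)) {
            if (!key.startsWith(prefix)) {
                break;
            }
            files.add(key);
        }
        return files;
    }

    /** Level-by-level discovery: one listStatus request per directory visited. */
    List<String> walkLevelByLevel(String root) {
        List<String> files = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(root);
        while (!pending.isEmpty()) {
            for (String child : listStatus(pending.poll())) {
                if (child.endsWith("/")) {
                    pending.add(child); // directory: list it in a later request
                } else {
                    files.add(child);
                }
            }
        }
        return files;
    }

    public static void main(String[] args) {
        ListingSketch store = new ListingSketch();
        store.put("/foo/bar/4.txt");
        store.put("/foo/bar/a/1.txt");
        store.put("/foo/bar/a/2.txt");
        store.put("/foo/bar/b/c/3.txt");

        int found = store.walkLevelByLevel("/foo/bar").size();
        System.out.println("level by level: " + found + " files, " + store.listCalls + " requests");

        store.listCalls = 0;
        found = store.listRecursive("/foo/bar").size();
        System.out.println("recursive: " + found + " files, " + store.listCalls + " requests");
    }
}
```

With these four sample keys the level-by-level walk issues four list requests (one per directory visited) while the recursive listing issues one. In this model the gap grows with the number of directories, not the number of files.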



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-5907) Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing

2022-01-05 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17469434#comment-17469434
 ] 

Steve Loughran commented on MAPREDUCE-5907:
---

abfs now does incremental listing, but not deep ones




[jira] [Work logged] (MAPREDUCE-7371) DistributedCache alternative APIs should not use DistributedCache APIs internally

2022-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7371?focusedWorklogId=703896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703896
 ]

ASF GitHub Bot logged work on MAPREDUCE-7371:
-

Author: ASF GitHub Bot
Created on: 05/Jan/22 12:09
Start Date: 05/Jan/22 12:09
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3855:
URL: https://github.com/apache/hadoop/pull/3855#issuecomment-1005630630


   :confetti_ball: **+1 overall**

   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  2s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 5 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  11m 44s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  29m  1s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  24m 57s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  compile  |  20m 48s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   3m 54s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m 58s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 23s |  |  trunk passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 11s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   4m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 38s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 25s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 54s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  23m 46s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javac  |  23m 46s |  |  root-jdkUbuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04 generated 0 new + 1863 unchanged - 67 fixed = 1863 total (was 1930)  |
   | +1 :green_heart: |  compile  |  26m  8s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |  26m  8s |  |  root-jdkPrivateBuild-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 generated 0 new + 1740 unchanged - 67 fixed = 1740 total (was 1807)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   4m  3s |  |  root: The patch generated 0 new + 362 unchanged - 13 fixed = 362 total (was 375)  |
   | +1 :green_heart: |  mvnsite  |   2m 51s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 16s |  |  the patch passed with JDK Ubuntu-11.0.13+8-Ubuntu-0ubuntu1.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 10s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  28m 35s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   6m 41s |  |  hadoop-mapreduce-client-core in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 22s |  |  hadoop-mapreduce-client-common in the patch passed.  |
   | +1 :green_heart: |  unit  | 141m 33s |  |  hadoop-mapreduce-client-jobclient in the patch passed.  |
   | +1 :green_heart: |  unit  |   7m 55s |  |  hadoop-streaming in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  5s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 387m 42s |  |  |


   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3855/3/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3855 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 8871ebdcb38b 4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 11:55:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | 

[jira] [Work logged] (MAPREDUCE-7371) DistributedCache alternative APIs should not use DistributedCache APIs internally

2022-01-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7371?focusedWorklogId=703895&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-703895
 ]

ASF GitHub Bot logged work on MAPREDUCE-7371:
-

Author: ASF GitHub Bot
Created on: 05/Jan/22 12:07
Start Date: 05/Jan/22 12:07
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on pull request #3855:
URL: https://github.com/apache/hadoop/pull/3855#issuecomment-1005629624


   :confetti_ball: **+1 overall**

   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m  5s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 5 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  20m 46s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  28m 56s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  31m 23s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  26m 37s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   4m 33s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m 25s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   2m 40s |  |  trunk passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 24s |  |  trunk passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m 36s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 29s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 25s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   1m 58s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  23m 51s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  23m 51s |  |  root-jdkUbuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 generated 0 new + 1863 unchanged - 67 fixed = 1863 total (was 1930)  |
   | +1 :green_heart: |  compile  |  20m 40s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  20m 40s |  |  root-jdkPrivateBuild-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 generated 0 new + 1740 unchanged - 67 fixed = 1740 total (was 1807)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   3m 50s |  |  root: The patch generated 0 new + 363 unchanged - 13 fixed = 363 total (was 376)  |
   | +1 :green_heart: |  mvnsite  |   2m 51s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 17s |  |  the patch passed with JDK Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   2m 12s |  |  the patch passed with JDK Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   5m  4s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 33s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   5m 45s |  |  hadoop-mapreduce-client-core in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 10s |  |  hadoop-mapreduce-client-common in the patch passed.  |
   | +1 :green_heart: |  unit  | 127m 24s |  |  hadoop-mapreduce-client-jobclient in the patch passed.  |
   | +1 :green_heart: |  unit  |   6m 49s |  |  hadoop-streaming in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 51s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 386m 29s |  |  |


   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3855/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3855 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux ae16b3032651 4.15.0-162-generic #170-Ubuntu SMP Mon Oct 18 11:38:05 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |