[jira] [Commented] (MAPREDUCE-6995) Uploader tool for Distributed Cache Deploy documentation

2018-01-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331585#comment-16331585
 ] 

Hadoop QA commented on MAPREDUCE-6995:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  9s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 41s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 41s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 13s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 41s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  9s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 21s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 13s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 20s{color} | {color:green} hadoop-mapreduce-client-uploader in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 43s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | MAPREDUCE-6995 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12906730/MAPREDUCE-6995.005.patch |
| Optional Tests |  asflicense  mvnsite  compile  javac  javadoc  mvninstall  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux e2fe8d8e26e2 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 37f4696 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7308/testReport/ |
| Max. process+thread count | 

[jira] [Updated] (MAPREDUCE-6995) Uploader tool for Distributed Cache Deploy documentation

2018-01-18 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated MAPREDUCE-6995:
--
Attachment: MAPREDUCE-6995.005.patch

> Uploader tool for Distributed Cache Deploy documentation
> 
>
> Key: MAPREDUCE-6995
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6995
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Major
> Attachments: MAPREDUCE-6995.000.patch, MAPREDUCE-6995.001.patch, 
> MAPREDUCE-6995.002.patch, MAPREDUCE-6995.003.patch, MAPREDUCE-6995.004.patch, 
> MAPREDUCE-6995.005.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6995) Uploader tool for Distributed Cache Deploy documentation

2018-01-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331486#comment-16331486
 ] 

Hadoop QA commented on MAPREDUCE-6995:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 42s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 38s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 29s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 52s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 42s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 20s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 44s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 48s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 36s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 24s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 14s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 20s{color} | {color:green} hadoop-mapreduce-client-uploader in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 23s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | MAPREDUCE-6995 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12906722/MAPREDUCE-6995.004.patch |
| Optional Tests |  asflicense  mvnsite  compile  javac  javadoc  mvninstall  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 6f73a499fe36 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 37f4696 |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7307/testReport/ |
| Max. process+thread count | 

[jira] [Commented] (MAPREDUCE-6995) Uploader tool for Distributed Cache Deploy documentation

2018-01-18 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331418#comment-16331418
 ] 

Miklos Szegedi commented on MAPREDUCE-6995:
---

Thank you for the review, [~rkanter]. I updated the patch.

 

> Uploader tool for Distributed Cache Deploy documentation
> 
>
> Key: MAPREDUCE-6995
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6995
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Major
> Attachments: MAPREDUCE-6995.000.patch, MAPREDUCE-6995.001.patch, 
> MAPREDUCE-6995.002.patch, MAPREDUCE-6995.003.patch, MAPREDUCE-6995.004.patch
>
>







[jira] [Updated] (MAPREDUCE-6995) Uploader tool for Distributed Cache Deploy documentation

2018-01-18 Thread Miklos Szegedi (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Szegedi updated MAPREDUCE-6995:
--
Attachment: MAPREDUCE-6995.004.patch

> Uploader tool for Distributed Cache Deploy documentation
> 
>
> Key: MAPREDUCE-6995
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6995
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Major
> Attachments: MAPREDUCE-6995.000.patch, MAPREDUCE-6995.001.patch, 
> MAPREDUCE-6995.002.patch, MAPREDUCE-6995.003.patch, MAPREDUCE-6995.004.patch
>
>







[jira] [Comment Edited] (MAPREDUCE-6995) Uploader tool for Distributed Cache Deploy documentation

2018-01-18 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331177#comment-16331177
 ] 

Robert Kanter edited comment on MAPREDUCE-6995 at 1/18/18 8:41 PM:
---

Some minor comments:
 # "The tool then returns a suggestion how to set ..." should be "The tool then 
returns a suggestion *of* how to set ..."
 # "Defaults to the default filesystem set by fs.defaultFS." - fs.defaultFS 
should be wrapped in ` to make it monospaced.
 # Can you double check that paths starting with just {{hdfs:/}} work? (as 
opposed to {{hdfs://}})
 # The explanations of the arguments (e.g. {{-initialReplication}}) should have 
the ` around the argument names to make them monospaced.
 # "If this value is set to low like a constant 10 ..." is more clear and 
consistent with the next sentence if it's written as "If this is set to a low 
value like 10 ..."
 # The description for {{-acceptableReplication}} should say what it actually 
does, not just the requirements for it. Something like "The tool will wait 
until the tarball has been replicated this number of times before exiting."
 # "The frameworkuploader tool has the following arguments to control, which 
jars end up in the framework tarball:" should not have the comma.
 # "This is an input classpath that is iterated through. jars files found will 
be added to the tarball. Defaults to the classpath." should be "This is *the* 
input classpath to source jar files from to add to the tarball. *It defaults to 
the classpath as returned by the {{hadoop classpath}} command.*"
 # "This is a comma separated regex array to filter the jar file names to 
include from the class path." should be "This is a comma separated regex array 
to filter the jar file names to *exclude* from the class path."
 # It might be good to give an example of {{-nosymlink}} as it's not clear just 
from the description what this does. Something like "For example, 
{{/a/foo.jar}} and a symlink {{/a/bar.jar}} that points to {{/a/foo.jar}} would 
normally add foo.jar and bar.jar to the tarball as separate files despite them 
actually being the same file. This flag would make the tool exclude 
{{/a/bar.jar}} so only one copy of the file is added."
 # In the {{testExplicitFilesystem}} test, we should have a similar test where 
{{-target}} and {{FS_DEFAULT_NAME_KEY}} have different filesystems.
 # The {{frameworkupload}} command should be added to the MapredCommands.md file
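
The {{-nosymlink}} example in item 10 can be sketched in code. This is only an illustration written against plain java.nio, not the uploader's implementation: the class name SymlinkDedupSketch and both of its methods are hypothetical, and the sketch assumes a platform where symbolic links can be created.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not the FrameworkUploader code.
final class SymlinkDedupSketch {

    /** Drops every symlink whose target is a regular entry already in the list. */
    static List<Path> dropSymlinkAliases(List<Path> jars) {
        try {
            // First pass: remember the real location of every non-symlink entry.
            Set<Path> realFiles = new HashSet<>();
            for (Path jar : jars) {
                if (!Files.isSymbolicLink(jar)) {
                    realFiles.add(jar.toRealPath());
                }
            }
            // Second pass: keep everything except symlinks that alias a kept file.
            List<Path> kept = new ArrayList<>();
            for (Path jar : jars) {
                boolean alias = Files.isSymbolicLink(jar)
                        && realFiles.contains(jar.toRealPath());
                if (!alias) {
                    kept.add(jar);
                }
            }
            return kept;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds foo.jar plus a bar.jar symlink in a temp dir and returns the kept names. */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("nosymlink-sketch");
            Path foo = Files.createFile(dir.resolve("foo.jar"));
            Path bar = Files.createSymbolicLink(dir.resolve("bar.jar"), foo);
            List<String> names = new ArrayList<>();
            for (Path p : dropSymlinkAliases(Arrays.asList(bar, foo))) {
                names.add(p.getFileName().toString());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the filtering step both foo.jar and bar.jar would be packaged as separate entries, which is the duplication item 10 describes.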



> Uploader tool for Distributed Cache Deploy documentation
> 
>
> Key: MAPREDUCE-6995
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6995
> Project: Hadoop Map/Reduce
>  

[jira] [Commented] (MAPREDUCE-6995) Uploader tool for Distributed Cache Deploy documentation

2018-01-18 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331177#comment-16331177
 ] 

Robert Kanter commented on MAPREDUCE-6995:
--

Some minor comments:
# "The tool then returns a suggestion how to set ..." should be "The tool then 
returns a suggestion *of* how to set ..."
# "Defaults to the default filesystem set by fs.defaultFS." - fs.defaultFS 
should be wrapped in ` to make it monospaced.
# Can you double check that paths starting with just {{hdfs:/}} work?  (as 
opposed to {{hdfs://}})
# The explanations of the arguments (e.g. {{-initialReplication}}) should have 
the ` around the argument names to make them monospaced.
# "If this value is set to low like a constant 10 ..." is more clear and 
consistent with the next sentence if it's written as "If this is set to a low 
value like 10 ..."
# The description for {{-acceptableReplication}} should say what it actually 
does, not just the requirements for it.  Something like "The tool will wait 
until the tarball has been replicated this number of times before exiting."
# "The frameworkuploader tool has the following arguments to control, which 
jars end up in the framework tarball:" should not have the comma.
# "This is an input classpath that is iterated through. jars files found will 
be added to the tarball. Defaults to the classpath." should be "This is *the* 
input classpath to source jar files from to add to the tarball. *It defaults to 
the classpath as returned by the {{hadoop classpath}} command.*"
# "This is a comma separated regex array to filter the jar file names to 
include from the class path." should be "This is a comma separated regex array 
to filter the jar file names to *exclude* from the class path."
# It might be good to give an example of {{-nosymlink}} as it's not clear just 
from the description what this does.  Something like "For example, 
{{/a/foo.jar}} and a symlink {{/a/bar.jar}} that points to {{/a/foo.jar}} would 
normally add foo.jar and bar.jar to the tarball as separate files despite them 
actually being the same file. This flag would make the tool exclude 
{{/a/bar.jar}} so only one copy of the file is added."
# In the {{testExplicitFilesystem}} test, we should have a similar test where 
{{-target}} and {{FS_DEFAULT_NAME_KEY}} have different filesystems.
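
On item 3, the difference between {{hdfs:/}} and {{hdfs://}} is easy to see at the URI level. The sketch below uses only java.net.URI; the class name HdfsUriForms, the host name namenode and the example paths are made up for illustration, and it says nothing about how the uploader itself resolves such targets.

```java
import java.net.URI;

// Illustration only: shows how java.net.URI parses the two spellings.
final class HdfsUriForms {

    /** Returns the authority (host[:port]) of the URI, or null when it has none. */
    static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    /** Returns the path component of the URI. */
    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        // Single slash: no authority, everything after the scheme is the path.
        System.out.println(authorityOf("hdfs:/a/b.tar.gz"));
        System.out.println(pathOf("hdfs:/a/b.tar.gz"));
        // Double slash: the first segment is parsed as the authority.
        System.out.println(authorityOf("hdfs://namenode:8020/a/b.tar.gz"));
        System.out.println(pathOf("hdfs://namenode:8020/a/b.tar.gz"));
    }
}
```

With a single slash the authority is absent, so a filesystem has to come from somewhere else (presumably the configured default); with a double slash the first segment after the slashes is taken as the authority, so a path written as hdfs://a/b.tar.gz would treat "a" as a host name.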

> Uploader tool for Distributed Cache Deploy documentation
> 
>
> Key: MAPREDUCE-6995
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6995
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Miklos Szegedi
>Assignee: Miklos Szegedi
>Priority: Major
> Attachments: MAPREDUCE-6995.000.patch, MAPREDUCE-6995.001.patch, 
> MAPREDUCE-6995.002.patch, MAPREDUCE-6995.003.patch
>
>







[jira] [Updated] (MAPREDUCE-7029) FileOutputCommitter is slow on filesystems lacking recursive delete

2018-01-18 Thread Karthik Palaniappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Palaniappan updated MAPREDUCE-7029:
---
Target Version/s:   (was: 3.0.1, 2.8.4, 2.7.6)

> FileOutputCommitter is slow on filesystems lacking recursive delete
> ---
>
> Key: MAPREDUCE-7029
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7029
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.8.2
> Environment: - Google Cloud Storage (with the GCS connector: 
> https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs) for 
> HCFS compatibility.
> - FileOutputCommitter algorithm v2.
> - Running on Google Compute Engine with Java 8, Debian 8, Hadoop 2.8.2, Spark 
> 2.2.0.
>Reporter: Karthik Palaniappan
>Assignee: Karthik Palaniappan
>Priority: Minor
> Fix For: 3.1.0, 2.10.0
>
> Attachments: MAPREDUCE-7029-branch-2.004.patch, 
> MAPREDUCE-7029-branch-2.005.patch, MAPREDUCE-7029-branch-2.005.patch, 
> MAPREDUCE-7029.001.patch, MAPREDUCE-7029.002.patch, MAPREDUCE-7029.003.patch, 
> MAPREDUCE-7029.004.patch, MAPREDUCE-7029.005.patch
>
>
> I ran a Spark job that outputs thousands of parquet files (aka there are 
> thousands of reducers), and it hung for several minutes in the driver after 
> all tasks were complete. Here is a very simple repro of the job (to be run in 
> a spark-shell):
> {code:scala}
> spark.range(1L << 20).repartition(1 << 14).write.save("gs://some/path")
> {code}
> Spark actually calls into MapReduce's FileOutputCommitter. Job committing 
> (specifically cleanupJob()) recursively deletes the job temporary directory, 
> which is something like "gs://some/path/_temporary". If I understand 
> correctly, on HDFS, this would be O(1), but on GCS (and every HCFS I know), 
> this requires a full file tree walk. Deleting tens of thousands of objects in 
> GCS takes several minutes.
> I propose that commitTask() recursively deletes its task attempt temp 
> directory (something like "gs://some/path/_temporary/attempt1/task1"). On 
> HDFS, this is O(1) per task, so this is very little overhead per task. On GCS 
> (and other HCFSs), this adds parallelism for deleting the job temp directory.
> With the attached patch, the repro above went from taking ~10 minutes to 
> taking ~5 minutes, and task time did not significantly change.
> Side note: I found this issue with Spark, but I assume it applies to a 
> Mapreduce job with thousands of reducers as well.
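
The cost argument in the description above can be modelled on a local filesystem. The sketch below is an assumption-laden toy, not the attached patch: ParallelCleanupSketch, makeJobTemp and cleanupJobDeletes are hypothetical names, java.nio stands in for the HCFS connector, and each delete call stands in for one object-store request. It counts how many entries the job-level cleanup still has to delete when each task has or has not already removed its own directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical model of the proposal; local files stand in for GCS objects.
final class ParallelCleanupSketch {

    /** Deletes a tree bottom-up, one entry per call, and returns the call count. */
    static int deleteTree(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directory.
            List<Path> entries = walk.sorted(Comparator.reverseOrder())
                    .collect(Collectors.toList());
            for (Path entry : entries) {
                Files.delete(entry);
            }
            return entries.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates _temporary/task_i/part-i for each task and returns _temporary. */
    static Path makeJobTemp(int tasks) {
        try {
            Path temp = Files.createTempDirectory("job").resolve("_temporary");
            Files.createDirectories(temp);
            for (int i = 0; i < tasks; i++) {
                Path taskDir = Files.createDirectories(temp.resolve("task_" + i));
                Files.createFile(taskDir.resolve("part-" + i));
            }
            return temp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns how many deletes are left for the serial job-level cleanup. */
    static int cleanupJobDeletes(int tasks, boolean deleteInCommitTask) {
        Path temp = makeJobTemp(tasks);
        if (deleteInCommitTask) {
            // Each "task" removes its own directory; these run concurrently.
            IntStream.range(0, tasks).parallel()
                    .forEach(i -> deleteTree(temp.resolve("task_" + i)));
        }
        return deleteTree(temp);
    }

    public static void main(String[] args) {
        System.out.println(cleanupJobDeletes(4, false));
        System.out.println(cleanupJobDeletes(4, true));
    }
}
```

In this model the serial cleanup handles one entry for the root plus two per task when tasks do not clean up after themselves, and only the root when they do; the total number of deletes is unchanged, but most of them move out of the driver and run in parallel.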






[jira] [Commented] (MAPREDUCE-7029) FileOutputCommitter is slow on filesystems lacking recursive delete

2018-01-18 Thread Karthik Palaniappan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331068#comment-16331068
 ] 

Karthik Palaniappan commented on MAPREDUCE-7029:


Directory rename is indeed not atomic. IIRC, GCS did not write a separate 
output committer because the performance of rename is tolerable: O(files), 
compared to S3's O(data-per-file * files).

Correct me if I'm wrong, but I don't think FileOutputCommitter actually requires 
atomicity.

1) Task commit is already non-atomic: commitTask() calls mergePaths(), which is 
essentially a recursive copy of the attempt directory, *not* an atomic rename 
of the attempt directory. That being said, if the output files have the same 
names across different task attempts (e.g. speculative execution), this is 
still okay, as later tasks will just overwrite older task files with the same 
contents.

2) Job commit is marked by a _SUCCESS file, so it's okay if the directory 
rename is non-atomic.

That being said, I agree that in the long term, parts of the Hadoop ecosystem 
that assume POSIX-ish directory semantics should have different implementations 
for object stores. This is not limited to OutputCommitter.
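
Points 1) and 2) above can be illustrated with a small local-filesystem model. This is a simplified sketch, not Hadoop's mergePaths(): the class MergePathsSketch is hypothetical, it uses java.nio instead of the Hadoop FileSystem API, it moves files rather than copying them, and it does not handle a directory sitting where a file is about to be written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of a non-atomic, file-by-file task commit.
final class MergePathsSketch {

    /** Merges src into dst entry by entry; same-named files in dst are replaced. */
    static void mergePaths(Path src, Path dst) throws IOException {
        if (Files.isDirectory(src)) {
            if (Files.isRegularFile(dst)) {
                Files.delete(dst); // a file in the way gives place to the directory
            }
            Files.createDirectories(dst);
            List<Path> children = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(src)) {
                for (Path child : stream) {
                    children.add(child);
                }
            }
            for (Path child : children) {
                mergePaths(child, dst.resolve(child.getFileName()));
            }
            Files.delete(src); // the source directory is empty by now
        } else {
            Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Commits two attempts with identical output, then marks the job done. */
    static List<String> demo() {
        try {
            Path root = Files.createTempDirectory("commit-sketch");
            Path out = Files.createDirectories(root.resolve("out"));
            for (String attempt : new String[] {"attempt_0", "attempt_1"}) {
                Path dir = Files.createDirectories(root.resolve(attempt));
                Files.write(dir.resolve("part-00000"),
                        "same contents".getBytes(StandardCharsets.UTF_8));
                mergePaths(dir, out); // the later attempt overwrites the earlier file
            }
            Files.createFile(out.resolve("_SUCCESS")); // job commit marker
            List<String> names = new ArrayList<>();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(out)) {
                for (Path p : stream) {
                    names.add(p.getFileName().toString());
                }
            }
            Collections.sort(names);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A reader that waits for _SUCCESS never observes the intermediate states of the merge, which is why the per-file steps do not need to be atomic in this model.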

> FileOutputCommitter is slow on filesystems lacking recursive delete
> ---
>
> Key: MAPREDUCE-7029
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7029
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.8.2
> Environment: - Google Cloud Storage (with the GCS connector: 
> https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs) for 
> HCFS compatibility.
> - FileOutputCommitter algorithm v2.
> - Running on Google Compute Engine with Java 8, Debian 8, Hadoop 2.8.2, Spark 
> 2.2.0.
>Reporter: Karthik Palaniappan
>Assignee: Karthik Palaniappan
>Priority: Minor
> Fix For: 3.1.0, 2.10.0
>
> Attachments: MAPREDUCE-7029-branch-2.004.patch, 
> MAPREDUCE-7029-branch-2.005.patch, MAPREDUCE-7029-branch-2.005.patch, 
> MAPREDUCE-7029.001.patch, MAPREDUCE-7029.002.patch, MAPREDUCE-7029.003.patch, 
> MAPREDUCE-7029.004.patch, MAPREDUCE-7029.005.patch
>
>
> I ran a Spark job that outputs thousands of parquet files (aka there are 
> thousands of reducers), and it hung for several minutes in the driver after 
> all tasks were complete. Here is a very simple repro of the job (to be run in 
> a spark-shell):
> {code:scala}
> spark.range(1L << 20).repartition(1 << 14).write.save("gs://some/path")
> {code}
> Spark actually calls into MapReduce's FileOutputCommitter. Job committing 
> (specifically cleanupJob()) recursively deletes the job temporary directory, 
> which is something like "gs://some/path/_temporary". If I understand 
> correctly, on HDFS, this would be O(1), but on GCS (and every HCFS I know), 
> this requires a full file tree walk. Deleting tens of thousands of objects in 
> GCS takes several minutes.
> I propose that commitTask() recursively deletes its task attempt temp 
> directory (something like "gs://some/path/_temporary/attempt1/task1"). On 
> HDFS, this is O(1) per task, so this is very little overhead per task. On GCS 
> (and other HCFSs), this adds parallelism for deleting the job temp directory.
> With the attached patch, the repro above went from taking ~10 minutes to 
> taking ~5 minutes, and task time did not significantly change.
> Side note: I found this issue with Spark, but I assume it applies to a 
> Mapreduce job with thousands of reducers as well.






[jira] [Commented] (MAPREDUCE-6995) Uploader tool for Distributed Cache Deploy documentation

2018-01-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16331043#comment-16331043
 ] 

Hadoop QA commented on MAPREDUCE-6995:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 10m 50s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  9s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:red}-1{color} | {color:red} mvninstall {color} | {color:red}  0m 31s{color} | {color:red} root in trunk failed. {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 50s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 27s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 47s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 31s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 36s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m  8s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 28s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 46s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m  9s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 38s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 20s{color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 19s{color} | {color:green} hadoop-mapreduce-client-uploader in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 48m 23s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | MAPREDUCE-6995 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12906680/MAPREDUCE-6995.003.patch |
| Optional Tests |  asflicense  mvnsite  compile  javac  javadoc  mvninstall  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux 1d8236b6fc9c 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 06cceba |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| mvninstall | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/7306/artifact/out/branch-mvninstall-root.txt |
| findbugs | v3.1.0-RC1 |
|