[jira] [Commented] (MAPREDUCE-6278) Multithreaded maven build breaks in hadoop-mapreduce-client-core

2018-02-12 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361865#comment-16361865
 ] 

Chris Douglas commented on MAPREDUCE-6278:
--

bq. the patch just works
Well... the patch may work. Without a principled reason to believe it prevents 
the race, we know only that some executions didn't lose it. Moreover, if some 
accident in the build makes a dependency on 
{{hadoop-yarn-applications-distributedshell}} sufficient today, then future 
changes may accidentally break it.

bq. I think ideally we need to put every leaf modules as dependencies of the 
root submodule
This we could explain, at least. I'm not sure if it's necessary or if a better 
pattern exists. Please add a comment to the pom to explain why the dependencies 
are listed explicitly.
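For illustration, the pattern being requested might look like this in the aggregator pom (a sketch only: the comment wording and the module list here are examples, not a vetted set):

{code}
<!-- The assembly built by this module gathers artifacts from the leaf
     modules below. Under a multithreaded build (mvn -T), a module that is
     not declared as a dependency may not have reached the package phase
     when the assembly runs, so each included leaf module is listed
     explicitly to force the ordering. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-nativetask</artifactId>
    <scope>provided</scope>
  </dependency>
  <!-- ... one entry per leaf module included by the assembly ... -->
</dependencies>
{code}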

> Multithreaded maven build breaks in hadoop-mapreduce-client-core
> 
>
> Key: MAPREDUCE-6278
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6278
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.9.0
> Environment: Linux (Fedora 21)
>Reporter: Ewan Higgs
>Assignee: Duo Xu
>Priority: Major
> Attachments: MAPREDUCE-6278.01.patch, MAPREDUCE-6278.02.patch
>
>
> [As reported on the mailing 
> list|http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/52231].
> The following breaks:
> {{mvn -e package -DskipTests -Dmaven.javadoc.skip -Dtar -Pdist,native -T5}}
> ...
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:2.4:single (package-mapreduce) 
> on project hadoop-mapreduce: Failed to create assembly: Artifact: 
> org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.0.0-SNAPSHOT (included 
> by module) does not have an artifact with a file. Please ensure the package 
> phase is run before the assembly is generated. -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single 
> (package-mapreduce) on project hadoop-mapreduce: Failed to create assembly: 
> Artifact: org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.0.0-SNAPSHOT 
> (included by module) does not have an artifact with a file. Please ensure the 
> package phase is run before the assembly is generated.
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
>   at 
> org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:188)
>   at 
> org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:184)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.maven.plugin.MojoExecutionException: Failed to create 
> assembly: Artifact: 
> org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.0.0-SNAPSHOT (included 
> by module) does not have an artifact with a file. Please ensure the package 
> phase is run before the assembly is generated.
>   at 
> org.apache.maven.plugin.assembly.mojos.AbstractAssemblyMojo.execute(AbstractAssemblyMojo.java:495)
>   at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
>   ... 11 more
> Caused by: org.apache.maven.plugin.assembly.archive.ArchiveCreationException: 
> Artifact: org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.0.0-SNAPSHOT 
> (included by module) does not have an artifact with a file. Please ensure the 
> package phase is run before the assembly is generated.
>   at 
> org.apache.maven.plugin.assembly.archive.phase.ModuleSetAssemblyPhase.addModuleArtifact(ModuleSetAssemblyPhase.java:318)
>   at 
> org.apache.maven.plugin.assembly.archive.phase.ModuleSetAssemblyPhase.addModuleBinaries(ModuleSetAssemblyPhase.java:228)
>   at 
> org.apache.maven.plugin.assembly.archive.phase.ModuleSetAssemblyPhase.execute(ModuleSetAssemblyPhase.java:111)
>   at 
> org.apache.maven.plugin.assembly.archive.DefaultAssemblyArchiver.createArchive(DefaultAssemblyArchiver.java:183)
>   at 
> org.apache.maven.plugin.assembly.mojos.AbstractAssemblyMojo.execute(AbstractAssemblyMojo.java:436)
>   ... 13 more
> {code}
> Dmitry Siminov appears to be building on Windows. I'm using Linux.

[jira] [Commented] (MAPREDUCE-6278) Multithreaded maven build breaks in hadoop-mapreduce-client-core

2018-02-09 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359181#comment-16359181
 ] 

Chris Douglas commented on MAPREDUCE-6278:
--

I reproduced the problem on branch-2 and in trunk. The patch works as intended 
on branch-2, but in trunk it caused an error because the 
{{hadoop-mapreduce-client-nativetask}} and {{hadoop-mapreduce-client-uploader}} 
modules weren't included in the pom. Updated patch to include these.

After reverting the new dependency in {{yarn-project}} on 
{{hadoop-yarn-applications-distributedshell}}, I couldn't reproduce build 
errors on the trunk version. Is there a reason this particular application 
requires special handling among {{hadoop-yarn-applications}}?


[jira] [Updated] (MAPREDUCE-6278) Multithreaded maven build breaks in hadoop-mapreduce-client-core

2018-02-09 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6278:
-
Attachment: MAPREDUCE-6278.02.patch




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6278) Multithreaded maven build breaks in hadoop-mapreduce-client-core

2018-02-06 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354875#comment-16354875
 ] 

Chris Douglas commented on MAPREDUCE-6278:
--

Thanks for the patch. Can you explain how it resolves the issue?







[jira] [Commented] (MAPREDUCE-6278) Multithreaded maven build breaks in hadoop-mapreduce-client-core

2018-02-05 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353374#comment-16353374
 ] 

Chris Douglas commented on MAPREDUCE-6278:
--

I added you as a contributor on the MAPREDUCE project. You should be able to 
upload a patch (More > Attach Files), submit it to the CI infra (Submit Patch), 
and assign this JIRA to yourself.

> Multithreaded maven build breaks in hadoop-mapreduce-client-core
> 
>
> Key: MAPREDUCE-6278
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6278
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.4.0, 2.5.0, 3.0.0-alpha1
> Environment: Linux (Fedora 21)
>Reporter: Ewan Higgs
>Priority: Major




[jira] [Updated] (MAPREDUCE-7016) Avoid making separate RPC calls for FileStatus and block locations in FileInputFormat

2017-11-28 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-7016:
-
Description: {{FileInputFormat::getSplits}} uses {{FileSystem::globStatus}} 
to determine its inputs. When the glob returns directories, each is traversed 
and {{LocatedFileStatus}} instances are returned with the block locations. 
However, when the glob returns files, this is a {{FileStatus}} that requires a 
second RPC to obtain its locations.  (was: {{FileInputFormat::getSplits}} uses 
{{FileSystem::globStatus}} to determine its inputs. When the glob returns 
directories, each is traversed and {{LocatedFileStatus}} instances are returned 
with the block locations. However, when the glob returns files, each requires a 
second RPC to obtain its locations.)
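The cost difference can be sketched with a toy counter (the methods below are stand-ins, not the Hadoop {{FileSystem}} API): a glob that returns plain statuses forces one extra location call per file, while a located listing returns statuses and block locations together.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the two listing strategies; each stand-in method counts as one RPC.
class RpcCountSketch {
    static int rpcs = 0;

    // stand-in for FileSystem::globStatus: one call, statuses without locations
    static List<String> globStatus() { rpcs++; return Arrays.asList("f1", "f2", "f3"); }

    // stand-in for FileSystem::getFileBlockLocations: one call per file
    static String getFileBlockLocations(String f) { rpcs++; return f + ":locations"; }

    // stand-in for a listing that returns LocatedFileStatus-like results in one call
    static List<String> listLocatedStatus() {
        rpcs++;
        return Arrays.asList("f1:locations", "f2:locations", "f3:locations");
    }

    static int splitsViaPlainStatuses() {
        rpcs = 0;
        for (String f : globStatus()) getFileBlockLocations(f); // second RPC per file
        return rpcs; // 1 + N calls for N files
    }

    static int splitsViaLocatedStatuses() {
        rpcs = 0;
        listLocatedStatus(); // locations piggyback on the listing
        return rpcs; // 1 call
    }

    public static void main(String[] args) {
        System.out.println(splitsViaPlainStatuses());   // 4 for 3 files
        System.out.println(splitsViaLocatedStatuses()); // 1
    }
}
```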

> Avoid making separate RPC calls for FileStatus and block locations in 
> FileInputFormat
> -
>
> Key: MAPREDUCE-7016
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7016
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Chris Douglas






[jira] [Created] (MAPREDUCE-7016) Avoid making separate RPC calls for FileStatus and block locations in FileInputFormat

2017-11-28 Thread Chris Douglas (JIRA)
Chris Douglas created MAPREDUCE-7016:


 Summary: Avoid making separate RPC calls for FileStatus and block 
locations in FileInputFormat
 Key: MAPREDUCE-7016
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7016
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Chris Douglas








[jira] [Issue Comment Deleted] (MAPREDUCE-7013) Tests of internal logic should not use the local FS as scratch space

2017-11-22 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-7013:
-
Comment: was deleted

(was: MAPREDUCE-7011 is an example)

> Tests of internal logic should not use the local FS as scratch space
> 
>
> Key: MAPREDUCE-7013
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7013
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Reporter: Chris Douglas
>
> MapReduce often manipulates files/permissions to ensure splits, dependencies, 
> and other user data are consistently managed. Unit tests of these internal 
> methods sometimes set up temporary hierarchies in a scratch directory on the 
> local FS to exercise these modules. However, dev environment quirks (e.g., 
> umask) can cause these tests to fail spuriously. Instead, this logic should 
> be validated by mocking the filesystem.
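A sketch of that mocking approach, with an in-memory permission table standing in for the local filesystem (the other-execute rule mirrors the distributed-cache visibility check, but all names and modes here are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Visibility logic exercised against an in-memory permission table,
// so the result cannot depend on the dev machine's umask or scratch dirs.
class MockFsVisibilityCheck {
    // hypothetical stand-in for the filesystem: path -> POSIX mode bits
    static final Map<String, Integer> perms = new HashMap<>();

    // a path is "public" only if it and every ancestor grant other-execute
    static boolean isPublic(String path) {
        for (String p = path; p != null; p = parent(p)) {
            Integer mode = perms.get(p);
            if (mode == null || (mode & 01) == 0) return false;
        }
        return true;
    }

    static String parent(String p) {
        if (p.equals("/")) return null;
        int i = p.lastIndexOf('/');
        return i == 0 ? "/" : p.substring(0, i);
    }

    public static void main(String[] args) {
        perms.put("/", 0755);
        perms.put("/scratch", 0755);
        perms.put("/scratch/private", 0750);
        System.out.println(isPublic("/scratch"));         // true
        System.out.println(isPublic("/scratch/private")); // false: no other-execute
    }
}
```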






[jira] [Created] (MAPREDUCE-7013) Tests of internal logic should not use the local FS as scratch space

2017-11-22 Thread Chris Douglas (JIRA)
Chris Douglas created MAPREDUCE-7013:


 Summary: Tests of internal logic should not use the local FS as 
scratch space
 Key: MAPREDUCE-7013
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7013
 Project: Hadoop Map/Reduce
  Issue Type: Test
Reporter: Chris Douglas








[jira] [Commented] (MAPREDUCE-7013) Tests of internal logic should not use the local FS as scratch space

2017-11-22 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262899#comment-16262899
 ] 

Chris Douglas commented on MAPREDUCE-7013:
--

MAPREDUCE-7011 is an example

> Tests of internal logic should not use the local FS as scratch space
> 
>
> Key: MAPREDUCE-7013
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7013
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Reporter: Chris Douglas






[jira] [Updated] (MAPREDUCE-7011) TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all parent dirs set other exec

2017-11-21 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-7011:
-
   Resolution: Fixed
 Assignee: Chris Douglas
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.1
   Status: Resolved  (was: Patch Available)

I committed this. Thanks for the review, [~subru]

> TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all 
> parent dirs set other exec
> 
>
> Key: MAPREDUCE-7011
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7011
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Reporter: Chris Douglas
>Assignee: Chris Douglas
>Priority: Trivial
> Fix For: 3.0.1
>
> Attachments: MAPREDUCE-7011.000.patch
>
>
> {{TestClientDistributedCacheManager}} sets up some local directories to check 
> the visibility set for dependencies, given their filesystem permissions. 
> However, if it is run in an environment where the scratch directory is not 
> itself PUBLIC ({{ClientDistributedCacheManager::isPublic}}), then it will 
> fail.
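The environment dependence is easy to probe directly; the following stdlib-only sketch (nothing Hadoop-specific) checks whether a directory grants the other-execute bit that the visibility check requires of every ancestor:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;

// Whether "other" users may traverse a directory is a property of the
// environment that created it (umask, temp-dir policy), not of the test code.
class ScratchDirPermProbe {
    static boolean othersCanTraverse(Path dir) {
        try {
            return Files.getPosixFilePermissions(dir)
                        .contains(PosixFilePermission.OTHERS_EXECUTE);
        } catch (IOException | UnsupportedOperationException e) {
            return false; // non-POSIX filesystem: treat as not traversable
        }
    }

    // create a fresh temp dir, probe it, clean up
    static boolean probeFreshTempDir() {
        try {
            Path d = Files.createTempDirectory("scratch");
            boolean traversable = othersCanTraverse(d);
            Files.delete(d);
            return traversable;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Files.createTempDirectory makes the dir owner-only (0700) on POSIX,
        // so this prints false; a dir made by plain mkdir inherits the umask
        // instead and may or may not be traversable by others.
        System.out.println(probeFreshTempDir());
    }
}
```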






[jira] [Updated] (MAPREDUCE-7011) TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all parent dirs set other exec

2017-11-21 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-7011:
-
Priority: Trivial  (was: Minor)

> TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all 
> parent dirs set other exec
> 
>
> Key: MAPREDUCE-7011
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7011
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Reporter: Chris Douglas
>Priority: Trivial
> Attachments: MAPREDUCE-7011.000.patch
>
>
> {{TestClientDistributedCacheManager}} sets up some local directories to check 
> the visibility set for dependencies, given their filesystem permissions. 
> However, if it is run in an environment where the scratch directory is not 
> itself PUBLIC ({{ClientDistributedCacheManager::isPublic}}), then it will 
> fail.






[jira] [Commented] (MAPREDUCE-7011) TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all parent dirs set other exec

2017-11-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261966#comment-16261966
 ] 

Chris Douglas commented on MAPREDUCE-7011:
--

bq. Does it make sense to open another jira to refactor 
TestClientDistributedCacheManager to not use the local FS?
It wouldn't hurt. I'll open something generic that includes this as an example.

> TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all 
> parent dirs set other exec
> 
>
> Key: MAPREDUCE-7011
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7011
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: MAPREDUCE-7011.000.patch
>
>
> {{TestClientDistributedCacheManager}} sets up some local directories to check 
> the visibility set for dependencies, given their filesystem permissions. 
> However, if it is run in an environment where the scratch directory is not 
> itself PUBLIC ({{ClientDistributedCacheManager::isPublic}}), then it will 
> fail.






[jira] [Updated] (MAPREDUCE-7011) TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all parent dirs set other exec

2017-11-21 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-7011:
-
Status: Patch Available  (was: Open)

> TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all 
> parent dirs set other exec
> 
>
> Key: MAPREDUCE-7011
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7011
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: MAPREDUCE-7011.000.patch
>
>
> {{TestClientDistributedCacheManager}} sets up some local directories to check 
> the visibility set for dependencies, given their filesystem permissions. 
> However, if it is run in an environment where the scratch directory is not 
> itself PUBLIC ({{ClientDistributedCacheManager::isPublic}}), then it will 
> fail.






[jira] [Updated] (MAPREDUCE-7011) TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all parent dirs set other exec

2017-11-21 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-7011:
-
Attachment: MAPREDUCE-7011.000.patch

While it would clearly be better if the test validated the visibility logic 
without requiring the local filesystem, that would likely require some 
refactoring. v000 simply skips the test.
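The environment assumption the test makes can be checked directly. Below is a minimal, plain-JDK sketch of that precondition: every ancestor of the scratch directory must grant other-execute. It mirrors the idea behind {{ClientDistributedCacheManager::isPublic}}, not its actual implementation, and is POSIX-only; the class and method names are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.PosixFilePermission;
import java.util.Set;

public class AncestorPermCheck {
  // Sketch of the precondition the test assumes: every ancestor directory
  // of the scratch path grants other-execute, so files under it can be
  // considered PUBLIC. Hypothetical helper, not Hadoop code.
  static boolean ancestorsOtherExecutable(Path p) throws IOException {
    for (Path dir = p.toAbsolutePath().getParent(); dir != null;
        dir = dir.getParent()) {
      Set<PosixFilePermission> perms = Files.getPosixFilePermissions(dir);
      if (!perms.contains(PosixFilePermission.OTHERS_EXECUTE)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws IOException {
    Path scratch = Paths.get(System.getProperty("java.io.tmpdir"));
    System.out.println("scratch dir can be PUBLIC: "
        + ancestorsOtherExecutable(scratch));
  }
}
```

Running this in the failing environment would print {{false}}, which is exactly the condition under which v000 skips the test.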

> TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all 
> parent dirs set other exec
> 
>
> Key: MAPREDUCE-7011
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7011
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Reporter: Chris Douglas
>Priority: Minor
> Attachments: MAPREDUCE-7011.000.patch
>
>
> {{TestClientDistributedCacheManager}} sets up some local directories to check 
> the visibility set for dependencies, given their filesystem permissions. 
> However, if it is run in an environment where the scratch directory is not 
> itself PUBLIC ({{ClientDistributedCacheManager::isPublic}}), then it will 
> fail.






[jira] [Moved] (MAPREDUCE-7011) TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all parent dirs set other exec

2017-11-21 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas moved HDFS-12844 to MAPREDUCE-7011:
-

Key: MAPREDUCE-7011  (was: HDFS-12844)
Project: Hadoop Map/Reduce  (was: Hadoop HDFS)

> TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all 
> parent dirs set other exec
> 
>
> Key: MAPREDUCE-7011
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7011
> Project: Hadoop Map/Reduce
>  Issue Type: Test
>Reporter: Chris Douglas
>Priority: Minor
>
> {{TestClientDistributedCacheManager}} sets up some local directories to check 
> the visibility set for dependencies, given their filesystem permissions. 
> However, if it is run in an environment where the scratch directory is not 
> itself PUBLIC ({{ClientDistributedCacheManager::isPublic}}), then it will 
> fail.






[jira] [Commented] (MAPREDUCE-6958) Shuffle audit logger should log size of shuffle transfer

2017-09-18 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170837#comment-16170837
 ] 

Chris Douglas commented on MAPREDUCE-6958:
--

bq. Since 3.0 hasn't officially shipped yet, I propose to revert the 003 patch 
I committed to trunk and branch-3.0 and instead commit patch version 002 which 
preserves the job-then-reducer ordering already established in the 2.x line. 
Objections?

Nope, that sounds reasonable. Thanks for the extra audit.

> Shuffle audit logger should log size of shuffle transfer
> 
>
> Key: MAPREDUCE-6958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6958
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Minor
> Fix For: 3.0.0-beta1
>
> Attachments: MAPREDUCE-6958.001.patch, MAPREDUCE-6958.002.patch, 
> MAPREDUCE-6958.003.patch, MAPREDUCE-6958-branch-2.002.patch
>
>
> The shuffle audit logger currently logs the job ID and reducer ID but nothing 
> about the size of the requested transfer.  It calculates this as part of the 
> HTTP response headers, so it would be trivial to log the response size.  This 
> would be very valuable for debugging network traffic storms from the shuffle 
> handler.
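The change discussed here amounts to appending one more field to an existing log line. A hedged sketch follows; the entry format, identifiers, and method name are illustrative stand-ins, not the actual ShuffleHandler audit format:

```java
public class ShuffleAuditSketch {
  // Illustrative only: the real audit format lives in ShuffleHandler.
  // The point is to append the already-computed content-length last,
  // preserving the established job-then-reducer field order so existing
  // log consumers that parse by prefix keep working.
  static String auditEntry(String jobId, String reduceId, long contentLength) {
    return "shuffle for " + jobId + " reducer " + reduceId
        + " length " + contentLength;
  }

  public static void main(String[] args) {
    // prints: shuffle for job_0001 reducer 3 length 1048576
    System.out.println(auditEntry("job_0001", "3", 1048576L));
  }
}
```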






[jira] [Commented] (MAPREDUCE-6958) Shuffle audit logger should log size of shuffle transfer

2017-09-18 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170487#comment-16170487
 ] 

Chris Douglas commented on MAPREDUCE-6958:
--

Thanks for updating the patch, [~jlowe]. +1

> Shuffle audit logger should log size of shuffle transfer
> 
>
> Key: MAPREDUCE-6958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6958
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Minor
> Attachments: MAPREDUCE-6958.001.patch, MAPREDUCE-6958.002.patch, 
> MAPREDUCE-6958.003.patch
>
>
> The shuffle audit logger currently logs the job ID and reducer ID but nothing 
> about the size of the requested transfer.  It calculates this as part of the 
> HTTP response headers, so it would be trivial to log the response size.  This 
> would be very valuable for debugging network traffic storms from the shuffle 
> handler.






[jira] [Commented] (MAPREDUCE-6958) Shuffle audit logger should log size of shuffle transfer

2017-09-15 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168577#comment-16168577
 ] 

Chris Douglas commented on MAPREDUCE-6958:
--

Sorry to ask for revs on this kind of patch, but this changes the format of the 
audit log in a way that might break downstream consumers. The mapIds are 
printed after the reducer in the revised version. Could this keep the format 
as-is, with the length appended?

The shuffle sizes used to be available in the clienttrace log. Was that removed 
from the ShuffleHandler at some point?

> Shuffle audit logger should log size of shuffle transfer
> 
>
> Key: MAPREDUCE-6958
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6958
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Minor
> Attachments: MAPREDUCE-6958.001.patch, MAPREDUCE-6958.002.patch
>
>
> The shuffle audit logger currently logs the job ID and reducer ID but nothing 
> about the size of the requested transfer.  It calculates this as part of the 
> HTTP response headers, so it would be trivial to log the response size.  This 
> would be very valuable for debugging network traffic storms from the shuffle 
> handler.






[jira] [Updated] (MAPREDUCE-6433) launchTime may be negative

2017-06-25 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6433:
-
Status: Patch Available  (was: Reopened)

> launchTime may be negative
> --
>
> Key: MAPREDUCE-6433
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6433
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, mrv2
>Affects Versions: 2.4.1
>Reporter: Allen Wittenauer
>Assignee: zhihai xu
>  Labels: release-blocker
> Fix For: 3.0.0-alpha1, 2.8.0
>
> Attachments: MAPREDUCE-6433.000.patch, MAPREDUCE-6433.001.patch, 
> MAPREDUCE-6433-branch-2.7.001.patch, REPRODUCE.patch
>
>
> Under extremely rare conditions (.0017% in our sample size), launchTime in 
> the jhist files may be set to -1.






[jira] [Updated] (MAPREDUCE-6433) launchTime may be negative

2017-06-25 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6433:
-
Target Version/s: 2.7.4

> launchTime may be negative
> --
>
> Key: MAPREDUCE-6433
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6433
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, mrv2
>Affects Versions: 2.4.1
>Reporter: Allen Wittenauer
>Assignee: zhihai xu
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: MAPREDUCE-6433.000.patch, MAPREDUCE-6433.001.patch, 
> MAPREDUCE-6433-branch-2.7.001.patch, REPRODUCE.patch
>
>
> Under extremely rare conditions (.0017% in our sample size), launchTime in 
> the jhist files may be set to -1.






[jira] [Updated] (MAPREDUCE-6433) launchTime may be negative

2017-06-25 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6433:
-
Labels: release-blocker  (was: )

> launchTime may be negative
> --
>
> Key: MAPREDUCE-6433
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6433
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, mrv2
>Affects Versions: 2.4.1
>Reporter: Allen Wittenauer
>Assignee: zhihai xu
>  Labels: release-blocker
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: MAPREDUCE-6433.000.patch, MAPREDUCE-6433.001.patch, 
> MAPREDUCE-6433-branch-2.7.001.patch, REPRODUCE.patch
>
>
> Under extremely rare conditions (.0017% in our sample size), launchTime in 
> the jhist files may be set to -1.






[jira] [Updated] (MAPREDUCE-6433) launchTime may be negative

2017-06-25 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6433:
-
Attachment: MAPREDUCE-6433-branch-2.7.001.patch

Backport to branch-2.7.

This doesn't seem major, but it's easy to include in the upcoming 2.7.4 release.

> launchTime may be negative
> --
>
> Key: MAPREDUCE-6433
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6433
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, mrv2
>Affects Versions: 2.4.1
>Reporter: Allen Wittenauer
>Assignee: zhihai xu
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: MAPREDUCE-6433.000.patch, MAPREDUCE-6433.001.patch, 
> MAPREDUCE-6433-branch-2.7.001.patch, REPRODUCE.patch
>
>
> Under extremely rare conditions (.0017% in our sample size), launchTime in 
> the jhist files may be set to -1.






[jira] [Reopened] (MAPREDUCE-6433) launchTime may be negative

2017-06-25 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas reopened MAPREDUCE-6433:
--

> launchTime may be negative
> --
>
> Key: MAPREDUCE-6433
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6433
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver, mrv2
>Affects Versions: 2.4.1
>Reporter: Allen Wittenauer
>Assignee: zhihai xu
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: MAPREDUCE-6433.000.patch, MAPREDUCE-6433.001.patch, 
> MAPREDUCE-6433-branch-2.7.001.patch, REPRODUCE.patch
>
>
> Under extremely rare conditions (.0017% in our sample size), launchTime in 
> the jhist files may be set to -1.






[jira] [Updated] (MAPREDUCE-6883) AuditLogger and TestAuditLogger are dead code

2017-05-09 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6883:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha3
   Status: Resolved  (was: Patch Available)

+1

I committed this. Thanks Vrushali

> AuditLogger and TestAuditLogger are dead code
> -
>
> Key: MAPREDUCE-6883
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6883
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Vrushali C
>Priority: Minor
>  Labels: newbie
> Fix For: 3.0.0-alpha3
>
> Attachments: MAPREDUCE-6883.001.patch, MAPREDUCE-6883.002.patch, 
> MAPREDUCE-6883.003.patch
>
>
> The {{AuditLogger}} and {{TestAuditLogger}} classes appear to be dead code.  
> I can't find anything that uses or references {{AuditLogger}}.  No one has 
> touched the code since 2011.  I think it's safe to remove.






[jira] [Updated] (MAPREDUCE-6883) AuditLogger and TestAuditLogger are dead code

2017-05-09 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6883:
-
Attachment: MAPREDUCE-6883.003.patch

> AuditLogger and TestAuditLogger are dead code
> -
>
> Key: MAPREDUCE-6883
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6883
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.8.0
>Reporter: Daniel Templeton
>Assignee: Vrushali C
>Priority: Minor
>  Labels: newbie
> Attachments: MAPREDUCE-6883.001.patch, MAPREDUCE-6883.002.patch, 
> MAPREDUCE-6883.003.patch
>
>
> The {{AuditLogger}} and {{TestAuditLogger}} classes appear to be dead code.  
> I can't find anything that uses or references {{AuditLogger}}.  No one has 
> touched the code since 2011.  I think it's safe to remove.






[jira] [Updated] (MAPREDUCE-6628) Potential memory leak in CryptoOutputStream

2016-09-09 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6628:
-
Fix Version/s: 2.9.0

> Potential memory leak in CryptoOutputStream
> ---
>
> Key: MAPREDUCE-6628
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6628
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.4
>Reporter: Mariappan Asokan
>Assignee: Mariappan Asokan
> Fix For: 2.9.0, 3.0.0-alpha2
>
> Attachments: MAPREDUCE-6628.001.patch, MAPREDUCE-6628.002.patch, 
> MAPREDUCE-6628.003.patch, MAPREDUCE-6628.004.patch, MAPREDUCE-6628.005.patch, 
> MAPREDUCE-6628.006.patch, MAPREDUCE-6628.007.patch, MAPREDUCE-6628.008.patch, 
> MAPREDUCE-6628.009.patch
>
>
> There is a potential memory leak in {{CryptoOutputStream.java}}.  It 
> allocates two direct byte buffers ({{inBuffer}} and {{outBuffer}}) that are 
> freed when the {{close()}} method is called.  Most of the time, {{close()}} 
> is called.  However, when writing to the intermediate Map output file or the 
> spill files in {{MapTask}}, {{close()}} is never called, since doing so 
> would close the underlying stream, which is not desirable.  There is a single 
> underlying physical stream that contains multiple logical streams, one per 
> partition of Map output.  
> By default, the amount of memory allocated per byte buffer is 128 KB, so the 
> total memory allocated is 256 KB.  This may not sound like much.  However, if 
> the number of partitions (or number of reducers) is large (in the hundreds) 
> and/or there are spill files created in {{MapTask}}, this can grow to a few 
> hundred MB. 
> I can think of two ways to address this issue:
> h2. Possible Fix - 1
> According to JDK documentation:
> {quote}
> The contents of direct buffers may reside outside of the normal 
> garbage-collected heap, and so their impact upon the memory footprint of an 
> application might not be obvious.  It is therefore recommended that direct 
> buffers be allocated primarily for large, long-lived buffers that are subject 
> to the underlying system's native I/O operations.  In general it is best to 
> allocate direct buffers only when they yield a measurable gain in program 
> performance.
> {quote}
> It is not clear to me whether there is any benefit of allocating direct byte 
> buffers in {{CryptoOutputStream.java}}.  In fact, there is a slight CPU 
> overhead in moving data from {{outBuffer}} to a temporary byte array as per 
> the following code in {{CryptoOutputStream.java}}.
> {code}
> /*
>  * If underlying stream supports {@link ByteBuffer} write in future, needs
>  * refine here. 
>  */
> final byte[] tmp = getTmpBuf();
> outBuffer.get(tmp, 0, len);
> out.write(tmp, 0, len);
> {code}
> Even if the underlying stream supports direct byte buffer IO (or direct IO in 
> OS parlance), it is not clear whether it will yield any measurable 
> performance gain.
> The fix would be to allocate a {{ByteBuffer}} on the heap for {{inBuffer}} and 
> wrap a byte array in a {{ByteBuffer}} for {{outBuffer}}.  Note that 
> {{inBuffer}} and {{outBuffer}} have to be {{ByteBuffer}} instances, as 
> demanded by the {{encrypt()}} method in {{Encryptor}}.
> h2. Possible Fix - 2
> Assuming that we want to keep the buffers as direct byte buffers, we can 
> create a new constructor to {{CryptoOutputStream}} and pass a boolean flag 
> {{ownOutputStream}} to indicate whether the underlying stream will be owned 
> by {{CryptoOutputStream}}. If it is true, then calling the {{close()}} method 
> will close the underlying stream.  Otherwise, when {{close()}} is called only 
> the direct byte buffers will be freed and the underlying stream will not be 
> closed.
> The scope of changes for this fix will be somewhat wider.  We need to modify 
> {{MapTask.java}}, {{CryptoUtils.java}}, and {{CryptoFSDataOutputStream.java}} 
> as well to pass the ownership flag mentioned above.
> I can post a patch for either of the above.  I welcome any other ideas from 
> developers to fix this issue.
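A minimal sketch of what Possible Fix - 1 amounts to, assuming nothing about the real {{CryptoOutputStream}} internals beyond the two buffers: heap-backed {{ByteBuffer}}s are reclaimed by ordinary GC even when {{close()}} is never reached.

```java
import java.nio.ByteBuffer;

public class HeapBufferSketch {
  public static void main(String[] args) {
    int bufferSize = 128 * 1024; // the default size cited above

    // Heap-backed alternatives to ByteBuffer.allocateDirect(bufferSize):
    // both live on the garbage-collected heap, so leaked streams do not
    // pin native memory.
    ByteBuffer inBuffer = ByteBuffer.allocate(bufferSize);
    ByteBuffer outBuffer = ByteBuffer.wrap(new byte[bufferSize]);

    // prints: false true
    System.out.println(inBuffer.isDirect() + " " + outBuffer.hasArray());
  }
}
```

Both forms still satisfy the {{Encryptor.encrypt()}} requirement of taking {{ByteBuffer}} arguments; only the backing storage changes.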






[jira] [Updated] (MAPREDUCE-6628) Potential memory leak in CryptoOutputStream

2016-09-09 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6628:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0-alpha2
   Status: Resolved  (was: Patch Available)

+1

I committed this. Thanks [~masokan]

> Potential memory leak in CryptoOutputStream
> ---
>
> Key: MAPREDUCE-6628
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6628
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.4
>Reporter: Mariappan Asokan
>Assignee: Mariappan Asokan
> Fix For: 3.0.0-alpha2
>
> Attachments: MAPREDUCE-6628.001.patch, MAPREDUCE-6628.002.patch, 
> MAPREDUCE-6628.003.patch, MAPREDUCE-6628.004.patch, MAPREDUCE-6628.005.patch, 
> MAPREDUCE-6628.006.patch, MAPREDUCE-6628.007.patch, MAPREDUCE-6628.008.patch, 
> MAPREDUCE-6628.009.patch
>
>
> There is a potential memory leak in {{CryptoOutputStream.java}}.  It 
> allocates two direct byte buffers ({{inBuffer}} and {{outBuffer}}) that are 
> freed when the {{close()}} method is called.  Most of the time, {{close()}} 
> is called.  However, when writing to the intermediate Map output file or the 
> spill files in {{MapTask}}, {{close()}} is never called, since doing so 
> would close the underlying stream, which is not desirable.  There is a single 
> underlying physical stream that contains multiple logical streams, one per 
> partition of Map output.  
> By default, the amount of memory allocated per byte buffer is 128 KB, so the 
> total memory allocated is 256 KB.  This may not sound like much.  However, if 
> the number of partitions (or number of reducers) is large (in the hundreds) 
> and/or there are spill files created in {{MapTask}}, this can grow to a few 
> hundred MB. 
> I can think of two ways to address this issue:
> h2. Possible Fix - 1
> According to JDK documentation:
> {quote}
> The contents of direct buffers may reside outside of the normal 
> garbage-collected heap, and so their impact upon the memory footprint of an 
> application might not be obvious.  It is therefore recommended that direct 
> buffers be allocated primarily for large, long-lived buffers that are subject 
> to the underlying system's native I/O operations.  In general it is best to 
> allocate direct buffers only when they yield a measurable gain in program 
> performance.
> {quote}
> It is not clear to me whether there is any benefit of allocating direct byte 
> buffers in {{CryptoOutputStream.java}}.  In fact, there is a slight CPU 
> overhead in moving data from {{outBuffer}} to a temporary byte array as per 
> the following code in {{CryptoOutputStream.java}}.
> {code}
> /*
>  * If underlying stream supports {@link ByteBuffer} write in future, needs
>  * refine here. 
>  */
> final byte[] tmp = getTmpBuf();
> outBuffer.get(tmp, 0, len);
> out.write(tmp, 0, len);
> {code}
> Even if the underlying stream supports direct byte buffer IO (or direct IO in 
> OS parlance), it is not clear whether it will yield any measurable 
> performance gain.
> The fix would be to allocate a {{ByteBuffer}} on the heap for {{inBuffer}} and 
> wrap a byte array in a {{ByteBuffer}} for {{outBuffer}}.  Note that 
> {{inBuffer}} and {{outBuffer}} have to be {{ByteBuffer}} instances, as 
> demanded by the {{encrypt()}} method in {{Encryptor}}.
> h2. Possible Fix - 2
> Assuming that we want to keep the buffers as direct byte buffers, we can 
> create a new constructor to {{CryptoOutputStream}} and pass a boolean flag 
> {{ownOutputStream}} to indicate whether the underlying stream will be owned 
> by {{CryptoOutputStream}}. If it is true, then calling the {{close()}} method 
> will close the underlying stream.  Otherwise, when {{close()}} is called only 
> the direct byte buffers will be freed and the underlying stream will not be 
> closed.
> The scope of changes for this fix will be somewhat wider.  We need to modify 
> {{MapTask.java}}, {{CryptoUtils.java}}, and {{CryptoFSDataOutputStream.java}} 
> as well to pass the ownership flag mentioned above.
> I can post a patch for either of the above.  I welcome any other ideas from 
> developers to fix this issue.






[jira] [Commented] (MAPREDUCE-6628) Potential memory leak in CryptoOutputStream

2016-09-06 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468199#comment-15468199
 ] 

Chris Douglas commented on MAPREDUCE-6628:
--

[~masokan] thank you for your patience with this.

The unit test looks useful for debugging, but it doesn't actually verify the 
fix. As written, it's also expensive to run (starts a cluster) and relies on a 
platform-dependent scan of {{/proc/self/status}}, rather than using 
{{java.lang.management}} APIs. That said, unit testing this corner of MapReduce 
is not straightforward, and your posted results demonstrate both the issue and 
the fix. We can commit this without a MR test.

Would it be possible to write a short unit test for {{CryptoOutputStream}} 
verifying the new {{closeOutputStream}} semantics? This should be very 
straightforward in Mockito, just checking that {{close}} behaves as expected 
when the flag is passed.

It's unfortunate that we're switching behavior based on object reference 
equality, to check whether the stream was wrapped. As designed, I don't see a 
cleaner way to improve this without refactoring the crypto implementation.
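The suggested test needs no cluster. A plain-JDK stand-in for the idea follows; the wrapper below is a hypothetical simplification, not the real {{CryptoOutputStream}} (the actual test would exercise that class, e.g. with Mockito), and buffer freeing is represented by a flag:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class OwnershipFlagSketch {

  /** Records whether close() reached the underlying stream. */
  static class RecordingStream extends ByteArrayOutputStream {
    boolean closed = false;
    @Override public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  /** Simplified stand-in for a crypto stream with an ownership flag. */
  static class OwnedCryptoStream extends FilterOutputStream {
    private final boolean ownOutputStream;
    boolean buffersFreed = false; // stand-in for freeing inBuffer/outBuffer

    OwnedCryptoStream(OutputStream out, boolean ownOutputStream) {
      super(out);
      this.ownOutputStream = ownOutputStream;
    }

    @Override public void close() throws IOException {
      buffersFreed = true;  // always release the direct buffers
      if (ownOutputStream) {
        super.close();      // closes the underlying stream too
      } else {
        flush();            // leave the shared physical stream open
      }
    }
  }

  public static void main(String[] args) throws IOException {
    RecordingStream shared = new RecordingStream();
    OwnedCryptoStream notOwner = new OwnedCryptoStream(shared, false);
    notOwner.close();
    // Buffers are released, but the shared stream stays open for the
    // next logical per-partition stream. prints: true false
    System.out.println(notOwner.buffersFreed + " " + shared.closed);
  }
}
```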

> Potential memory leak in CryptoOutputStream
> ---
>
> Key: MAPREDUCE-6628
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6628
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.6.4
>Reporter: Mariappan Asokan
>Assignee: Mariappan Asokan
> Attachments: MAPREDUCE-6628.001.patch, MAPREDUCE-6628.002.patch, 
> MAPREDUCE-6628.003.patch, MAPREDUCE-6628.004.patch, MAPREDUCE-6628.005.patch, 
> MAPREDUCE-6628.006.patch, MAPREDUCE-6628.007.patch
>
>
> There is a potential memory leak in {{CryptoOutputStream.java}}.  It 
> allocates two direct byte buffers ({{inBuffer}} and {{outBuffer}}) that are 
> freed when the {{close()}} method is called.  Most of the time, {{close()}} 
> is called.  However, when writing to the intermediate Map output file or the 
> spill files in {{MapTask}}, {{close()}} is never called, since doing so 
> would close the underlying stream, which is not desirable.  There is a single 
> underlying physical stream that contains multiple logical streams, one per 
> partition of Map output.  
> By default, the amount of memory allocated per byte buffer is 128 KB, so the 
> total memory allocated is 256 KB.  This may not sound like much.  However, if 
> the number of partitions (or number of reducers) is large (in the hundreds) 
> and/or there are spill files created in {{MapTask}}, this can grow to a few 
> hundred MB. 
> I can think of two ways to address this issue:
> h2. Possible Fix - 1
> According to JDK documentation:
> {quote}
> The contents of direct buffers may reside outside of the normal 
> garbage-collected heap, and so their impact upon the memory footprint of an 
> application might not be obvious.  It is therefore recommended that direct 
> buffers be allocated primarily for large, long-lived buffers that are subject 
> to the underlying system's native I/O operations.  In general it is best to 
> allocate direct buffers only when they yield a measurable gain in program 
> performance.
> {quote}
> It is not clear to me whether there is any benefit of allocating direct byte 
> buffers in {{CryptoOutputStream.java}}.  In fact, there is a slight CPU 
> overhead in moving data from {{outBuffer}} to a temporary byte array as per 
> the following code in {{CryptoOutputStream.java}}.
> {code}
> /*
>  * If underlying stream supports {@link ByteBuffer} write in future, needs
>  * refine here. 
>  */
> final byte[] tmp = getTmpBuf();
> outBuffer.get(tmp, 0, len);
> out.write(tmp, 0, len);
> {code}
> Even if the underlying stream supports direct byte buffer IO (or direct IO in 
> OS parlance), it is not clear whether it will yield any measurable 
> performance gain.
> The fix would be to allocate a {{ByteBuffer}} on the heap for {{inBuffer}} and 
> wrap a byte array in a {{ByteBuffer}} for {{outBuffer}}.  Note that 
> {{inBuffer}} and {{outBuffer}} have to be {{ByteBuffer}} instances, as 
> demanded by the {{encrypt()}} method in {{Encryptor}}.
> h2. Possible Fix - 2
> Assuming that we want to keep the buffers as direct byte buffers, we can 
> create a new constructor to {{CryptoOutputStream}} and pass a boolean flag 
> {{ownOutputStream}} to indicate whether the underlying stream will be owned 
> by {{CryptoOutputStream}}. If it is true, then calling the {{close()}} method 
> will close the underlying stream.  Otherwise, when {{close()}} is called only 
> the direct byte buffers will be freed and the underlying stream will not be 
> closed.
> The scope of changes for this fix will be somewhat wider.  We need to modify 
> {{MapTask.java}}, {{CryptoUtils.java}}, and {{CryptoFSDataOutputStream.java}}.

[jira] [Updated] (MAPREDUCE-6767) TestSlive fails after a common change

2016-08-24 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6767:
-
Status: Patch Available  (was: Open)

> TestSlive fails after a common change
> -
>
> Key: MAPREDUCE-6767
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6767
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Kihwal Lee
>Assignee: Daniel Templeton
> Attachments: MAPREDUCE-6767.001.patch
>
>
> It looks like this was broken after HADOOP-12726.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message

2016-06-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320699#comment-15320699
 ] 

Chris Douglas commented on MAPREDUCE-6240:
--

+1 lgtm

bq. would creating IOException inside catch block will be better?
The suppressed exceptions are the interesting part. The code is easier to read 
as-is (IMO), but either way is fine.
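To make the suppressed-exception approach concrete, here is a hedged sketch; {{initError}} and the failure list are illustrative names, not the patch's actual code:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SuppressedSketch {
    // Instead of wrapping many provider failures in MultipleIOException,
    // attach each one to a single IOException via Java 7's addSuppressed().
    public static IOException initError(List<Exception> failures) {
        IOException e = new IOException(
            "Cannot initialize Cluster. Please check your configuration for "
            + "mapreduce.framework.name and the correspond server addresses.");
        for (Exception f : failures) {
            e.addSuppressed(f);   // each provider failure stays inspectable
        }
        return e;
    }

    public static void main(String[] args) {
        List<Exception> fails = new ArrayList<>();
        fails.add(new ClassNotFoundException("some.framework.Provider"));
        IOException e = initError(fails);
        System.out.println(e.getSuppressed().length); // 1
    }
}
```

A caller that logs this exception sees the root cause (e.g. the CNF) printed under "Suppressed:" in the stack trace.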

> Hadoop client displays confusing error message
> --
>
> Key: MAPREDUCE-6240
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 2.7.0
>Reporter: Mohammad Kamrul Islam
>Assignee: Gera Shegalov
> Attachments: MAPREDUCE-6240-gera.001.patch, 
> MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, 
> MAPREDUCE-6240.003.patch, MAPREDUCE-6240.004.patch, MAPREDUCE-6240.1.patch
>
>
> Hadoop client often throws exception  with "java.io.IOException: Cannot 
> initialize Cluster. Please check your configuration for 
> mapreduce.framework.name and the correspond server addresses".
> This is a misleading and generic message for any cluster initialization 
> problem. It takes a lot of debugging hours to identify the root cause. The 
> correct error message could resolve this problem quickly.
> In one such instance, Oozie log showed the following exception  while the 
> root cause was CNF  that Hadoop client didn't return in the exception.
> {noformat}
>  JA009: Cannot initialize Cluster. Please check your configuration for 
> mapreduce.framework.name and the correspond server addresses.
> at 
> org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412)
> at 
> org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392)
> at 
> org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979)
> at 
> org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134)
> at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228)
> at 
> org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
> at org.apache.oozie.command.XCommand.call(XCommand.java:281)
> at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
> at 
> org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
> at 
> org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Caused by: java.io.IOException: Cannot initialize Cluster. Please check your 
> configuration for mapreduce.framework.name and the correspond server 
> addresses.
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
> at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
> at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
> at 
> org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372)
> at 
> org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at 
> org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379)
> at 
> org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185)
> at 
> org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927)
>  ... 10 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6423) MapOutput Sampler

2015-08-21 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6423:
-
Status: Open  (was: Patch Available)

 MapOutput Sampler
 -

 Key: MAPREDUCE-6423
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6423
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ram Manohar Bheemana
Assignee: Ram Manohar Bheemana
Priority: Minor
 Attachments: MapOutputSampler.java


 Need a sampler based on the MapOutput keys. The current InputSampler 
 implementation has a major drawback: it assumes the input and output of a 
 mapper are the same, which is generally not the case.
 Approach:
 1. Create a Sampler which samples the data based on the input.
 2. Run a small map reduce job in uber task mode using the original job mapper 
 and an identity reducer to generate the required MapOutput sample keys.
 3. Optionally, we can specify which input files to sample. For example, given 
 input files A and B, we should be able to use only file A for sampling.
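Once the map-output keys have been sampled, the remaining step is picking partition boundaries, similar to what {{InputSampler.writePartitionFile}} does with input keys. A self-contained sketch under that assumption (method and class names are illustrative):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SplitPointSketch {
    // Given sampled map-output keys, pick numReducers - 1 split points at
    // evenly spaced ranks so each reducer receives a similar share of keys.
    public static List<String> splitPoints(List<String> samples, int numReducers) {
        List<String> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numReducers; i++) {
            splits.add(sorted.get(i * sorted.size() / numReducers));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> samples = List.of("d", "a", "c", "b", "f", "e");
        System.out.println(splitPoints(samples, 3)); // [c, e]
    }
}
```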



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6423) MapOutput Sampler

2015-08-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707436#comment-14707436
 ] 

Chris Douglas commented on MAPREDUCE-6423:
--

Thanks for taking a look at this. That the sampler only works on input data was 
always a weakness for jobs requiring their output be totally ordered.

Could you generate a patch? The contribution wiki is 
[here|http://wiki.apache.org/hadoop/HowToContribute].

It might be easier for others to use if the Mapper was integrated with the 
InputSampler, but a separate tool is still an improvement.

 MapOutput Sampler
 -

 Key: MAPREDUCE-6423
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6423
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Ram Manohar Bheemana
Assignee: Ram Manohar Bheemana
Priority: Minor
 Attachments: MapOutputSampler.java


 Need a sampler based on the MapOutput keys. The current InputSampler 
 implementation has a major drawback: it assumes the input and output of a 
 mapper are the same, which is generally not the case.
 Approach:
 1. Create a Sampler which samples the data based on the input.
 2. Run a small map reduce job in uber task mode using the original job mapper 
 and an identity reducer to generate the required MapOutput sample keys.
 3. Optionally, we can specify which input files to sample. For example, given 
 input files A and B, we should be able to use only file A for sampling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6434) Add support for PartialFileOutputCommiter when checkpointing is an option during preemption

2015-08-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707130#comment-14707130
 ] 

Chris Douglas commented on MAPREDUCE-6434:
--

Offhand, I'd guess adding 
{{TaskType.REDUCE.equals(context.getTaskAttemptID().getTaskType())}} to the 
expression would prevent it from affecting more than reducers, but I haven't 
looked into it. Could you test with a map-only job, where 
{{context.getReducerClass()}} is undefined or not on the classpath?
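For illustration, the suggested guard can be modeled without Hadoop on the classpath; {{TaskType}}, {{Checkpointable}}, and {{usePartialCommitter}} below are simplified stand-ins for the real MapReduce types, not the actual code:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class CommitterGuardSketch {
    public enum TaskType { MAP, REDUCE }

    @Retention(RetentionPolicy.RUNTIME)
    public @interface Checkpointable {}

    @Checkpointable
    public static class MyReducer {}

    // Mirrors the suggested condition: preemption enabled AND the attempt is
    // a reduce AND the reducer class carries @Checkpointable. The task-type
    // check short-circuits first, so a map-only job never touches the
    // (possibly undefined) reducer class.
    public static boolean usePartialCommitter(boolean preemption, TaskType type,
                                              Class<?> reducerClass) {
        return preemption
            && TaskType.REDUCE.equals(type)
            && reducerClass != null
            && reducerClass.isAnnotationPresent(Checkpointable.class);
    }

    public static void main(String[] args) {
        System.out.println(usePartialCommitter(true, TaskType.REDUCE, MyReducer.class)); // true
        System.out.println(usePartialCommitter(true, TaskType.MAP, null));               // false
    }
}
```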

 Add support for PartialFileOutputCommiter when checkpointing is an option 
 during preemption
 ---

 Key: MAPREDUCE-6434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Augusto Souza
Assignee: Augusto Souza
 Attachments: MAPREDUCE-6434.001.patch, MAPREDUCE-6434.002.patch, 
 MAPREDUCE-6434.003.patch, MAPREDUCE-6434.004.patch, MAPREDUCE-6434.005.patch


 Finish up some renaming work related to the annotation @Preemptable (it 
 should be @Checkpointable now) and help in the splitting of patch in 
 MAPREDUCE-5269 that is too large for being reviewed or accepted by Jenkins CI 
 scripts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6434) Add support for PartialFileOutputCommiter when checkpointing is an option during preemption

2015-08-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707498#comment-14707498
 ] 

Chris Douglas commented on MAPREDUCE-6434:
--

Agreed, the NPE is usually not a problem since the default should be defined in 
mapred-defaults, though {{JobContextImpl::getReducerClass}} can return null. At 
least two cases shouldn't cause a problem for map-only jobs:
# The base {{mapreduce.Reducer}} is {{\@Checkpointable}}, so it would 
instantiate a {{PartialFileOutputCommitter}}
# A {{Reducer}} in the config shouldn't cause a map-only job to fail if it's 
not on the classpath (this may not be true in the current code, but this 
shouldn't add another case)

We also don't want to do anything surprising for setup/cleanup tasks.

 Add support for PartialFileOutputCommiter when checkpointing is an option 
 during preemption
 ---

 Key: MAPREDUCE-6434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Augusto Souza
Assignee: Augusto Souza
 Attachments: MAPREDUCE-6434.001.patch, MAPREDUCE-6434.002.patch, 
 MAPREDUCE-6434.003.patch, MAPREDUCE-6434.004.patch, MAPREDUCE-6434.005.patch, 
 MAPREDUCE-6434.006.patch


 Finish up some renaming work related to the annotation @Preemptable (it 
 should be @Checkpointable now) and help in the splitting of patch in 
 MAPREDUCE-5269 that is too large for being reviewed or accepted by Jenkins CI 
 scripts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6434) Add support for PartialFileOutputCommiter when checkpointing is an option during preemption

2015-08-20 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706018#comment-14706018
 ] 

Chris Douglas commented on MAPREDUCE-6434:
--

Thanks for updating the patch, [~augustorsouza]. Could you check this change?
{noformat}
-  committer = new FileOutputCommitter(output, context);
+  try {
+    if (context.getConfiguration().getBoolean(MRJobConfig.TASK_PREEMPTION,
+        false)
+        && context.getReducerClass()
+            .isAnnotationPresent(Checkpointable.class)) {
+      committer = new PartialFileOutputCommitter(output, context);
+    } else {
+      committer = new FileOutputCommitter(output, context);
+    }
+  } catch (ClassNotFoundException c) {
+    throw new RuntimeException(
+        "Internal error: reducer class is not defined", c);
+  }
{noformat}

Since preemption in MAPREDUCE-5269 only supports reduce tasks, even if 
preemption is enabled for map-only jobs, the reduce class can be undefined.

 Add support for PartialFileOutputCommiter when checkpointing is an option 
 during preemption
 ---

 Key: MAPREDUCE-6434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6434
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Augusto Souza
Assignee: Augusto Souza
 Attachments: MAPREDUCE-6434.001.patch, MAPREDUCE-6434.002.patch, 
 MAPREDUCE-6434.003.patch, MAPREDUCE-6434.004.patch, MAPREDUCE-6434.005.patch


 Finish up some renaming work related to the annotation @Preemptable (it 
 should be @Checkpointable now) and help in the splitting of patch in 
 MAPREDUCE-5269 that is too large for being reviewed or accepted by Jenkins CI 
 scripts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR

2015-08-17 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-2454:
-
Assignee: Mariappan Asokan  (was: Bharat Jha)

 Allow external sorter plugin for MR
 ---

 Key: MAPREDUCE-2454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha
Reporter: Mariappan Asokan
Assignee: Mariappan Asokan
Priority: Minor
  Labels: features, performance, plugin, sort
 Fix For: 2.0.3-alpha

 Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, 
 KeyValueIterator.java, MR-2454-trunkPatchPreview.gz, MapOutputSorter.java, 
 MapOutputSorterAbstract.java, ReduceInputSorter.java, 
 mapreduce-2454-modified-code.patch, mapreduce-2454-modified-test.patch, 
 mapreduce-2454-new-test.patch, mapreduce-2454-protection-change.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, 
 mr-2454-on-mr-279-build82.patch.gz


 Define interfaces and some abstract classes in the Hadoop framework to 
 facilitate external sorter plugins both on the Map and Reduce sides.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5817) mappers get rescheduled on node transition even after all reducers are completed

2015-08-12 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693807#comment-14693807
 ] 

Chris Douglas commented on MAPREDUCE-5817:
--

bq. The current patch skips re-running mappers only if all reducers are 
complete. So I don't think reducers will fail beyond that point? Did I 
understand your question right?

I see; sorry, I hadn't read the rest of the JIRA carefully. That's a fairly 
narrow window, isn't it? We may not need an extra state, if we kill all running 
maps when the last reducer completes. The condition this adds prevents new maps 
from being scheduled while cleanup/commit code is running.

Minor: could {{allReducersComplete()}} call {{getCompletedReduces()}}?

+1 on the patch

 mappers get rescheduled on node transition even after all reducers are 
 completed
 

 Key: MAPREDUCE-5817
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: MAPREDUCE-5817.001.patch, mapreduce-5817.patch


 We're seeing a behavior where a job runs long after all reducers were already 
 finished. We found that the job was rescheduling and running a number of 
 mappers beyond the point of reducer completion. In one situation, the job ran 
 for some 9 more hours after all reducers completed!
 This happens because whenever a node transition (to an unusable state) comes 
 into the app master, it just reschedules all mappers that already ran on the 
 node in all cases.
 Therefore, any node transition has the potential to extend the job period. 
 Once this window opens, another node transition can prolong it, and this can 
 happen indefinitely in theory.
 If there is some instability in the pool (unhealthy, etc.) for a duration, 
 then any big job is severely vulnerable to this problem.
 If all reducers have been completed, JobImpl.actOnUnusableNode() should not 
 reschedule mapper tasks. If all reducers are completed, the mapper outputs 
 are no longer needed, and there is no need to reschedule mapper tasks as they 
 would not be consumed anyway.
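The proposed condition reduces to a one-line guard. The sketch below uses illustrative names, not the actual {{JobImpl.actOnUnusableNode()}} code:

```java
public class MapRescheduleGuard {
    // Completed maps on a lost node only need re-running while some reducer
    // may still fetch their output; once all reducers are done, rescheduling
    // them wastes resources and can extend the job indefinitely.
    public static boolean shouldRescheduleMaps(int completedReduces,
                                               int totalReduces) {
        return completedReduces < totalReduces;
    }

    public static void main(String[] args) {
        System.out.println(shouldRescheduleMaps(2, 5)); // true: reducers remain
        System.out.println(shouldRescheduleMaps(5, 5)); // false: all complete
    }
}
```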



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5817) mappers get rescheduled on node transition even after all reducers are completed

2015-08-11 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692298#comment-14692298
 ] 

Chris Douglas commented on MAPREDUCE-5817:
--

Does this work if the reducer fails subsequently? Presumably reexecution will 
be triggered by fetch failures?

 mappers get rescheduled on node transition even after all reducers are 
 completed
 

 Key: MAPREDUCE-5817
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
Affects Versions: 2.3.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
  Labels: BB2015-05-TBR
 Attachments: MAPREDUCE-5817.001.patch, mapreduce-5817.patch


 We're seeing a behavior where a job runs long after all reducers were already 
 finished. We found that the job was rescheduling and running a number of 
 mappers beyond the point of reducer completion. In one situation, the job ran 
 for some 9 more hours after all reducers completed!
 This happens because whenever a node transition (to an unusable state) comes 
 into the app master, it just reschedules all mappers that already ran on the 
 node in all cases.
 Therefore, any node transition has the potential to extend the job period. 
 Once this window opens, another node transition can prolong it, and this can 
 happen indefinitely in theory.
 If there is some instability in the pool (unhealthy, etc.) for a duration, 
 then any big job is severely vulnerable to this problem.
 If all reducers have been completed, JobImpl.actOnUnusableNode() should not 
 reschedule mapper tasks. If all reducers are completed, the mapper outputs 
 are no longer needed, and there is no need to reschedule mapper tasks as they 
 would not be consumed anyway.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message

2015-08-06 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660599#comment-14660599
 ] 

Chris Douglas commented on MAPREDUCE-6240:
--

bq.  how about refactoring it and removing this class 
org.apache.hadoop.io.MultipleIOException

We'd have to audit where it's used. If there could be systems that expect and 
handle it, we'd have to deprecate it first, but I think it makes sense to 
remove it in trunk. Separate issue, of course.

 Hadoop client displays confusing error message
 --

 Key: MAPREDUCE-6240
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.7.0
Reporter: Mohammad Kamrul Islam
Assignee: Gera Shegalov
 Attachments: MAPREDUCE-6240-gera.001.patch, 
 MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, 
 MAPREDUCE-6240.003.patch, MAPREDUCE-6240.1.patch


 Hadoop client often throws exception  with java.io.IOException: Cannot 
 initialize Cluster. Please check your configuration for 
 mapreduce.framework.name and the correspond server addresses.
 This is a misleading and generic message for any cluster initialization 
 problem. It takes a lot of debugging hours to identify the root cause. The 
 correct error message could resolve this problem quickly.
 In one such instance, Oozie log showed the following exception  while the 
 root cause was CNF  that Hadoop client didn't return in the exception.
 {noformat}
  JA009: Cannot initialize Cluster. Please check your configuration for 
 mapreduce.framework.name and the correspond server addresses.
 at 
 org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412)
 at 
 org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134)
 at 
 org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228)
 at 
 org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
 at org.apache.oozie.command.XCommand.call(XCommand.java:281)
 at 
 org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
 at 
 org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
 at 
 org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.io.IOException: Cannot initialize Cluster. Please check your 
 configuration for mapreduce.framework.name and the correspond server 
 addresses.
 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
 at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
 at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
 at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
 at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
 at 
 org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372)
 at 
 org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 at 
 org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927)
  ... 10 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message

2015-08-05 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658589#comment-14658589
 ] 

Chris Douglas commented on MAPREDUCE-6240:
--

bq. Any reason why we dint just throw a IOException with all these exception 
added as suppressed exceptions.?

Neat! I didn't know this was added to 1.7. I like this approach

 Hadoop client displays confusing error message
 --

 Key: MAPREDUCE-6240
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.7.0
Reporter: Mohammad Kamrul Islam
Assignee: Gera Shegalov
 Attachments: MAPREDUCE-6240-gera.001.patch, 
 MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, 
 MAPREDUCE-6240.003.patch, MAPREDUCE-6240.1.patch


 Hadoop client often throws exception  with java.io.IOException: Cannot 
 initialize Cluster. Please check your configuration for 
 mapreduce.framework.name and the correspond server addresses.
 This is a misleading and generic message for any cluster initialization 
 problem. It takes a lot of debugging hours to identify the root cause. The 
 correct error message could resolve this problem quickly.
 In one such instance, Oozie log showed the following exception  while the 
 root cause was CNF  that Hadoop client didn't return in the exception.
 {noformat}
  JA009: Cannot initialize Cluster. Please check your configuration for 
 mapreduce.framework.name and the correspond server addresses.
 at 
 org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412)
 at 
 org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134)
 at 
 org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228)
 at 
 org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
 at org.apache.oozie.command.XCommand.call(XCommand.java:281)
 at 
 org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
 at 
 org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
 at 
 org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.io.IOException: Cannot initialize Cluster. Please check your 
 configuration for mapreduce.framework.name and the correspond server 
 addresses.
 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
 at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
 at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
 at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
 at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
 at 
 org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372)
 at 
 org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 at 
 org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927)
  ... 10 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6427) Fix typo in JobHistoryEventHandler

2015-07-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6427:
-
Summary: Fix typo in JobHistoryEventHandler  (was: Fix Typo in 
JobHistoryEventHandler#processEventForTimelineServer)

 Fix typo in JobHistoryEventHandler
 --

 Key: MAPREDUCE-6427
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6427
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Ray Chiang
Priority: Minor
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6427.001.branch-2.patch, 
 MAPREDUCE-6427.001.patch


  JobHistoryEventHandler#processEventForTimelineServer
  {code}tEvent.addEventInfo(WORKLFOW_ID, jse.getWorkflowId());{code}
  *should be like below.* 
  {code}tEvent.addEventInfo(WORKFLOW_ID, jse.getWorkflowId()); {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6427) Fix Typo in JobHistoryEventHandler#processEventForTimelineServer

2015-07-14 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6427:
-
   Resolution: Fixed
 Hadoop Flags: Incompatible change, Reviewed  (was: Incompatible change)
Fix Version/s: 2.8.0
 Release Note: There is a typo in the event string WORKFLOW_ID (as 
WORKLFOW_ID).  The branch-2 change will publish both event strings for 
compatibility with consumers, but the misspelled metric will be removed in 
trunk.  (was: There is a typo in the event string WORKFLOW_ID (as 
WORKLFOW_ID).  The branch-2 change will publish both event strings.

The trunk solution will be an incompatible change going forward, with only the 
correctly spelled string.)
   Status: Resolved  (was: Patch Available)

+1 I committed this. Thanks Ray

 Fix Typo in JobHistoryEventHandler#processEventForTimelineServer
 

 Key: MAPREDUCE-6427
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6427
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Ray Chiang
Priority: Minor
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6427.001.branch-2.patch, 
 MAPREDUCE-6427.001.patch


  JobHistoryEventHandler#processEventForTimelineServer
  {code}tEvent.addEventInfo(WORKLFOW_ID, jse.getWorkflowId());{code}
  *should be like below.* 
  {code}tEvent.addEventInfo(WORKFLOW_ID, jse.getWorkflowId()); {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6427) Fix Typo in JobHistoryEventHandler#processEventForTimelineServer

2015-07-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619698#comment-14619698
 ] 

Chris Douglas commented on MAPREDUCE-6427:
--

bq. Initial version. branch-2 and trunk versions appear identical.

The branch-2 version should publish both the old metric _and_ the correctly 
spelled label. Trunk can replace it outright. Please also add a release note.

 Fix Typo in JobHistoryEventHandler#processEventForTimelineServer
 

 Key: MAPREDUCE-6427
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6427
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Brahma Reddy Battula
Assignee: Ray Chiang
Priority: Minor
 Attachments: MAPREDUCE-6427.001.patch


  JobHistoryEventHandler#processEventForTimelineServer
  {code}tEvent.addEventInfo(WORKLFOW_ID, jse.getWorkflowId());{code}
  *should be like below.* 
  {code}tEvent.addEventInfo(WORKFLOW_ID, jse.getWorkflowId()); {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6038) A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial

2015-07-07 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6038:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

+1 I committed this. Thanks Tsuyoshi

 A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial
 ---

 Key: MAPREDUCE-6038
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6038
 Project: Hadoop Map/Reduce
  Issue Type: Bug
 Environment: java version 1.8.0_11 hostspot 64-bit
Reporter: Pei Ma
Assignee: Tsuyoshi Ozawa
Priority: Minor
  Labels: BB2015-05-TBR
 Fix For: 2.8.0

 Attachments: MAPREDUCE-6038.1.patch, MAPREDUCE-6038.2.patch


  As a beginner, when I learned about the basics of MR, I found that I 
  couldn't run the WordCount2 using the command {{bin/hadoop jar wc.jar 
  WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output}} in the 
  Tutorial. The VM threw a NullPointerException at line 47. At line 
  45, the default value returned by conf.getBoolean is true. That is to say, 
  when wordcount.skip.patterns is not set, WordCount2 will continue to 
  execute getCacheFiles(). Then patternsURIs gets a null value. When the 
  -skip option doesn't exist, wordcount.skip.patterns will not be set, 
  and a NullPointerException comes out.
  In short, the block after the if-statement at line 45 shouldn't be executed 
  when the -skip option doesn't exist in the command. Maybe line 45 should 
  read {{if (conf.getBoolean("wordcount.skip.patterns", false)) {}}. 
  Just change the boolean default.
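To make the reported fix concrete, here is a self-contained analogue (a {{Properties}} object stands in for Hadoop's {{Configuration}}; names are illustrative) showing why a default of false avoids the NPE path:

```java
import java.util.Properties;

public class SkipPatternsDefault {
    // Analogue of Configuration#getBoolean(key, defaultValue).
    static boolean getBoolean(Properties conf, String key, boolean def) {
        String v = conf.getProperty(key);
        return v == null ? def : Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // -skip absent: with a default of false the skip branch is not taken,
        // so getCacheFiles() is never called and no NullPointerException occurs.
        boolean skipAbsent = getBoolean(conf, "wordcount.skip.patterns", false);
        // -skip given: the driver sets the flag explicitly, enabling the branch.
        conf.setProperty("wordcount.skip.patterns", "true");
        boolean skipPresent = getBoolean(conf, "wordcount.skip.patterns", false);
        System.out.println(skipAbsent + " " + skipPresent); // prints "false true"
    }
}
```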



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message

2015-06-30 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609224#comment-14609224
 ] 

Chris Douglas commented on MAPREDUCE-6240:
--

Minor (not blocking for commit):
- This could just add the {{IOException}} to the list instead of throw/catch
- The message doesn't need to append {{t}}:
{noformat}
+ioExceptions.add(new IOException("Failed to initialize protocol: "
++ t, t));
{noformat}
- Should this continue to catch only {{Exception}}, instead of {{Throwable}}? 
- If the cause of the exception is an {{IOException}}, this discards the caught 
exception? Is this a special/common case?

It's a little odd to wrap all of this as an {{IOException}}, but I don't think 
it's worth adding another composite type.
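A rough sketch of the collect-then-report pattern being suggested (names are hypothetical; {{addSuppressed}} is one way to attach the per-provider failures, though the patch may instead keep a list and wrap it):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ProviderInit {
    // Try each provider; keep every failure and report them together
    // only if none succeeds, instead of wrapping and rethrowing per provider.
    static String initialize(List<String> providers) throws IOException {
        List<IOException> failures = new ArrayList<>();
        for (String p : providers) {
            try {
                if (p.startsWith("ok")) {
                    return p; // stand-in for a provider that initializes
                }
                throw new IOException("Failed to initialize protocol: " + p);
            } catch (IOException e) {
                failures.add(e); // add directly; no extra wrapping needed
            }
        }
        IOException summary = new IOException("Cannot initialize Cluster");
        for (IOException e : failures) {
            summary.addSuppressed(e);
        }
        throw summary;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(initialize(List.of("bad", "ok-local")));
    }
}
```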

 Hadoop client displays confusing error message
 --

 Key: MAPREDUCE-6240
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 2.7.0
Reporter: Mohammad Kamrul Islam
Assignee: Gera Shegalov
 Attachments: MAPREDUCE-6240-gera.001.patch, 
 MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, 
 MAPREDUCE-6240.003.patch, MAPREDUCE-6240.1.patch


 Hadoop client often throws an exception with "java.io.IOException: Cannot 
 initialize Cluster. Please check your configuration for 
 mapreduce.framework.name and the correspond server addresses."
 This is a misleading and generic message for any cluster initialization 
 problem. It takes a lot of debugging hours to identify the root cause. The 
 correct error message could resolve this problem quickly.
 In one such instance, the Oozie log showed the following exception, while the 
 root cause was a CNF (ClassNotFoundException) that the Hadoop client didn't include in the exception.
 {noformat}
  JA009: Cannot initialize Cluster. Please check your configuration for 
 mapreduce.framework.name and the correspond server addresses.
 at 
 org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412)
 at 
 org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134)
 at 
 org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228)
 at 
 org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
 at org.apache.oozie.command.XCommand.call(XCommand.java:281)
 at 
 org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
 at 
 org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
 at 
 org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.io.IOException: Cannot initialize Cluster. Please check your 
 configuration for mapreduce.framework.name and the correspond server 
 addresses.
 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:82)
 at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:75)
 at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
 at org.apache.hadoop.mapred.JobClient.init(JobClient.java:449)
 at 
 org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372)
 at 
 org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 at 
 org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927)
  ... 10 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-2094) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.

2015-05-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-2094:
-
Attachment: M2094.patch

Rewrote exception message.

[~nielsbasjes], I know you think this undersells the severity of the bug. I'll 
rewrite the description to limit the scope of this fix, if you still want to 
litigate the point in another JIRA.

 org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements 
 unsafe default behaviour that is different from the documented behaviour.
 ---

 Key: MAPREDUCE-2094
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Niels Basjes
Assignee: Niels Basjes
 Attachments: M2094.patch, MAPREDUCE-2094-2011-05-19.patch, 
 MAPREDUCE-2094-20140727-svn-fixed-spaces.patch, 
 MAPREDUCE-2094-20140727-svn.patch, MAPREDUCE-2094-20140727.patch, 
 MAPREDUCE-2094-2015-05-05-2328.patch, 
 MAPREDUCE-2094-FileInputFormat-docs-v2.patch


 When implementing a custom derivative of FileInputFormat we ran into the 
 effect that a large Gzipped input file would be processed several times. 
 A near 1GiB file would be processed around 36 times in its entirety. Thus 
 producing garbage results and taking up a lot more CPU time than needed.
 It took a while to figure out, and what we found is that the default 
 implementation of the isSplittable method in 
 [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | 
 http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup
  ] is simply {{return true;}}. 
 This is a very unsafe default and is in contradiction with the JavaDoc of the 
 method, which states: "Is the given filename splitable? Usually, true, but if 
 the file is stream compressed, it will not be." The actual implementation 
 effectively does "Is the given filename splitable? Always true, even if the 
 file is stream compressed using an unsplittable compression codec." 
 For our situation (where we always have Gzipped input) we took the easy way 
 out and simply implemented an isSplittable in our class that does {{return 
 false;}}. 
 Now there are essentially 3 ways I can think of for fixing this (in order of 
 what I would find preferable):
 # Implement something that looks at the used compression of the file (i.e. do 
 migrate the implementation from TextInputFormat to FileInputFormat). This 
 would make the method do what the JavaDoc describes.
 # Force developers to think about it and make this method abstract.
 # Use a safe default (i.e. return false)
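As a self-contained illustration of the first option, the sketch below decides splittability from the file's codec. The real fix would consult Hadoop's CompressionCodecFactory; the suffix map here is an assumption standing in for that lookup.

```java
import java.util.Map;

public class SplittableCheck {
    // Suffix -> whether that codec's streams can be split mid-file.
    static final Map<String, Boolean> CODEC_SPLITTABLE = Map.of(
        ".gz", false,  // gzip: stream-compressed, cannot be split
        ".bz2", true   // bzip2: block-oriented, splittable
    );

    static boolean isSplitable(String filename) {
        for (Map.Entry<String, Boolean> e : CODEC_SPLITTABLE.entrySet()) {
            if (filename.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return true; // no codec suffix: plain file, safe to split
    }

    public static void main(String[] args) {
        System.out.println(isSplitable("input.gz"));  // false
        System.out.println(isSplitable("input.txt")); // true
    }
}
```

With this, a gzipped input would produce exactly one split, instead of being re-read in its entirety by every map task.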



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-2094) LineRecordReader should not seek into non-splittable, compressed streams.

2015-05-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-2094:
-
Summary: LineRecordReader should not seek into non-splittable, compressed 
streams.  (was: org.apache.hadoop.mapreduce.lib.input.FileInputFormat: 
isSplitable implements unsafe default behaviour that is different from the 
documented behaviour.)

 LineRecordReader should not seek into non-splittable, compressed streams.
 -

 Key: MAPREDUCE-2094
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Niels Basjes
Assignee: Niels Basjes
 Attachments: M2094.patch, MAPREDUCE-2094-2011-05-19.patch, 
 MAPREDUCE-2094-20140727-svn-fixed-spaces.patch, 
 MAPREDUCE-2094-20140727-svn.patch, MAPREDUCE-2094-20140727.patch, 
 MAPREDUCE-2094-2015-05-05-2328.patch, 
 MAPREDUCE-2094-FileInputFormat-docs-v2.patch


 When implementing a custom derivative of FileInputFormat we ran into the 
 effect that a large Gzipped input file would be processed several times. 
 A near 1GiB file would be processed around 36 times in its entirety. Thus 
 producing garbage results and taking up a lot more CPU time than needed.
 It took a while to figure out, and what we found is that the default 
 implementation of the isSplittable method in 
 [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | 
 http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup
  ] is simply {{return true;}}. 
 This is a very unsafe default and is in contradiction with the JavaDoc of the 
 method, which states: "Is the given filename splitable? Usually, true, but if 
 the file is stream compressed, it will not be." The actual implementation 
 effectively does "Is the given filename splitable? Always true, even if the 
 file is stream compressed using an unsplittable compression codec." 
 For our situation (where we always have Gzipped input) we took the easy way 
 out and simply implemented an isSplittable in our class that does {{return 
 false;}}. 
 Now there are essentially 3 ways I can think of for fixing this (in order of 
 what I would find preferable):
 # Implement something that looks at the used compression of the file (i.e. do 
 migrate the implementation from TextInputFormat to FileInputFormat). This 
 would make the method do what the JavaDoc describes.
 # Force developers to think about it and make this method abstract.
 # Use a safe default (i.e. return false)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase

2015-05-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-4882:
-
  Resolution: Duplicate
   Fix Version/s: 2.6.0
Target Version/s:   (was: )
  Status: Resolved  (was: Patch Available)

Fixed in MAPREDUCE-6063. Sorry Jerry; didn't see this.

 Error in estimating the length of the output file in Spill Phase
 

 Key: MAPREDUCE-4882
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4882
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 0.20.2, 1.0.3
 Environment: Any Environment
Reporter: Lijie Xu
Assignee: Jerry Chen
  Labels: BB2015-05-TBR, patch
 Fix For: 2.6.0

 Attachments: MAPREDUCE-4882.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 The sortAndSpill() method in MapTask.java has an error in estimating the 
 length of the output file. 
 The long size should be (bufvoid - bufstart) + bufend, not (bufvoid - 
 bufend) + bufstart, when bufend < bufstart.
 Here is the original code in MapTask.java.
  private void sortAndSpill() throws IOException, ClassNotFoundException,
InterruptedException {
   //approximate the length of the output file to be the length of the
   //buffer + header lengths for the partitions
    long size = (bufend >= bufstart
   ? bufend - bufstart
   : (bufvoid - bufend) + bufstart) +
   partitions * APPROX_HEADER_LENGTH;
   FSDataOutputStream out = null;
 --
 I had a test on TeraSort. A snippet from mapper's log is as follows:
 MapTask: Spilling map output: record full = true
 MapTask: bufstart = 157286200; bufend = 10485460; bufvoid = 199229440
 MapTask: kvstart = 262142; kvend = 131069; length = 655360
 MapTask: Finished spill 3
 In this occasion, Spill Bytes should be (199229440 - 157286200) + 10485460 = 
 52428700 (52 MB), because the number of spilled records is 524287 and each 
 record costs 100 B.
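A small sketch of the corrected estimate (field names taken from the snippet above; the test values come from the TeraSort log in the report):

```java
public class SpillSize {
    // Corrected circular-buffer estimate: when the spill wraps
    // (bufend < bufstart), the occupied region is
    // (bufvoid - bufstart) + bufend, not (bufvoid - bufend) + bufstart.
    static long spillBytes(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart
            ? bufend - bufstart
            : (bufvoid - bufstart) + bufend;
    }

    public static void main(String[] args) {
        // TeraSort log values: 524287 spilled records * 100 B each.
        System.out.println(spillBytes(157286200L, 10485460L, 199229440L)); // 52428700
    }
}
```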



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-2094) LineRecordReader should not seek into non-splittable, compressed streams.

2015-05-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-2094:
-
Attachment: M2094-1.patch

Ran test-patch locally, all OK except a spurious whitespace and a release audit 
warning (fixed)

 LineRecordReader should not seek into non-splittable, compressed streams.
 -

 Key: MAPREDUCE-2094
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Niels Basjes
Assignee: Niels Basjes
 Attachments: M2094-1.patch, M2094.patch, 
 MAPREDUCE-2094-2011-05-19.patch, 
 MAPREDUCE-2094-20140727-svn-fixed-spaces.patch, 
 MAPREDUCE-2094-20140727-svn.patch, MAPREDUCE-2094-20140727.patch, 
 MAPREDUCE-2094-2015-05-05-2328.patch, 
 MAPREDUCE-2094-FileInputFormat-docs-v2.patch


 When implementing a custom derivative of FileInputFormat we ran into the 
 effect that a large Gzipped input file would be processed several times. 
 A near 1GiB file would be processed around 36 times in its entirety. Thus 
 producing garbage results and taking up a lot more CPU time than needed.
 It took a while to figure out, and what we found is that the default 
 implementation of the isSplittable method in 
 [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | 
 http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup
  ] is simply {{return true;}}. 
 This is a very unsafe default and is in contradiction with the JavaDoc of the 
 method, which states: "Is the given filename splitable? Usually, true, but if 
 the file is stream compressed, it will not be." The actual implementation 
 effectively does "Is the given filename splitable? Always true, even if the 
 file is stream compressed using an unsplittable compression codec." 
 For our situation (where we always have Gzipped input) we took the easy way 
 out and simply implemented an isSplittable in our class that does {{return 
 false;}}. 
 Now there are essentially 3 ways I can think of for fixing this (in order of 
 what I would find preferable):
 # Implement something that looks at the used compression of the file (i.e. do 
 migrate the implementation from TextInputFormat to FileInputFormat). This 
 would make the method do what the JavaDoc describes.
 # Force developers to think about it and make this method abstract.
 # Use a safe default (i.e. return false)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-2094) LineRecordReader should not seek into non-splittable, compressed streams.

2015-05-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-2094:
-
   Resolution: Fixed
Fix Version/s: 2.8.0
 Release Note:   (was: Throw an Exception in the most common error scenario 
present in many FileInputFormat derivatives that do not override isSplitable. )
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

+1

I committed this to trunk and branch-2. Thanks Niels

 LineRecordReader should not seek into non-splittable, compressed streams.
 -

 Key: MAPREDUCE-2094
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Niels Basjes
Assignee: Niels Basjes
 Fix For: 2.8.0

 Attachments: M2094-1.patch, M2094.patch, 
 MAPREDUCE-2094-2011-05-19.patch, 
 MAPREDUCE-2094-20140727-svn-fixed-spaces.patch, 
 MAPREDUCE-2094-20140727-svn.patch, MAPREDUCE-2094-20140727.patch, 
 MAPREDUCE-2094-2015-05-05-2328.patch, 
 MAPREDUCE-2094-FileInputFormat-docs-v2.patch


 When implementing a custom derivative of FileInputFormat we ran into the 
 effect that a large Gzipped input file would be processed several times. 
 A near 1GiB file would be processed around 36 times in its entirety. Thus 
 producing garbage results and taking up a lot more CPU time than needed.
 It took a while to figure out, and what we found is that the default 
 implementation of the isSplittable method in 
 [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | 
 http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup
  ] is simply {{return true;}}. 
 This is a very unsafe default and is in contradiction with the JavaDoc of the 
 method, which states: "Is the given filename splitable? Usually, true, but if 
 the file is stream compressed, it will not be." The actual implementation 
 effectively does "Is the given filename splitable? Always true, even if the 
 file is stream compressed using an unsplittable compression codec." 
 For our situation (where we always have Gzipped input) we took the easy way 
 out and simply implemented an isSplittable in our class that does {{return 
 false;}}. 
 Now there are essentially 3 ways I can think of for fixing this (in order of 
 what I would find preferable):
 # Implement something that looks at the used compression of the file (i.e. do 
 migrate the implementation from TextInputFormat to FileInputFormat). This 
 would make the method do what the JavaDoc describes.
 # Force developers to think about it and make this method abstract.
 # Use a safe default (i.e. return false)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy

2015-05-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-4469:
-
Status: Open  (was: Patch Available)

branch-1 is not getting new fixes, but it looks like {{ProcfsBasedProcessTree}} 
in trunk could benefit from this same set of optimizations. I'll leave the 
issue open, in case someone has the time and inclination to rebase it.

 Resource calculation in child tasks is CPU-heavy
 

 Key: MAPREDUCE-4469
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: performance, task
Affects Versions: 1.0.3
Reporter: Todd Lipcon
Assignee: Ahmed Radwan
  Labels: BB2015-05-TBR
 Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, 
 MAPREDUCE-4469_rev3.patch, MAPREDUCE-4469_rev4.patch, 
 MAPREDUCE-4469_rev5.patch


 In doing some benchmarking on a hadoop-1 derived codebase, I noticed that 
 each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed 
 that it's spending a lot of time looping through all the files in /proc to 
 calculate resource usage.
 As a test, I added a flag to disable use of the ResourceCalculatorPlugin 
 within the tasks. On a CPU-bound 500G-sort workload, this improved total job 
 runtime by about 10% (map slot-seconds by 14%, reduce slot seconds by 8%)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-3936) Clients should not enforce counter limits

2015-05-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-3936:
-
Status: Open  (was: Patch Available)

Patch no longer applies.

In the context of YARN-2928, presumably the timeline server can handle 
unlimited counters. The limit can already be set very high... is this still 
relevant for MR? If nobody plans to work on this, please close as "later" or 
"won't fix".

 Clients should not enforce counter limits 
 --

 Key: MAPREDUCE-3936
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3936
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv1
Reporter: Tom White
Assignee: Tom White
  Labels: BB2015-05-TBR
 Attachments: MAPREDUCE-3936.patch, MAPREDUCE-3936.patch


 The code for enforcing counter limits (from MAPREDUCE-1943) creates a static 
 JobConf instance to load the limits, which may throw an exception if the 
 client limit is set to be lower than the limit on the cluster (perhaps 
 because the cluster limit was raised from the default).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6038) A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial

2015-05-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6038:
-
Attachment: MAPREDUCE-6038.2.patch

Rebased patch

 A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial
 ---

 Key: MAPREDUCE-6038
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6038
 Project: Hadoop Map/Reduce
  Issue Type: Bug
 Environment: java version 1.8.0_11 hostspot 64-bit
Reporter: Pei Ma
Assignee: Tsuyoshi Ozawa
Priority: Minor
  Labels: BB2015-05-TBR
 Attachments: MAPREDUCE-6038.1.patch, MAPREDUCE-6038.2.patch


  As a beginner, when I learned about the basics of MR, I found that I 
  couldn't run the WordCount2 using the command {{bin/hadoop jar wc.jar 
  WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output}} in the 
  Tutorial. The VM threw a NullPointerException at line 47. At line 
  45, the default value returned by conf.getBoolean is true. That is to say, 
  when wordcount.skip.patterns is not set, WordCount2 will continue to 
  execute getCacheFiles(). Then patternsURIs gets a null value. When the 
  -skip option doesn't exist, wordcount.skip.patterns will not be set, 
  and a NullPointerException comes out.
  In short, the block after the if-statement at line 45 shouldn't be executed 
  when the -skip option doesn't exist in the command. Maybe line 45 should 
  read {{if (conf.getBoolean("wordcount.skip.patterns", false)) {}}. 
  Just change the boolean default.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4473) tasktracker rank on machines.jsp?type=active

2015-05-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-4473:
-
  Resolution: Won't Fix
Target Version/s: 1.0.3, 1.0.2, 1.0.1, 1.0.0  (was: 1.0.0, 1.0.1, 1.0.2, 
1.0.3)
  Status: Resolved  (was: Patch Available)

Closing as branch-1 is unlikely to be released

 tasktracker rank on machines.jsp?type=active
 

 Key: MAPREDUCE-4473
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4473
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: tasktracker
Affects Versions: 0.20.2, 0.21.0, 0.22.0, 0.23.0, 0.23.1, 1.0.0, 1.0.1, 
 1.0.2, 1.0.3
Reporter: jian fan
Priority: Minor
  Labels: BB2015-05-TBR, tasktracker
 Attachments: MAPREDUCE-4473.patch


 sometimes we need a simple way to judge which tasktracker is down from the 
 machines.jsp?type=active page



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5150) Backport 2009 terasort (MAPREDUCE-639) to branch-1

2015-05-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-5150:
-
Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

Closing as WONTFIX, since branch-1 is unlikely to be released.

 Backport 2009 terasort (MAPREDUCE-639) to branch-1
 --

 Key: MAPREDUCE-5150
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5150
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: examples
Affects Versions: 1.2.0
Reporter: Gera Shegalov
Priority: Minor
  Labels: BB2015-05-TBR
 Attachments: MAPREDUCE-5150-branch-1.patch


 Users evaluate performance of Hadoop clusters using different benchmarks such 
 as TeraSort. However, the terasort version in branch-1 is outdated. It works on 
 a teragen dataset that cannot exceed 4 billion unique keys, and it does not have 
 the fast non-sampling partitioner SimplePartitioner either.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6023) Fix SuppressWarnings from unchecked to rawtypes in O.A.H.mapreduce.lib.input.TaggedInputSplit

2015-05-08 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6023:
-
Status: Open  (was: Patch Available)

This is creating new javac warnings, not correcting the errant usage.

 Fix SuppressWarnings from unchecked to rawtypes in 
 O.A.H.mapreduce.lib.input.TaggedInputSplit
 -

 Key: MAPREDUCE-6023
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6023
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Junping Du
Assignee: Abhilash Srimat Tirumala Pallerlamudi
Priority: Minor
  Labels: BB2015-05-TBR, newbie
 Attachments: MAPREDUCE-6023.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-2094) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.

2015-05-06 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531171#comment-14531171
 ] 

Chris Douglas commented on MAPREDUCE-2094:
--

[Given|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201405.mbox/%3CCADoiZqoBKme-HYoM%3DhRxPEs1w2qdevo0%3DaoihqiWT4vS8D42Yg%40mail.gmail.com%3E]
 
[discussion|https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3ccadoizqoqkpn_7b9w75dcrvjxz1sqbkryqbrwlw1rwo26a4e...@mail.gmail.com%3E]
 on the dev list, the following error message:
{noformat}
+  throw new IOException(
+"Implementation bug in the used FileInputFormat: " +
+"The isSplitable method returned 'true' on a file that " +
+"was compressed with a non splittable compression codec. " +
+"If you get this right after upgrading Hadoop then know " +
+"that you have been looking at reports based on " +
+"corrupt data for a long time !!! (see: MAPREDUCE-2094)");
{noformat}
is a little over the top. Please just report the error detected, e.g., {{"Cannot 
seek in " + codec.getClass().getSimpleName() + " compressed stream"}}

 org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements 
 unsafe default behaviour that is different from the documented behaviour.
 ---

 Key: MAPREDUCE-2094
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: task
Reporter: Niels Basjes
Assignee: Niels Basjes
  Labels: BB2015-05-TBR
 Attachments: MAPREDUCE-2094-2011-05-19.patch, 
 MAPREDUCE-2094-20140727-svn-fixed-spaces.patch, 
 MAPREDUCE-2094-20140727-svn.patch, MAPREDUCE-2094-20140727.patch, 
 MAPREDUCE-2094-2015-05-05-2328.patch, 
 MAPREDUCE-2094-FileInputFormat-docs-v2.patch


 When implementing a custom derivative of FileInputFormat we ran into the 
 effect that a large Gzipped input file would be processed several times. 
 A near 1GiB file would be processed around 36 times in its entirety. Thus 
 producing garbage results and taking up a lot more CPU time than needed.
 It took a while to figure out, and what we found is that the default 
 implementation of the isSplittable method in 
 [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | 
 http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup
  ] is simply {{return true;}}. 
 This is a very unsafe default and is in contradiction with the JavaDoc of the 
 method, which states: "Is the given filename splitable? Usually, true, but if 
 the file is stream compressed, it will not be." The actual implementation 
 effectively does "Is the given filename splitable? Always true, even if the 
 file is stream compressed using an unsplittable compression codec." 
 For our situation (where we always have Gzipped input) we took the easy way 
 out and simply implemented an isSplittable in our class that does {{return 
 false;}}. 
 Now there are essentially 3 ways I can think of for fixing this (in order of 
 what I would find preferable):
 # Implement something that looks at the used compression of the file (i.e. do 
 migrate the implementation from TextInputFormat to FileInputFormat). This 
 would make the method do what the JavaDoc describes.
 # Force developers to think about it and make this method abstract.
 # Use a safe default (i.e. return false)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6220) Provide option to suppress stdout of MapReduce task

2015-04-24 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512014#comment-14512014
 ] 

Chris Douglas commented on MAPREDUCE-6220:
--

bq. I think YangHao may have been thinking of putting this option in there for 
production clusters.

I understood the intent, but the value in the patch is taken from the user 
configuration.

Agree on closing this as WONTFIX.

 Provide option to suppress stdout of MapReduce task
 ---

 Key: MAPREDUCE-6220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6220
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Reporter: Yang Hao
Assignee: Yang Hao
 Attachments: MAPREDUCE-6220.patch, MAPREDUCE-6220.v2.patch


 System.out is an ugly way to print logs, and many times it can do harm to a 
 Hadoop cluster. So we can provide an option to forbid it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6220) Provide option to suppress stdout of MapReduce task

2015-04-23 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510090#comment-14510090
 ] 

Chris Douglas commented on MAPREDUCE-6220:
--

The patch won't work on Windows.

This seems unlikely to solve the problem... users printing lots of messages to 
stdout/stderr won't think to suppress the output using an esoteric config knob 
before they submit. Until YARN limits disk usage, corralling users to run on 
partitions separated from HDFS, etc. will at least limit the damage done by 
containers.

 Provide option to suppress stdout of MapReduce task
 ---

 Key: MAPREDUCE-6220
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6220
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Reporter: Yang Hao
Assignee: Yang Hao
 Attachments: MAPREDUCE-6220.patch, MAPREDUCE-6220.v2.patch


 System.out is an ugly way to print logs, and many times it can do harm to a 
 Hadoop cluster, so we can provide an option to forbid it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6240) Hadoop client displays confusing error message

2015-02-04 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6240:
-
Status: Open  (was: Patch Available)

bq. The intuition behind chaining is the following. We would not try provider 2 
if provider 1 worked, in other words: our provider 2 failures are indirectly 
caused by provider 1 failures.

Wait, this is synthetically stitching exceptions together from a retry loop? 
That would be very confusing to debug. Have you looked at an approach like 
[MultipleIOException|https://git1-us-west.apache.org/repos/asf?p=hadoop.git;a=blob;f=hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/MultipleIOException.java;h=5e584c9cd0705471a826932d782eec409b5bae37;hb=HEAD]?

There's also a spurious change to {{AbstractFileSystem}} in the latest patch.
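The alternative suggested here, collecting each provider's failure and surfacing them together rather than chaining, can be sketched as follows. This is a hypothetical helper in the spirit of {{MultipleIOException}}, not its actual API:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

public class ProviderChain {
    // Try each client-protocol provider in order and, if none works, report
    // every failure side by side instead of stitching them into a cause
    // chain, which falsely suggests that one failure triggered the next.
    static <T> T firstSuccessful(List<Callable<T>> providers) throws IOException {
        List<Exception> failures = new ArrayList<>();
        for (Callable<T> provider : providers) {
            try {
                return provider.call();
            } catch (Exception e) {
                failures.add(e); // keep each root cause intact
            }
        }
        StringBuilder msg =
            new StringBuilder("Cannot initialize Cluster; all providers failed:");
        for (Exception e : failures) {
            msg.append("\n  ").append(e);
        }
        throw new IOException(msg.toString());
    }

    public static void main(String[] args) {
        List<Callable<String>> providers = List.of(
            () -> { throw new IllegalStateException("provider 1 down"); },
            () -> { throw new IllegalStateException("provider 2 down"); });
        try {
            firstSuccessful(providers);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```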

 Hadoop client displays confusing error message
 --

 Key: MAPREDUCE-6240
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Attachments: MAPREDUCE-6240-gera.001.patch, 
 MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, 
 MAPREDUCE-6240.1.patch


 The Hadoop client often throws an exception with java.io.IOException: Cannot 
 initialize Cluster. Please check your configuration for 
 mapreduce.framework.name and the correspond server addresses.
 This is a misleading, generic message for any cluster initialization 
 problem, and it takes a lot of debugging hours to identify the root cause. A 
 correct error message would resolve this problem quickly.
 In one such instance, the Oozie log showed the following exception, while the 
 root cause was a CNF (ClassNotFoundException) that the Hadoop client didn't 
 include in the exception.
 {noformat}
  JA009: Cannot initialize Cluster. Please check your configuration for 
 mapreduce.framework.name and the correspond server addresses.
 at 
 org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412)
 at 
 org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134)
 at 
 org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228)
 at 
 org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
 at org.apache.oozie.command.XCommand.call(XCommand.java:281)
 at 
 org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
 at 
 org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
 at 
 org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.io.IOException: Cannot initialize Cluster. Please check your 
 configuration for mapreduce.framework.name and the correspond server 
 addresses.
 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
 at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
 at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
 at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
 at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
 at 
 org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372)
 at 
 org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 at 
 org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927)
  ... 10 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message

2015-02-04 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306365#comment-14306365
 ] 

Chris Douglas commented on MAPREDUCE-6240:
--

I see. Would it be possible to construct a test that's both more direct for the 
fix and doesn't require a change to common? Possibly initialize two providers 
that throw different exceptions, then verify that both messages are in the 
output. Just looking at the patch so I don't know the context, but wouldn't the 
old code pass, also?

 Hadoop client displays confusing error message
 --

 Key: MAPREDUCE-6240
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Reporter: Mohammad Kamrul Islam
Assignee: Mohammad Kamrul Islam
 Attachments: MAPREDUCE-6240-gera.001.patch, 
 MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, 
 MAPREDUCE-6240.1.patch


 The Hadoop client often throws an exception with java.io.IOException: Cannot 
 initialize Cluster. Please check your configuration for 
 mapreduce.framework.name and the correspond server addresses.
 This is a misleading, generic message for any cluster initialization 
 problem, and it takes a lot of debugging hours to identify the root cause. A 
 correct error message would resolve this problem quickly.
 In one such instance, the Oozie log showed the following exception, while the 
 root cause was a CNF (ClassNotFoundException) that the Hadoop client didn't 
 include in the exception.
 {noformat}
  JA009: Cannot initialize Cluster. Please check your configuration for 
 mapreduce.framework.name and the correspond server addresses.
 at 
 org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412)
 at 
 org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134)
 at 
 org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228)
 at 
 org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63)
 at org.apache.oozie.command.XCommand.call(XCommand.java:281)
 at 
 org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323)
 at 
 org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252)
 at 
 org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:744)
 Caused by: java.io.IOException: Cannot initialize Cluster. Please check your 
 configuration for mapreduce.framework.name and the correspond server 
 addresses.
 at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
 at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
 at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
 at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
 at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
 at 
 org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372)
 at 
 org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
 at 
 org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185)
 at 
 org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927)
  ... 10 more
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6094) TestMRCJCFileInputFormat.testAddInputPath() fails on trunk

2014-09-26 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14149716#comment-14149716
 ] 

Chris Douglas commented on MAPREDUCE-6094:
--

+1

 TestMRCJCFileInputFormat.testAddInputPath() fails on trunk
 --

 Key: MAPREDUCE-6094
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6094
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Sangjin Lee
Assignee: Akira AJISAKA
Priority: Minor
 Attachments: MAPREDUCE-6094.patch


 {noformat}
 Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.624 sec <<< 
 FAILURE! - in org.apache.hadoop.mapreduce.lib.input.TestMRCJCFileInputFormat
 testAddInputPath(org.apache.hadoop.mapreduce.lib.input.TestMRCJCFileInputFormat)
   Time elapsed: 0.886 sec <<< ERROR!
 java.io.IOException: No FileSystem for scheme: s3
   at 
 org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2583)
   at 
 org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2590)
   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
   at 
 org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2629)
   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2611)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
   at 
 org.apache.hadoop.mapreduce.lib.input.TestMRCJCFileInputFormat.testAddInputPath(TestMRCJCFileInputFormat.java:55)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (MAPREDUCE-6103) Adding reservation APIs to resource manager delegate

2014-09-24 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6103:
-
Comment: was deleted

(was: bq. For example to run the sample pi job with queue sla and reservation 
**, you now can

Read it as For example to run the sample pi job with queue sla and reservation 
*reservation_1411602647912_0001 *, you now can)

 Adding reservation APIs to resource manager delegate
 

 Key: MAPREDUCE-6103
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6103
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Attachments: MR-6103.patch, MR-6103.patch


 YARN-1051 introduces the ReservationSystem and the corresponding APIs for 
  create/update/delete ops. The MR resource manager delegate needs to be 
 updated with the APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (MAPREDUCE-6103) Adding reservation APIs to resource manager delegate

2014-09-24 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147130#comment-14147130
 ] 

Chris Douglas edited comment on MAPREDUCE-6103 at 9/25/14 12:11 AM:


Thanks [~chris.douglas] for taking a look. I realized that there was no easy 
way to specify the reservation id so added support for it in YARNRunner so now 
users can specify reservation id just like they currently do queue names.

For example to run the sample _pi_ job with queue _sla_ and reservation 
*reservation_1411602647912_0001*, you now can
{noformat}
 hadoop jar 
hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar
 pi \
  -Dmapreduce.job.queuename=sla \
  -Dmapreduce.job.reservation.id=reservation_1411602647912_0001 \
  -Dyarn.app.mapreduce.am.resource.mb=1024 \
  3 10
{noformat}


was (Author: subru):
Thanks [~chris.douglas] for taking a look. I realized that there was no easy 
way to specify the reservation id so added support for it in YARNRunner so now 
users can specify reservation id just like they currently do queue names.

For example to run the sample _pi_ job with queue _sla_ and reservation **, you 
now can
 hadoop jar 
hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar
 pi -Dmapreduce.job.queuename=sla 
-Dmapreduce.job.reservation.id=reservation_1411602647912_0001 
-Dyarn.app.mapreduce.am.resource.mb=1024 3 10

 Adding reservation APIs to resource manager delegate
 

 Key: MAPREDUCE-6103
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6103
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Attachments: MR-6103.patch, MR-6103.patch


 YARN-1051 introduces the ReservationSystem and the corresponding APIs for 
  create/update/delete ops. The MR resource manager delegate needs to be 
 updated with the APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6103) Adding reservation APIs to resource manager delegate

2014-09-24 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147195#comment-14147195
 ] 

Chris Douglas commented on MAPREDUCE-6103:
--

The {{YARNRunner}} changes look good, though from discussing with [~subru] there 
are a few conditions where the input to {{ReservationId::parseReservationId()}} 
can be malformed yet the method returns {{null}}. It might be clearer if this 
method were to throw an exception on all input that can't be parsed into a 
{{ReservationId}}, including null.

+1 overall, though
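The suggested tightening can be sketched as a strict wrapper. All names here are hypothetical; {{lenientParse}} merely stands in for the real {{ReservationId.parseReservationId}}, whose accepted format this sketch only approximates:

```java
public class StrictParse {
    // Hypothetical strict wrapper: turn a parser that returns null on some
    // malformed input into one that always throws, so callers see a single
    // failure mode instead of having to check for both.
    static String parseReservationIdStrict(String s) {
        String parsed = lenientParse(s);
        if (parsed == null) {
            throw new IllegalArgumentException("Unparseable reservation id: " + s);
        }
        return parsed;
    }

    // Illustrative stand-in: accept only ids shaped like reservation_<ts>_<n>.
    private static String lenientParse(String s) {
        return (s != null && s.matches("reservation_\\d+_\\d+")) ? s : null;
    }

    public static void main(String[] args) {
        System.out.println(parseReservationIdStrict("reservation_1411602647912_0001"));
    }
}
```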

 Adding reservation APIs to resource manager delegate
 

 Key: MAPREDUCE-6103
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6103
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Attachments: MR-6103.patch, MR-6103.patch


 YARN-1051 introduces the ReservationSystem and the corresponding APIs for 
  create/update/delete ops. The MR resource manager delegate needs to be 
 updated with the APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6103) Adding reservation APIs to resource manager delegate

2014-09-22 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144171#comment-14144171
 ] 

Chris Douglas commented on MAPREDUCE-6103:
--

+1 straightforward update to the branch

 Adding reservation APIs to resource manager delegate
 

 Key: MAPREDUCE-6103
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6103
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Subru Krishnan
Assignee: Subru Krishnan
 Attachments: MR-6103.patch


 YARN-1051 introduces the ReservationSystem and the corresponding APIs for 
  create/update/delete ops. The MR resource manager delegate needs to be 
 updated with the APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6063) In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < bufstart.

2014-09-04 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14121968#comment-14121968
 ] 

Chris Douglas commented on MAPREDUCE-6063:
--

bq. This issue is also in MR1 (branch-1) . I attached a patch 
MAPREDUCE-6063.branch-1.patch for branch-1.

Done. Thanks again for the fix

 In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < 
 bufstart.
 ---

 Key: MAPREDUCE-6063
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6063
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 3.0.0, 2.6.0

 Attachments: MAPREDUCE-6063.000.patch, MAPREDUCE-6063.branch-1.patch


 In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < 
 bufstart.  We should change (bufvoid - bufend) + bufstart to (bufvoid - 
 bufstart) + bufend.
 Should change
 {code}
  long size = (bufend >= bufstart
   ? bufend - bufstart
   : (bufvoid - bufend) + bufstart) +
   partitions * APPROX_HEADER_LENGTH;
 {code}
 to:
 {code}
  long size = (bufend >= bufstart
   ? bufend - bufstart
   : (bufvoid - bufstart) + bufend) +
   partitions * APPROX_HEADER_LENGTH;
 {code}
 This is because when wraparound happens (bufend < bufstart), the size should 
 be (bufvoid - bufstart) (the bigger part) + bufend (the smaller part).
 You can find a similar code pattern in MapTask.java:
 {code}
 mapOutputByteCounter.increment(valend >= keystart
 ? valend - keystart
 : (bufvoid - keystart) + valend);
 {code}
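The corrected arithmetic can be checked in isolation with a small standalone sketch (a re-statement of the ternary expression above, not the actual MapTask code):

```java
public class WrapSize {
    // Circular collection buffer of capacity bufvoid; live data runs from
    // bufstart to bufend, possibly wrapping past the end of the array.
    static long spanLength(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart
            ? bufend - bufstart              // contiguous case
            : (bufvoid - bufstart) + bufend; // wraparound: tail + head
    }

    public static void main(String[] args) {
        // No wrap: [100, 400) of a 1000-byte buffer holds 300 bytes.
        System.out.println(spanLength(100, 400, 1000)); // 300
        // Wrap: start at 900, end at 50 -> 100 tail bytes + 50 head bytes.
        System.out.println(spanLength(900, 50, 1000));  // 150
    }
}
```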



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6063) In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < bufstart.

2014-09-03 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6063:
-
   Resolution: Fixed
Fix Version/s: 2.6.0
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

+1 Good catch

I committed this to trunk and branch-2

 In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < 
 bufstart.
 ---

 Key: MAPREDUCE-6063
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6063
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv1, mrv2
Reporter: zhihai xu
Assignee: zhihai xu
 Fix For: 3.0.0, 2.6.0

 Attachments: MAPREDUCE-6063.000.patch


 In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < 
 bufstart.  We should change (bufvoid - bufend) + bufstart to (bufvoid - 
 bufstart) + bufend.
 Should change
 {code}
  long size = (bufend >= bufstart
   ? bufend - bufstart
   : (bufvoid - bufend) + bufstart) +
   partitions * APPROX_HEADER_LENGTH;
 {code}
 to:
 {code}
  long size = (bufend >= bufstart
   ? bufend - bufstart
   : (bufvoid - bufstart) + bufend) +
   partitions * APPROX_HEADER_LENGTH;
 {code}
 This is because when wraparound happens (bufend < bufstart), the size should 
 be (bufvoid - bufstart) (the bigger part) + bufend (the smaller part).
 You can find a similar code pattern in MapTask.java:
 {code}
 mapOutputByteCounter.increment(valend >= keystart
 ? valend - keystart
 : (bufvoid - keystart) + valend);
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6051) Fix typos in log messages

2014-08-28 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-6051:
-

   Resolution: Fixed
Fix Version/s: 2.6.0
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

+1

I committed this. Thanks, Ray

 Fix typos in log messages
 -

 Key: MAPREDUCE-6051
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6051
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Ray Chiang
Assignee: Ray Chiang
Priority: Trivial
  Labels: newbie
 Fix For: 3.0.0, 2.6.0

 Attachments: MAPREDUCE-6051-01.patch


 There are a bunch of typos in log messages. HADOOP-10946 was initially 
 created, but may have failed due to being in multiple components. Try fixing 
 typos on a per-component basis.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5974) Allow map output collector fallback

2014-08-15 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099345#comment-14099345
 ] 

Chris Douglas commented on MAPREDUCE-5974:
--

Sorry, I didn't mean to hold this up. +0

 Allow map output collector fallback
 ---

 Key: MAPREDUCE-5974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5974
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: task
Affects Versions: 2.6.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mapreduce-5974.txt


 Currently we only allow specifying a single MapOutputCollector implementation 
 class in a job. It would be nice to allow a comma-separated list of classes: 
 we should try each collector implementation in the user-specified order until 
 we find one that can be successfully instantiated and initted.
 This is useful for cases where a particular optimized collector 
 implementation cannot operate on all key/value types, or requires native 
 code. The cluster administrator can configure the cluster to try to use the 
 optimized collector and fall back to the default collector.
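The instantiate-and-init loop described above might look like the following standalone sketch. The class names and the {{MapOutputCollector}} interface here are illustrative stand-ins, not the actual patch or Hadoop's real plugin interface:

```java
import java.util.List;

public class CollectorFallback {
    interface MapOutputCollector {
        void init() throws Exception;
    }

    // Illustrative stand-ins: one collector that fails to init (e.g. native
    // libs missing) and one that always succeeds.
    public static class NativeLike implements MapOutputCollector {
        public void init() throws Exception { throw new Exception("no native libs"); }
    }
    public static class DefaultLike implements MapOutputCollector {
        public void init() {}
    }

    // Walk the configured class list in order and return the first collector
    // that can be instantiated and initialized.
    static MapOutputCollector createCollector(List<String> classNames)
            throws Exception {
        Exception last = null;
        for (String name : classNames) {
            try {
                MapOutputCollector c = (MapOutputCollector)
                    Class.forName(name).getDeclaredConstructor().newInstance();
                c.init();
                return c;  // first usable collector wins
            } catch (Exception e) {
                last = e;  // remember the failure and try the next class
            }
        }
        throw new Exception("No usable MapOutputCollector configured", last);
    }

    public static void main(String[] args) throws Exception {
        MapOutputCollector c = createCollector(List.of(
            "CollectorFallback$NativeLike", "CollectorFallback$DefaultLike"));
        System.out.println(c.getClass().getSimpleName()); // DefaultLike
    }
}
```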



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5974) Allow map output collector fallback

2014-07-23 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071923#comment-14071923
 ] 

Chris Douglas commented on MAPREDUCE-5974:
--

bq. Implementing this inside the native collector init() method itself might be 
messy – you'd have to essentially write a wrapper collector and have every 
method delegate to the real implementation. I would hope that the delegation 
would get devirtualized and inlined, but not certain about that.

I hadn't considered that; I'm not sure either. I'm mostly ambivalent about the 
alternatives, assuming the majority of jobs will configure a single collector. 
There's a case to be made for throwing the original exception in that case, but 
it's not worth much hand-wringing.

 Allow map output collector fallback
 ---

 Key: MAPREDUCE-5974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5974
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: task
Affects Versions: 2.6.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mapreduce-5974.txt


 Currently we only allow specifying a single MapOutputCollector implementation 
 class in a job. It would be nice to allow a comma-separated list of classes: 
 we should try each collector implementation in the user-specified order until 
 we find one that can be successfully instantiated and initted.
 This is useful for cases where a particular optimized collector 
 implementation cannot operate on all key/value types, or requires native 
 code. The cluster administrator can configure the cluster to try to use the 
 optimized collector and fall back to the default collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5974) Allow map output collector fallback

2014-07-22 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071174#comment-14071174
 ] 

Chris Douglas commented on MAPREDUCE-5974:
--

bq. Doing fallback as the records are emitted would be pretty neat, but may 
also be somewhat difficult. [snip]

*nod* Fair enough, though if each MapTask is making independent decisions about 
the collector, they still need to agree on the format for the shuffle. Spilling 
one collector to disk and changing strategies should be compatible, assuming 
there isn't a different format for intermediate spills. But yeah, this is very 
abstract, given the use cases we have.

If the goal is to support a fallback collector when native libs aren't 
available, then given the dependency on the intermediate format, should the swap 
be internal to the native collector, even in init? If the interface were like the 
serialization, then one might use the keytype, etc. to pick the 
most-appropriate collector. As failover, I'm struggling to come up with a case 
that's not covered by making this an internal detail of the native collector.

 Allow map output collector fallback
 ---

 Key: MAPREDUCE-5974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5974
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: task
Affects Versions: 2.6.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mapreduce-5974.txt


 Currently we only allow specifying a single MapOutputCollector implementation 
 class in a job. It would be nice to allow a comma-separated list of classes: 
 we should try each collector implementation in the user-specified order until 
 we find one that can be successfully instantiated and initted.
 This is useful for cases where a particular optimized collector 
 implementation cannot operate on all key/value types, or requires native 
 code. The cluster administrator can configure the cluster to try to use the 
 optimized collector and fall back to the default collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5974) Allow map output collector fallback

2014-07-17 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065752#comment-14065752
 ] 

Chris Douglas commented on MAPREDUCE-5974:
--

Could this be equivalently implemented as a composite collector using the 
existing plugin architecture? Trading off collector implementations at runtime 
is a cool idea, but if the criteria are available to {{init}} then they're also 
available during submission (excluding availability of local dependencies or 
arch restrictions in heterogeneous clusters).

Changing strategies during the collection phase based on the records emitted 
seems to have equivalent or better potential, and is covered by the composite 
strategy, also.

 Allow map output collector fallback
 ---

 Key: MAPREDUCE-5974
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5974
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: task
Affects Versions: 2.6.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: mapreduce-5974.txt


 Currently we only allow specifying a single MapOutputCollector implementation 
 class in a job. It would be nice to allow a comma-separated list of classes: 
 we should try each collector implementation in the user-specified order until 
 we find one that can be successfully instantiated and initted.
 This is useful for cases where a particular optimized collector 
 implementation cannot operate on all key/value types, or requires native 
 code. The cluster administrator can configure the cluster to try to use the 
 optimized collector and fall back to the default collector.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

2014-07-10 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14058364#comment-14058364
 ] 

Chris Douglas commented on MAPREDUCE-5890:
--

Yes; thanks [~asuresh] for your patience in seeing this through.

 Support for encrypting Intermediate data and spills in local filesystem
 ---

 Key: MAPREDUCE-5890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
  Labels: encryption
 Fix For: fs-encryption

 Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, 
 MAPREDUCE-5890.12.patch, MAPREDUCE-5890.13.patch, MAPREDUCE-5890.14.patch, 
 MAPREDUCE-5890.15.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, 
 MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, 
 MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, 
 org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
 syslog.tar.gz


 For some sensitive data, encryption while in flight (on the network) is not 
 sufficient; the data must also be encrypted while at rest. 
 HADOOP-10150 & HDFS-6134 bring encryption at rest for data in the filesystem 
 using the Hadoop FileSystem API. MapReduce intermediate data and spills should 
 also be encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

2014-07-09 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14056631#comment-14056631
 ] 

Chris Douglas commented on MAPREDUCE-5890:
--

I was thinking {{o.a.h.mapred}}, with other internal classes.

 Support for encrypting Intermediate data and spills in local filesystem
 ---

 Key: MAPREDUCE-5890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
  Labels: encryption
 Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, 
 MAPREDUCE-5890.12.patch, MAPREDUCE-5890.13.patch, MAPREDUCE-5890.3.patch, 
 MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, 
 MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, 
 org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
 syslog.tar.gz


 For some sensitive data, encryption while in flight (on the network) is not 
 sufficient; the data must also be encrypted while at rest. 
 HADOOP-10150 & HDFS-6134 bring encryption at rest for data in the filesystem 
 using the Hadoop FileSystem API. MapReduce intermediate data and spills should 
 also be encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

2014-07-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055613#comment-14055613
 ] 

Chris Douglas commented on MAPREDUCE-5890:
--

Yes, I'm OK with the current patch. This approach won't scale to another 
feature, but it can be preserved in a refactoring.

My only remaining ask (fine to add during commit) is that {{CryptoUtils}} be 
annotated with {{@Private}} and {{@Unstable}}, so it's clearly marked as an 
implementation detail. If it could be package-private that would be even 
better, though I haven't checked to see if there's anything else in the 
{{o.a.h.mapreduce.task.crypto}} package.
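
For reference, the requested marking would look like the following. In Hadoop the real annotations are {{org.apache.hadoop.classification.InterfaceAudience.Private}} and {{InterfaceStability.Unstable}}; the stand-in definitions below exist only so the sketch compiles on its own.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Stand-ins for Hadoop's InterfaceAudience.Private / InterfaceStability.Unstable,
// defined locally only so this sketch is self-contained.
public class AnnotatedCryptoUtils {
  @Documented @Retention(RetentionPolicy.RUNTIME)
  @interface Private {}   // stand-in for InterfaceAudience.Private

  @Documented @Retention(RetentionPolicy.RUNTIME)
  @interface Unstable {}  // stand-in for InterfaceStability.Unstable

  @Private
  @Unstable
  static class CryptoUtils { /* implementation detail, not public API */ }

  public static void main(String[] args) {
    // The annotations are visible at runtime, so audits and tooling can find them.
    if (!CryptoUtils.class.isAnnotationPresent(Private.class)
        || !CryptoUtils.class.isAnnotationPresent(Unstable.class))
      throw new AssertionError("annotations missing");
  }
}
```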

 Support for encrypting Intermediate data and spills in local filesystem
 ---

 Key: MAPREDUCE-5890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
  Labels: encryption
 Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, 
 MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, 
 MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, 
 MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, 
 org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
 syslog.tar.gz


 For some sensitive data, encryption in flight (over the network) is not 
 sufficient; the data must also be encrypted at rest. HADOOP-10150 & HDFS-6134 
 bring encryption at rest for data in the filesystem using the Hadoop 
 FileSystem API. MapReduce intermediate data and spills should also be 
 encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

2014-07-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14055718#comment-14055718
 ] 

Chris Douglas commented on MAPREDUCE-5890:
--

Sorry, I meant that if {{o.a.h.mapreduce.task.crypto}} only has {{CryptoUtils}} 
in it, then maybe the new package isn't necessary.

 Support for encrypting Intermediate data and spills in local filesystem
 ---

 Key: MAPREDUCE-5890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
  Labels: encryption
 Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, 
 MAPREDUCE-5890.12.patch, MAPREDUCE-5890.13.patch, MAPREDUCE-5890.3.patch, 
 MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, 
 MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, 
 org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
 syslog.tar.gz


 For some sensitive data, encryption in flight (over the network) is not 
 sufficient; the data must also be encrypted at rest. HADOOP-10150 & HDFS-6134 
 bring encryption at rest for data in the filesystem using the Hadoop 
 FileSystem API. MapReduce intermediate data and spills should also be 
 encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

2014-07-06 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053251#comment-14053251
 ] 

Chris Douglas commented on MAPREDUCE-5890:
--

OK... untangling the abstractions can be deferred. The current patch spreads 
the feature across the code in a way that's not ideal to maintain, but it 
addresses all the functional feedback by moving the IV inline.

Thanks [~asuresh] for all the iterations on this.

 Support for encrypting Intermediate data and spills in local filesystem
 ---

 Key: MAPREDUCE-5890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
  Labels: encryption
 Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, 
 MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, 
 MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, 
 MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, 
 org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
 syslog.tar.gz


 For some sensitive data, encryption in flight (over the network) is not 
 sufficient; the data must also be encrypted at rest. HADOOP-10150 & HDFS-6134 
 bring encryption at rest for data in the filesystem using the Hadoop 
 FileSystem API. MapReduce intermediate data and spills should also be 
 encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-2841) Task level native optimization

2014-06-30 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047804#comment-14047804
 ] 

Chris Douglas commented on MAPREDUCE-2841:
--

If [~clockfly] is close to a patch, that would make the scope concrete. It 
sounds like there are more than zero changes to the framework (i.e., the 
MAPREDUCE-2454 API is insufficient), but fewer than a full replacement of the 
{{Task}} code with C\+\+. Would it be difficult to produce and post a patch to 
ground the discussion?

 Task level native optimization
 --

 Key: MAPREDUCE-2841
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: task
 Environment: x86-64 Linux/Unix
Reporter: Binglin Chang
Assignee: Sean Zhong
 Attachments: DESIGN.html, MAPREDUCE-2841.v1.patch, 
 MAPREDUCE-2841.v2.patch, dualpivot-0.patch, dualpivotv20-0.patch, 
 fb-shuffle.patch


 I'm recently working on native optimization for MapTask based on JNI. 
 The basic idea is to add a NativeMapOutputCollector to handle k/v pairs 
 emitted by the mapper, so that sort, spill, and IFile serialization can all 
 be done in native code. Preliminary tests (on Xeon E5410, jdk6u24) showed 
 promising results:
 1. Sort is about 3x-10x as fast as Java (only binary string compare is 
 supported)
 2. IFile serialization speed is about 3x that of Java, about 500MB/s; if 
 hardware CRC32C is used, things can get much faster (1G/
 3. Merge code is not completed yet, so the test uses enough io.sort.mb to 
 prevent mid-spill
 This leads to a total speedup of 2x~3x for the whole MapTask if 
 IdentityMapper (a mapper that does nothing) is used.
 There are limitations, of course: currently only Text and BytesWritable are 
 supported, and I have not thought through many things yet, such as how to 
 support map-side combine. I had some discussion with somebody familiar with 
 Hive, and it seems these limitations won't be much of a problem for Hive to 
 benefit from those optimizations, at least. Advice or discussion about 
 improving compatibility is most welcome :) 
 Currently NativeMapOutputCollector has a static method called canEnable(), 
 which checks whether the key/value types, comparator type, and combiner are 
 all compatible; MapTask can then choose to enable NativeMapOutputCollector.
 This is only a preliminary test; more work needs to be done. I expect better 
 final results, and I believe similar optimizations can be applied to the 
 reduce task and shuffle too. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

2014-06-29 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047206#comment-14047206
 ] 

Chris Douglas commented on MAPREDUCE-5890:
--

The current patch still injects the IV and length into the stream, then fixes 
up the offsets. If the IV were part of the {{IFile}} format, then this would 
not be necessary. If this format were ever changed, then someone would need to 
go back and fix all this arithmetic or take its framing as a requirement for 
any intermediate data format. 

Am I missing why it's easier to wrap/unwrap streams?

 Support for encrypting Intermediate data and spills in local filesystem
 ---

 Key: MAPREDUCE-5890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
  Labels: encryption
 Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, 
 MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, 
 MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, 
 MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, 
 org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
 syslog.tar.gz


 For some sensitive data, encryption in flight (over the network) is not 
 sufficient; the data must also be encrypted at rest. HADOOP-10150 & HDFS-6134 
 bring encryption at rest for data in the filesystem using the Hadoop 
 FileSystem API. MapReduce intermediate data and spills should also be 
 encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

2014-06-25 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044144#comment-14044144
 ] 

Chris Douglas commented on MAPREDUCE-5890:
--

bq. I am trying to trade off that complexity in software with an admin 
prerequisite to install one or few disks/partitions that selective users can 
chose to use via their job-configuration.

This would work also, but (Alejandro/Arun, correct me if this is mistaken) 
encrypted intermediate data is probably motivated by compliance regimes that 
require it. An audit would need to verify that every job used the encrypted 
local dirs, that those mounts were configured to encrypt when the job ran, etc. 
One would also need to do capacity planning for encrypted vs unencrypted space 
across nodes, possibly even federating jobs. It's workable, but kind of ad hoc. 
In contrast, verifying that the MR job set this switch is straightforward and 
has no ops overhead. I have no idea whether it's common to combine these 
workloads, but this would make it easier.

It's not so inconsistent to add this to MapReduce... frameworks are currently 
responsible for intra-application security, particularly RPC. If there's a 
general mechanism then this should use it. If that layer were developed, we'd 
want MapReduce to use it instead of its own, custom encryption. Today, the 
alternative is to develop that general-purpose layer.

To reduce the overhead, this could use the plugin mechanism in MAPREDUCE-2454 
because this no longer requires any changes to the {{ShuffleHandler}} or index 
formats. I haven't looked at the latest patch, but if the {{IFile}} format 
omits the 16 byte IV for each spill, then the only overhead it's adding is for 
the checks in the config (most of which can be pulled into the buffer init and 
cached).

Has this been tested in a cluster? Would the perf hit be simple to measure?
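
On measuring the perf hit: a rough, single-JVM baseline for the cipher cost can be had without a cluster. The sketch below times raw AES/CTR via {{javax.crypto}} over a spill-sized buffer; it measures only the cipher, not CryptoUtils or the shuffle path, so treat the number as a ceiling, not a cluster measurement.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Times raw AES/CTR over a 1 MB buffer, repeated a few times.
// Toy key/IV; a real key would come from the job credentials.
public class SpillCryptoBench {
  public static void main(String[] args) throws Exception {
    byte[] key = new byte[16], iv = new byte[16];
    Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
    c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
        new IvParameterSpec(iv));
    byte[] buf = new byte[1 << 20]; // 1 MB "spill"
    int rounds = 64;
    long start = System.nanoTime();
    for (int i = 0; i < rounds; i++) {
      c.update(buf); // CTR: no padding, output length == input length
    }
    double secs = (System.nanoTime() - start) / 1e9;
    System.out.printf("~%.0f MB/s AES/CTR on this JVM%n", rounds / secs);
  }
}
```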

 Support for encrypting Intermediate data and spills in local filesystem
 ---

 Key: MAPREDUCE-5890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
  Labels: encryption
 Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, 
 MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, 
 MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, 
 org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
 syslog.tar.gz


 For some sensitive data, encryption in flight (over the network) is not 
 sufficient; the data must also be encrypted at rest. HADOOP-10150 & HDFS-6134 
 bring encryption at rest for data in the filesystem using the Hadoop 
 FileSystem API. MapReduce intermediate data and spills should also be 
 encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

2014-06-24 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14041955#comment-14041955
 ] 

Chris Douglas commented on MAPREDUCE-5890:
--

Thanks for updating the patch, Arun. Adding seeks for serving map output would 
be regrettable.

Few nits:
* unused, private static field {{counter}} added to {{Fetcher}}
* unit test should use JUnit4 annotations rather than extending {{TestCase}}
* {noformat}
+  InputStream is = input;
+  is = CryptoUtils.wrap(jobConf, iv, is, offset, compressedLength);
{noformat} is equivalently {{InputStream is = CryptoUtils.wrap(jobConf, iv, 
input, offset, compressedLength);}}
* While not terribly expensive, there are a lot of redundant lookups for the 
encrypted shuffle config parameter.
* There are many counterexamples, but running a MR job is a heavy way to test 
this.
* To be sure I understand the IV logic, it's injected in the stream as a prefix 
to the segment during a merge, but is part of the index record during a spill. 
Is that accurate? Adding a few comments calling this out would be appreciated, 
particularly since it's hard to spot in the merge.
* Has this been tested on spills with intermediate merges? With more than a 
single reduce? Looking at the patch, it looks like it creates the stream with 
the IV but doesn't reset the IV for each segment (apologies, I haven't tried 
applying it, so I might just be misreading the context).
* Since the IV size is hard-coded in {{CryptoUtils}} to 16 bytes (and part of 
the {{IndexRecord}} format), it should probably fail if the 
{{CryptoCodec::getAlgorithmBlockSize}} returns anything else.

Much of the logic in here is internal to MapReduce, so it would be unfair to 
ask that this create better abstractions than what exists, but the IV handling 
is pretty ad hoc. Other improvements under consideration- particularly native 
implementations and other frameworks building on the {{ShuffleHandler}}- may 
rely on this code, as well as older versions of MapReduce that will fail 
without deploying two versions of the ShuffleHandler.

To make it backwards compatible, the IV can be part of each {{IFile}} segment 
(requiring no changes to {{ShuffleHandler}} or the 
{{SpillRecord}}/{{IndexRecord}} format), or the IVs can be added to the end of 
the {{SpillRecord}}. In the latter case, the {{Fetcher}} will need to request 
the alternate interpretation by including a header; old versions will get 
the existing interpretation of the {{SpillRecord}}.
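
The backwards-compatible option of carrying the IV with each {{IFile}} segment can be sketched as follows. The helper names and framing are hypothetical, not the CryptoUtils API: the writer prepends the 16-byte IV to the encrypted segment and the reader strips it before decrypting, so neither the {{ShuffleHandler}} nor the index format changes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helpers: the IV travels as a 16-byte prefix of each
// encrypted segment, leaving SpillRecord/IndexRecord untouched.
public class IvPerSegment {
  static final int IV_LEN = 16; // matches the hard-coded size in CryptoUtils

  static byte[] writeSegment(byte[] key, byte[] payload) throws Exception {
    byte[] iv = new byte[IV_LEN];
    new SecureRandom().nextBytes(iv); // fresh IV per segment
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(iv); // the IV is part of the segment itself
    Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
    c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
        new IvParameterSpec(iv));
    try (CipherOutputStream cos = new CipherOutputStream(out, c)) {
      cos.write(payload);
    }
    return out.toByteArray();
  }

  static byte[] readSegment(byte[] key, byte[] segment) throws Exception {
    ByteArrayInputStream in = new ByteArrayInputStream(segment);
    byte[] iv = new byte[IV_LEN];
    if (in.read(iv) != IV_LEN) throw new java.io.EOFException("short IV");
    Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
    c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
        new IvParameterSpec(iv));
    try (CipherInputStream cis = new CipherInputStream(in, c)) {
      return cis.readAllBytes();
    }
  }

  public static void main(String[] args) throws Exception {
    byte[] key = new byte[16]; // toy key; a real one comes from the credentials
    byte[] data = "segment payload".getBytes(StandardCharsets.UTF_8);
    if (!Arrays.equals(readSegment(key, writeSegment(key, data)), data))
      throw new AssertionError("round trip failed");
  }
}
```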

 Support for encrypting Intermediate data and spills in local filesystem
 ---

 Key: MAPREDUCE-5890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
  Labels: encryption
 Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, 
 org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
 syslog.tar.gz


 For some sensitive data, encryption in flight (over the network) is not 
 sufficient; the data must also be encrypted at rest. HADOOP-10150 & HDFS-6134 
 bring encryption at rest for data in the filesystem using the Hadoop 
 FileSystem API. MapReduce intermediate data and spills should also be 
 encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

2014-06-24 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042451#comment-14042451
 ] 

Chris Douglas commented on MAPREDUCE-5890:
--

The repeated config lookup and unit test are not blockers, but they're places 
where the patch could be improved.

bq. The ShuffleHandler is a private class of MapReduce, if other frameworks use 
it, it is at their own risk.

Every version of the patch has broken compatibility with existing versions of 
_MapReduce_. Other frameworks may rely on functionality we don't guarantee, but 
breaking them is avoidable.

bq. Regarding adding new abstractions, I’m OK if they are small and 
non-intrusive. I just don’t want to send Arun chasing a wild goose and 
when he finally does we backtrack because the changes are too pervasive in the 
core of MapReduce

Adding a new file just to pass 16 bytes to the {{ShuffleHandler}} will harm 
performance; breaking backwards compatibility is not OK, and not necessary for 
this feature. Aside from those, I've asked for some formatting fixes and that 
the code not return an IV that doesn't match the hard-coded 16-byte size. These 
are reasonable, limited requests and bug fixes, and I've suggested two possible 
implementations that would address them. These would be blockers during the 
merge, too.

 Support for encrypting Intermediate data and spills in local filesystem
 ---

 Key: MAPREDUCE-5890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
  Labels: encryption
 Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, 
 org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
 syslog.tar.gz


 For some sensitive data, encryption in flight (over the network) is not 
 sufficient; the data must also be encrypted at rest. HADOOP-10150 & HDFS-6134 
 bring encryption at rest for data in the filesystem using the Hadoop 
 FileSystem API. MapReduce intermediate data and spills should also be 
 encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem

2014-06-22 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040237#comment-14040237
 ] 

Chris Douglas commented on MAPREDUCE-5890:
--

bq. Given that current abstraction does not provide a clean cut to hide this 
within the IFile without a significant refactoring throughout the code, I think 
is the least evil.

It's expedient, but this code is already difficult to follow. Arun, would you 
mind making an attempt at refactoring? The current code doesn't have an 
existing abstraction for this, but writing a separate file for every spill just 
to store a few bytes of IV doesn't seem like a reasonable tradeoff in either 
performance or complexity. Adding a metadata block to the {{IFile}} segment or 
adding the IV to the spill index (to be added in the header, as in the current 
patch) would both work.
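
The spill-index alternative can be sketched as a widened index record: the 16-byte IV appended after the three existing longs. Field names follow {{IndexRecord}}; the extended layout itself is hypothetical, and as discussed elsewhere a reader would still need some signal to know which interpretation applies.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical widened index record: (startOffset, rawLength, partLength)
// as in IndexRecord, followed by the 16-byte IV for that spill segment.
public class IvInIndexRecord {
  static final int IV_LEN = 16;
  static final int REC_LEN = 3 * 8 + IV_LEN;

  static byte[] write(long startOffset, long rawLength, long partLength,
      byte[] iv) {
    return ByteBuffer.allocate(REC_LEN)
        .putLong(startOffset).putLong(rawLength).putLong(partLength)
        .put(iv, 0, IV_LEN).array();
  }

  static byte[] readIv(byte[] rec) {
    return Arrays.copyOfRange(rec, 24, 24 + IV_LEN); // bytes past the longs
  }

  public static void main(String[] args) {
    byte[] iv = new byte[IV_LEN];
    iv[0] = 42;
    byte[] rec = write(0L, 100L, 64L, iv);
    if (!Arrays.equals(readIv(rec), iv))
      throw new AssertionError("IV did not round-trip");
  }
}
```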

A couple nits:
* In {{OnDiskMapOutput}}, the {{disk}} field can stay final, since the only 
assignment is in the cstr
* Minor indentation/braces issue in {{MapTask}}:
{noformat}
+  if (CryptoUtils.isShuffleEncrypted(job))
+  CryptoUtils.deleteIVFile(rfs, filename[i]);
{noformat}

Minor nit: please leave old patches attached to avoid orphaning the discussion 
around them.

 Support for encrypting Intermediate data and spills in local filesystem
 ---

 Key: MAPREDUCE-5890
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Arun Suresh
  Labels: encryption
 Attachments: MAPREDUCE-5890.3.patch, 
 org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, 
 syslog.tar.gz


 For some sensitive data, encryption in flight (over the network) is not 
 sufficient; the data must also be encrypted at rest. HADOOP-10150 & HDFS-6134 
 bring encryption at rest for data in the filesystem using the Hadoop 
 FileSystem API. MapReduce intermediate data and spills should also be 
 encrypted while at rest.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196

2014-06-12 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029625#comment-14029625
 ] 

Chris Douglas commented on MAPREDUCE-5912:
--

bq. If in the future we want to revisit the idea of map outputs going somewhere 
different than the local file system, then I think we'd need a different patch. 
I think we'd want to make sure that the map output's Path instance contains an 
explicit scheme, so that the code here doesn't need to assume local vs. default 
vs. something else.

Agreed. MAPREDUCE-5269 changed all {{Path}} instances returned from 
{{YARNOutputFiles}} to be fully qualified, but the two changes were separated.

+1 for committing the workaround until HADOOP-10663 is ready.
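
A toy illustration (plain {{java.net.URI}}, not the Hadoop {{Path}} API) of why the fully qualified paths matter: a path with no scheme says nothing about which filesystem it lives on, so resolution falls back to the default filesystem, which is exactly how the Windows local path in the stack trace below ended up in {{DistributedFileSystem}}.

```java
import java.net.URI;

// Mimics scheme-based filesystem selection: a scheme-less path falls
// back to the configured default filesystem.
public class SchemeResolution {
  static String resolve(String path, String defaultScheme) {
    String scheme = URI.create(path).getScheme();
    return scheme != null ? scheme : defaultScheme;
  }

  public static void main(String[] args) {
    // Fully qualified local path: resolves to the local filesystem.
    if (!"file".equals(resolve("file:/c:/Hadoop/Data/file.out", "hdfs")))
      throw new AssertionError();
    // Unqualified Windows path: silently routed to the default filesystem.
    if (!"hdfs".equals(resolve("/c:/Hadoop/Data/file.out", "hdfs")))
      throw new AssertionError();
  }
}
```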

 Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
 ---

 Key: MAPREDUCE-5912
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5912
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 3.0.0
Reporter: Remus Rusanu
Assignee: Remus Rusanu
 Fix For: 3.0.0

 Attachments: MAPREDUCE-5912.1.patch


 {code}
 @@ -1098,8 +1120,8 @@ private long calculateOutputSize() throws IOException {
  if (isMapTask() && conf.getNumReduceTasks() > 0) {
try {
  Path mapOutput =  mapOutputFile.getOutputFile();
 -FileSystem localFS = FileSystem.getLocal(conf);
 -return localFS.getFileStatus(mapOutput).getLen();
 +FileSystem fs = mapOutput.getFileSystem(conf);
 +return fs.getFileStatus(mapOutput).getLen();
} catch (IOException e) {
  LOG.warn("Could not find output size ", e);
}
 {code}
 causes Windows local output files to be routed through HDFS:
 {code}
 2014-06-02 00:14:53,891 WARN [main] org.apache.hadoop.mapred.YarnChild: 
 Exception running child : java.lang.IllegalArgumentException: Pathname 
 /c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out
  from 
 c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out
  is not a valid DFS filename.
at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187)
at 
 org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1024)
at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1020)
at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1020)
at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:1124)
at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:1102)
at org.apache.hadoop.mapred.Task.done(Task.java:1048)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196

2014-06-04 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018246#comment-14018246
 ] 

Chris Douglas commented on MAPREDUCE-5912:
--

As you identified in HADOOP-10663, returning the default filesystem for local 
paths is not correct.

 Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
 ---

 Key: MAPREDUCE-5912
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5912
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Remus Rusanu
Assignee: Remus Rusanu
 Fix For: 3.0.0

 Attachments: MAPREDUCE-5912.1.patch


 {code}
 @@ -1098,8 +1120,8 @@ private long calculateOutputSize() throws IOException {
  if (isMapTask() && conf.getNumReduceTasks() > 0) {
try {
  Path mapOutput =  mapOutputFile.getOutputFile();
 -FileSystem localFS = FileSystem.getLocal(conf);
 -return localFS.getFileStatus(mapOutput).getLen();
 +FileSystem fs = mapOutput.getFileSystem(conf);
 +return fs.getFileStatus(mapOutput).getLen();
} catch (IOException e) {
  LOG.warn("Could not find output size ", e);
}
 {code}
 causes Windows local output files to be routed through HDFS:
 {code}
 2014-06-02 00:14:53,891 WARN [main] org.apache.hadoop.mapred.YarnChild: 
 Exception running child : java.lang.IllegalArgumentException: Pathname 
 /c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out
  from 
 c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out
  is not a valid DFS filename.
at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187)
at 
 org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1024)
at 
 org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1020)
at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
 org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1020)
at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:1124)
at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:1102)
at org.apache.hadoop.mapred.Task.done(Task.java:1048)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (MAPREDUCE-5821) IFile merge allocates new byte array for every value

2014-05-15 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-5821:
-

   Resolution: Fixed
Fix Version/s: 2.4.1
   2.5.0
   3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I committed this. Thanks, Todd

 IFile merge allocates new byte array for every value
 

 Key: MAPREDUCE-5821
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5821
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: performance, task
Affects Versions: 2.4.1
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: 3.0.0, 2.5.0, 2.4.1

 Attachments: after-patch.png, before-patch.png, mapreduce-5821.txt, 
 mapreduce-5821.txt


 I wrote a standalone benchmark of the MapOutputBuffer and found that it did a 
 lot of allocations during the merge phase. After looking at an allocation 
 profile, I found that IFile.Reader.nextRawValue() would always allocate a new 
 byte array for every value, so the allocation rate goes way up during the 
 merge phase of the mapper. I imagine this also affects the reducer input, 
 though I didn't profile that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value

2014-04-12 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967652#comment-13967652
 ] 

Chris Douglas commented on MAPREDUCE-5821:
--

+1 This looks like the intended behavior from HADOOP-5494

Good catch

 IFile merge allocates new byte array for every value
 

 Key: MAPREDUCE-5821
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5821
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: performance, task
Affects Versions: 2.4.1
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: after-patch.png, before-patch.png, mapreduce-5821.txt, 
 mapreduce-5821.txt


 I wrote a standalone benchmark of the MapOutputBuffer and found that it did a 
 lot of allocations during the merge phase. After looking at an allocation 
 profile, I found that IFile.Reader.nextRawValue() would always allocate a new 
 byte array for every value, so the allocation rate goes way up during the 
 merge phase of the mapper. I imagine this also affects the reducer input, 
 though I didn't profile that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value

2014-04-08 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963651#comment-13963651
 ] 

Chris Douglas commented on MAPREDUCE-5821:
--

Sure, I can take a look later this week if it can wait.

 IFile merge allocates new byte array for every value
 

 Key: MAPREDUCE-5821
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5821
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: performance, task
Affects Versions: 2.4.1
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: after-patch.png, before-patch.png, mapreduce-5821.txt, 
 mapreduce-5821.txt


 I wrote a standalone benchmark of the MapOutputBuffer and found that it did a 
 lot of allocations during the merge phase. After looking at an allocation 
 profile, I found that IFile.Reader.nextRawValue() would always allocate a new 
 byte array for every value, so the allocation rate goes way up during the 
 merge phase of the mapper. I imagine this also affects the reducer input, 
 though I didn't profile that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5717) Task pings are interpreted as task progress

2014-01-11 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868942#comment-13868942
 ] 

Chris Douglas commented on MAPREDUCE-5717:
--

+1

Thanks for catching this, Jason

 Task pings are interpreted as task progress
 ---

 Key: MAPREDUCE-5717
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5717
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Jason Lowe
Assignee: Jason Lowe
 Attachments: MAPREDUCE-5717.patch






--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing

2013-12-28 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-5196:
-

   Resolution: Fixed
Fix Version/s: 3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I committed this. Thanks, Carlo

 CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing 
 --

 Key: MAPREDUCE-5196
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5196
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am, mrv2
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 3.0.0

 Attachments: MAPREDUCE-5196.1.patch, MAPREDUCE-5196.2.patch, 
 MAPREDUCE-5196.3.patch, MAPREDUCE-5196.patch, MAPREDUCE-5196.patch


 This JIRA tracks a checkpoint-based AM preemption policy. The policy handles 
 propagation of the preemption requests received from the RM to the 
 appropriate tasks, and bookkeeping of checkpoints. Actual checkpointing of the 
 task state is handled in upcoming JIRAs.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing

2013-12-21 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-5196:
-

Status: Open  (was: Patch Available)

 CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing 
 --

 Key: MAPREDUCE-5196
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5196
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am, mrv2
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: MAPREDUCE-5196.1.patch, MAPREDUCE-5196.2.patch, 
 MAPREDUCE-5196.patch, MAPREDUCE-5196.patch


 This JIRA tracks a checkpoint-based AM preemption policy. The policy handles 
 propagation of the preemption requests received from the RM to the 
 appropriate tasks, and bookkeeping of checkpoints. Actual checkpointing of the 
 task state is handled in upcoming JIRAs.





[jira] [Updated] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing

2013-12-21 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-5196:
-

Attachment: MAPREDUCE-5196.3.patch

 CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing 
 --

 Key: MAPREDUCE-5196
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5196
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am, mrv2
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: MAPREDUCE-5196.1.patch, MAPREDUCE-5196.2.patch, 
 MAPREDUCE-5196.3.patch, MAPREDUCE-5196.patch, MAPREDUCE-5196.patch


 This JIRA tracks a checkpoint-based AM preemption policy. The policy handles 
 propagation of the preemption requests received from the RM to the 
 appropriate tasks, and bookkeeping of checkpoints. Actual checkpointing of the 
 task state is handled in upcoming JIRAs.





[jira] [Updated] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing

2013-12-21 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-5196:
-

Status: Patch Available  (was: Open)

 CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing 
 --

 Key: MAPREDUCE-5196
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5196
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am, mrv2
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: MAPREDUCE-5196.1.patch, MAPREDUCE-5196.2.patch, 
 MAPREDUCE-5196.3.patch, MAPREDUCE-5196.patch, MAPREDUCE-5196.patch


 This JIRA tracks a checkpoint-based AM preemption policy. The policy handles 
 propagation of the preemption requests received from the RM to the 
 appropriate tasks, and bookkeeping of checkpoints. Actual checkpointing of the 
 task state is handled in upcoming JIRAs.





[jira] [Updated] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing

2013-12-21 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-5196:
-

Status: Patch Available  (was: Open)

 CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing 
 --

 Key: MAPREDUCE-5196
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5196
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am, mrv2
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: MAPREDUCE-5196.1.patch, MAPREDUCE-5196.2.patch, 
 MAPREDUCE-5196.3.patch, MAPREDUCE-5196.patch, MAPREDUCE-5196.patch


 This JIRA tracks a checkpoint-based AM preemption policy. The policy handles 
 propagation of the preemption requests received from the RM to the 
 appropriate tasks, and bookkeeping of checkpoints. Actual checkpointing of the 
 task state is handled in upcoming JIRAs.





[jira] [Updated] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing

2013-12-21 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-5196:
-

Status: Open  (was: Patch Available)

 CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing 
 --

 Key: MAPREDUCE-5196
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5196
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am, mrv2
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: MAPREDUCE-5196.1.patch, MAPREDUCE-5196.2.patch, 
 MAPREDUCE-5196.3.patch, MAPREDUCE-5196.patch, MAPREDUCE-5196.patch


 This JIRA tracks a checkpoint-based AM preemption policy. The policy handles 
 propagation of the preemption requests received from the RM to the 
 appropriate tasks, and bookkeeping of checkpoints. Actual checkpointing of the 
 task state is handled in upcoming JIRAs.





[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing

2013-12-21 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13855069#comment-13855069
 ] 

Chris Douglas commented on MAPREDUCE-5196:
--

The failed test is due to YARN-1463

 CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing 
 --

 Key: MAPREDUCE-5196
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5196
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am, mrv2
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: MAPREDUCE-5196.1.patch, MAPREDUCE-5196.2.patch, 
 MAPREDUCE-5196.3.patch, MAPREDUCE-5196.patch, MAPREDUCE-5196.patch


 This JIRA tracks a checkpoint-based AM preemption policy. The policy handles 
 propagation of the preemption requests received from the RM to the 
 appropriate tasks, and bookkeeping of checkpoints. Actual checkpointing of the 
 task state is handled in upcoming JIRAs.





[jira] [Updated] (MAPREDUCE-5189) Basic AM changes to support preemption requests (per YARN-45)

2013-12-17 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-5189:
-

Status: Patch Available  (was: Open)

 Basic AM changes to support preemption requests (per YARN-45)
 -

 Key: MAPREDUCE-5189
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5189
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am, mrv2
Reporter: Carlo Curino
Assignee: Carlo Curino
 Attachments: MAPREDUCE-5189.1.patch, MAPREDUCE-5189.2.patch, 
 MAPREDUCE-5189.3.patch, MAPREDUCE-5189.4.patch, MAPREDUCE-5189.patch, 
 MAPREDUCE-5189.patch


 This JIRA tracks the minimum set of changes necessary in the mapreduce AM 
 to receive preemption requests (per YARN-45) and invoke a local policy that 
 manages preemption. (advanced policies and mechanisms will be tracked 
 separately)
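
 The AM-side hook described above (receive a preemption request, hand it to a
 local policy) can be sketched as follows. The names here are illustrative
 only; they are not the actual YARN-45 or MR AM interfaces:

 ```java
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Set;

 // Hypothetical sketch, not the real Hadoop classes.
 public class AmPreemptionSketch {
     /** Pluggable local policy invoked on each preemption message. */
     interface PreemptionPolicy {
         void preempt(Set<String> containersToRelease);
     }

     /** Trivial policy that only records what it was asked to release. */
     static class RecordingPolicy implements PreemptionPolicy {
         final List<String> seen = new ArrayList<>();
         @Override public void preempt(Set<String> containers) {
             seen.addAll(containers); // a real policy would pick victims here
         }
     }

     private final PreemptionPolicy policy;

     AmPreemptionSketch(PreemptionPolicy policy) { this.policy = policy; }

     /** Called from the AM's heartbeat when the RM's response carries
      *  a (here simplified) preemption message. */
     void onAllocateResponse(Set<String> preemptionMessage) {
         if (preemptionMessage != null && !preemptionMessage.isEmpty()) {
             policy.preempt(preemptionMessage);
         }
     }
 }
 ```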





[jira] [Updated] (MAPREDUCE-5189) Basic AM changes to support preemption requests (per YARN-45)

2013-12-17 Thread Chris Douglas (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Douglas updated MAPREDUCE-5189:
-

   Resolution: Fixed
Fix Version/s: 3.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 Basic AM changes to support preemption requests (per YARN-45)
 -

 Key: MAPREDUCE-5189
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5189
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am, mrv2
Reporter: Carlo Curino
Assignee: Carlo Curino
 Fix For: 3.0.0

 Attachments: MAPREDUCE-5189.1.patch, MAPREDUCE-5189.2.patch, 
 MAPREDUCE-5189.3.patch, MAPREDUCE-5189.4.patch, MAPREDUCE-5189.patch, 
 MAPREDUCE-5189.patch


 This JIRA tracks the minimum set of changes necessary in the mapreduce AM 
 to receive preemption requests (per YARN-45) and invoke a local policy that 
 manages preemption. (advanced policies and mechanisms will be tracked 
 separately)




