[jira] [Commented] (MAPREDUCE-6278) Multithreaded maven build breaks in hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361865#comment-16361865 ] Chris Douglas commented on MAPREDUCE-6278:
--
bq. the patch just works
Well... the patch may work. Without a principled reason to believe it prevents the race, we know only that some executions didn't lose it. Moreover, if some accident in the build makes a dependency on {{hadoop-yarn-applications-distributedshell}} sufficient today, then future changes may accidentally break it.
bq. I think ideally we need to put every leaf modules as dependencies of the root submodule
This we could explain, at least. I'm not sure if it's necessary or if a better pattern exists. Please add a comment to the pom to explain why the dependencies are listed explicitly.
> Multithreaded maven build breaks in hadoop-mapreduce-client-core > > > Key: MAPREDUCE-6278 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6278 > Project: Hadoop Map/Reduce > Issue Type: Bug >Affects Versions: 2.9.0 > Environment: Linux (Fedora 21) >Reporter: Ewan Higgs >Assignee: Duo Xu >Priority: Major > Attachments: MAPREDUCE-6278.01.patch, MAPREDUCE-6278.02.patch > > > [As reported on the mailing > list|http://comments.gmane.org/gmane.comp.jakarta.lucene.hadoop.user/52231]. > The following breaks: > {{mvn -e package -DskipTests -Dmaven.javadoc.skip -Dtar -Pdist,native -T5}} > ... > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-assembly-plugin:2.4:single (package-mapreduce) > on project hadoop-mapreduce: Failed to create assembly: Artifact: > org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.0.0-SNAPSHOT (included > by module) does not have an artifact with a file. Please ensure the package > phase is run before the assembly is generated. 
-> [Help 1] > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single > (package-mapreduce) on project hadoop-mapreduce: Failed to create assembly: > Artifact: org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.0.0-SNAPSHOT > (included by module) does not have an artifact with a file. Please ensure the > package phase is run before the assembly is generated. > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:216) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) > at > org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116) > at > org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:188) > at > org.apache.maven.lifecycle.internal.builder.multithreaded.MultiThreadedBuilder$1.call(MultiThreadedBuilder.java:184) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.maven.plugin.MojoExecutionException: Failed to create > assembly: Artifact: > org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.0.0-SNAPSHOT (included > by module) does not have an artifact with a file. Please ensure the package > phase is run before the assembly is generated. 
> at > org.apache.maven.plugin.assembly.mojos.AbstractAssemblyMojo.execute(AbstractAssemblyMojo.java:495) > at > org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132) > at > org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) > ... 11 more > Caused by: org.apache.maven.plugin.assembly.archive.ArchiveCreationException: > Artifact: org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.0.0-SNAPSHOT > (included by module) does not have an artifact with a file. Please ensure the > package phase is run before the assembly is generated. > at > org.apache.maven.plugin.assembly.archive.phase.ModuleSetAssemblyPhase.addModuleArtifact(ModuleSetAssemblyPhase.java:318) > at > org.apache.maven.plugin.assembly.archive.phase.ModuleSetAssemblyPhase.addModuleBinaries(ModuleSetAssemblyPhase.java:228) > at > org.apache.maven.plugin.assembly.archive.phase.ModuleSetAssemblyPhase.execute(ModuleSetAssemblyPhase.java:111) > at > org.apache.maven.plugin.assembly.archive.DefaultAssemblyArchiver.createArchive(DefaultAssemblyArchiver.java:183) > at > org.apache.maven.plugin.assembly.mojos.AbstractAssemblyMojo.execute(AbstractAssemblyMojo.java:436) > ... 13 more > {code} > Dmitry Siminov appears to be building on Windows. I'm using Linux.
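The explicit dependency list discussed above might look like the following in the {{hadoop-mapreduce}} aggregator pom. This is only a hedged sketch, not the committed patch: the module names are taken from the discussion in this thread, and the comment is the kind of explanation the review requests.

```xml
<!-- Sketch of the requested pom comment and explicit dependency list.
     Listing each leaf module forces Maven's multithreaded (-T) scheduler
     to finish those modules before the package-mapreduce assembly runs,
     so the assembly cannot race a module that has not yet produced its
     jar. Do not prune entries without re-testing mvn -T builds. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-nativetask</artifactId>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-uploader</artifactId>
  </dependency>
</dependencies>
```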
[jira] [Commented] (MAPREDUCE-6278) Multithreaded maven build breaks in hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359181#comment-16359181 ] Chris Douglas commented on MAPREDUCE-6278:
--
I reproduced the problem on branch-2 and in trunk. The patch works as intended on branch-2, but in trunk it caused an error because the {{hadoop-mapreduce-client-nativetask}} and {{hadoop-mapreduce-client-uploader}} modules weren't included in the pom. Updated the patch to include these.
After reverting the new dependency in {{yarn-project}} on {{hadoop-yarn-applications-distributedshell}}, I couldn't reproduce build errors on the trunk version. Is there a reason this particular application requires special handling among {{hadoop-yarn-applications}}?
[jira] [Updated] (MAPREDUCE-6278) Multithreaded maven build breaks in hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6278:
-
Attachment: MAPREDUCE-6278.02.patch
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6278) Multithreaded maven build breaks in hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354875#comment-16354875 ] Chris Douglas commented on MAPREDUCE-6278:
--
Thanks for the patch. Can you explain how it resolves the issue?
[jira] [Commented] (MAPREDUCE-6278) Multithreaded maven build breaks in hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/MAPREDUCE-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353374#comment-16353374 ] Chris Douglas commented on MAPREDUCE-6278:
--
I added you as a contributor on the MAPREDUCE project. You should be able to upload a patch (More > Attach Files), submit it to the CI infra (Submit Patch), and assign this JIRA to yourself.
[jira] [Updated] (MAPREDUCE-7016) Avoid making separate RPC calls for FileStatus and block locations in FileInputFormat
[ https://issues.apache.org/jira/browse/MAPREDUCE-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-7016:
-
Description: {{FileInputFormat::getSplits}} uses {{FileSystem::globStatus}} to determine its inputs. When the glob returns directories, each is traversed and {{LocatedFileStatus}} instances are returned with the block locations. However, when the glob returns files, this is a {{FileStatus}} that requires a second RPC to obtain its locations.
(was: {{FileInputFormat::getSplits}} uses {{FileSystem::globStatus}} to determine its inputs. When the glob returns directories, each is traversed and {{LocatedFileStatus}} instances are returned with the block locations. However, when the glob returns files, each requires a second RPC to obtain its locations.)
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Created] (MAPREDUCE-7016) Avoid making separate RPC calls for FileStatus and block locations in FileInputFormat
Chris Douglas created MAPREDUCE-7016:
Summary: Avoid making separate RPC calls for FileStatus and block locations in FileInputFormat
Key: MAPREDUCE-7016
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7016
Project: Hadoop Map/Reduce
Issue Type: Improvement
Reporter: Chris Douglas
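The cost described in this issue can be made concrete with a toy model. The class below is not Hadoop code; it merely counts simulated namenode calls, with method names echoing {{FileSystem::getFileStatus}}, {{FileSystem::getFileBlockLocations}}, and a located-status listing that returns both pieces in one call.

```java
import java.util.Map;

// Toy stand-in for a namenode that counts RPCs; not Hadoop's real classes.
class MockNameNode {
    int rpcCount = 0;
    private final Map<String, String> blockLocations;

    MockNameNode(Map<String, String> blockLocations) {
        this.blockLocations = blockLocations;
    }

    // One RPC returning only file metadata (a FileStatus).
    String getFileStatus(String path) {
        rpcCount++;
        return "status:" + path;
    }

    // A second RPC needed to fetch the block locations for that file.
    String getFileBlockLocations(String path) {
        rpcCount++;
        return blockLocations.get(path);
    }

    // One RPC returning metadata and locations together (a LocatedFileStatus),
    // the way directory traversal already behaves per the description above.
    String[] getLocatedStatus(String path) {
        rpcCount++;
        return new String[] { "status:" + path, blockLocations.get(path) };
    }
}

public class RpcCountDemo {
    public static void main(String[] args) {
        Map<String, String> locs = Map.of("/data/a", "host1", "/data/b", "host2");

        // Current behavior when the glob matches plain files: two RPCs per file.
        MockNameNode separate = new MockNameNode(locs);
        for (String p : locs.keySet()) {
            separate.getFileStatus(p);
            separate.getFileBlockLocations(p);
        }

        // Proposed behavior: one located-status RPC per file.
        MockNameNode located = new MockNameNode(locs);
        for (String p : locs.keySet()) {
            located.getLocatedStatus(p);
        }

        System.out.println("separate RPCs: " + separate.rpcCount);
        System.out.println("located RPCs: " + located.rpcCount);
    }
}
```

For two matched files the separate-call path issues four RPCs while the located path issues two, which is the halving the improvement targets.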
[jira] [Issue Comment Deleted] (MAPREDUCE-7013) Tests of internal logic should not use the local FS as scratch space
[ https://issues.apache.org/jira/browse/MAPREDUCE-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-7013:
-
Comment: was deleted (was: MAPREDUCE-7011 is an example)
> Tests of internal logic should not use the local FS as scratch space > > > Key: MAPREDUCE-7013 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7013 > Project: Hadoop Map/Reduce > Issue Type: Test >Reporter: Chris Douglas > > MapReduce often manipulates files/permissions to ensure splits, dependencies, > and other user data are consistently managed. Unit tests of these internal > methods sometimes set up temporary hierarchies in a scratch directory on the > local FS to exercise these modules. However, dev environment quirks (e.g., > umask) can cause these tests to fail spuriously. Instead, this logic should > be validated by mocking the filesystem.
[jira] [Created] (MAPREDUCE-7013) Tests of internal logic should not use the local FS as scratch space
Chris Douglas created MAPREDUCE-7013:
Summary: Tests of internal logic should not use the local FS as scratch space
Key: MAPREDUCE-7013
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7013
Project: Hadoop Map/Reduce
Issue Type: Test
Reporter: Chris Douglas
[jira] [Commented] (MAPREDUCE-7013) Tests of internal logic should not use the local FS as scratch space
[ https://issues.apache.org/jira/browse/MAPREDUCE-7013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262899#comment-16262899 ] Chris Douglas commented on MAPREDUCE-7013:
--
MAPREDUCE-7011 is an example
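A minimal sketch of the mocking approach the issue proposes, under stated assumptions: {{FileSystemView}} and {{permissionsOf}} are hypothetical names invented for this illustration, not Hadoop APIs. The point is only that permission-dependent logic can be exercised against an in-memory view, so the outcome cannot depend on the developer's umask.

```java
import java.util.Map;

// Hypothetical filesystem abstraction for this sketch; not part of Hadoop.
interface FileSystemView {
    int permissionsOf(String path); // octal permission bits, e.g. 0644
}

public class MockFsTestDemo {
    // Example of logic under test: is the path readable by "other"?
    static boolean isWorldReadable(String path, FileSystemView fs) {
        return (fs.permissionsOf(path) & 0004) != 0;
    }

    public static void main(String[] args) {
        // An in-memory map replaces scratch files on the local FS, so a
        // restrictive umask in the dev environment cannot change the result.
        Map<String, Integer> perms = Map.of("/a.txt", 0644, "/b.txt", 0600);
        FileSystemView fs = perms::get;
        System.out.println(isWorldReadable("/a.txt", fs));
        System.out.println(isWorldReadable("/b.txt", fs));
    }
}
```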
[jira] [Updated] (MAPREDUCE-7011) TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all parent dirs set other exec
[ https://issues.apache.org/jira/browse/MAPREDUCE-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-7011:
-
Resolution: Fixed
Assignee: Chris Douglas
Hadoop Flags: Reviewed
Fix Version/s: 3.0.1
Status: Resolved (was: Patch Available)
I committed this. Thanks for the review, [~subru]
> TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all > parent dirs set other exec > > > Key: MAPREDUCE-7011 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-7011 > Project: Hadoop Map/Reduce > Issue Type: Test >Reporter: Chris Douglas >Assignee: Chris Douglas >Priority: Trivial > Fix For: 3.0.1 > > Attachments: MAPREDUCE-7011.000.patch > > > {{TestClientDistributedCacheManager}} sets up some local directories to check > the visibility set for dependencies, given their filesystem permissions. > However, if it is run in an environment where the scratch directory is not > itself PUBLIC ({{ClientDistributedCacheManager::isPublic}}), then it will > fail.
[jira] [Updated] (MAPREDUCE-7011) TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all parent dirs set other exec
[ https://issues.apache.org/jira/browse/MAPREDUCE-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-7011:
-
Priority: Trivial (was: Minor)
[jira] [Commented] (MAPREDUCE-7011) TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all parent dirs set other exec
[ https://issues.apache.org/jira/browse/MAPREDUCE-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261966#comment-16261966 ] Chris Douglas commented on MAPREDUCE-7011:
--
bq. Does it make sense to open another jira to refactor TestClientDistributedCacheManager to not use the local FS?
It wouldn't hurt. I'll open something generic that includes this as an example.
[jira] [Updated] (MAPREDUCE-7011) TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all parent dirs set other exec
[ https://issues.apache.org/jira/browse/MAPREDUCE-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-7011: - Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-7011) TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all parent dirs set other exec
[ https://issues.apache.org/jira/browse/MAPREDUCE-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-7011: - Attachment: MAPREDUCE-7011.000.patch While it would clearly be better if the test validated the visibility logic without requiring the local filesystem, that would likely require some refactoring. v000 simply skips the test.
[jira] [Moved] (MAPREDUCE-7011) TestClientDistributedCacheManager::testDetermineCacheVisibilities assumes all parent dirs set other exec
[ https://issues.apache.org/jira/browse/MAPREDUCE-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas moved HDFS-12844 to MAPREDUCE-7011: - Key: MAPREDUCE-7011 (was: HDFS-12844) Project: Hadoop Map/Reduce (was: Hadoop HDFS)
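The failure mode described above hinges on {{ClientDistributedCacheManager::isPublic}} requiring that the scratch directory and all of its ancestors be world-executable. A minimal standalone sketch of that ancestry check, using plain {{java.nio.file}} (the class and method names here are illustrative, not Hadoop's actual implementation):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class VisibilityCheck {
    // True iff the directory and every ancestor up to the root grant
    // others-execute -- the property the test's scratch directory must
    // satisfy to be judged PUBLIC.
    static boolean ancestorsOtherExec(Path dir) throws IOException {
        for (Path cur = dir.toAbsolutePath(); cur != null; cur = cur.getParent()) {
            Set<PosixFilePermission> perms = Files.getPosixFilePermissions(cur);
            if (!perms.contains(PosixFilePermission.OTHERS_EXECUTE)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // createTempDirectory uses mode 700 on POSIX, so the fresh scratch
        // directory itself fails the check until o+x is granted.
        Path scratch = Files.createTempDirectory("scratch");
        System.out.println(ancestorsOtherExec(scratch)); // false
        Files.setPosixFilePermissions(scratch,
            PosixFilePermissions.fromString("rwxr-xr-x"));
        System.out.println(ancestorsOtherExec(scratch));
    }
}
```

Running this in a checkout under a 700 home directory shows why the test fails there: some ancestor, not the test's own directories, denies others-execute.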
[jira] [Commented] (MAPREDUCE-6958) Shuffle audit logger should log size of shuffle transfer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170837#comment-16170837 ] Chris Douglas commented on MAPREDUCE-6958: -- bq. Since 3.0 hasn't officially shipped yet, I propose to revert the 003 patch I committed to trunk and branch-3.0 and instead commit patch version 002 which preserves the job-then-reducer ordering already established in the 2.x line. Objections? Nope, that sounds reasonable. Thanks for the extra audit. > Shuffle audit logger should log size of shuffle transfer > > > Key: MAPREDUCE-6958 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6958 > Project: Hadoop Map/Reduce > Issue Type: Improvement >Reporter: Jason Lowe >Assignee: Jason Lowe >Priority: Minor > Fix For: 3.0.0-beta1 > > Attachments: MAPREDUCE-6958.001.patch, MAPREDUCE-6958.002.patch, > MAPREDUCE-6958.003.patch, MAPREDUCE-6958-branch-2.002.patch > > > The shuffle audit logger currently logs the job ID and reducer ID but nothing > about the size of the requested transfer. It calculates this as part of the > HTTP response headers, so it would be trivial to log the response size. This > would be very valuable for debugging network traffic storms from the shuffle > handler.
[jira] [Commented] (MAPREDUCE-6958) Shuffle audit logger should log size of shuffle transfer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170487#comment-16170487 ] Chris Douglas commented on MAPREDUCE-6958: -- Thanks for updating the patch, [~jlowe]. +1
[jira] [Commented] (MAPREDUCE-6958) Shuffle audit logger should log size of shuffle transfer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16168577#comment-16168577 ] Chris Douglas commented on MAPREDUCE-6958: -- Sorry to ask for revs on this kind of patch, but this changes the format of the audit log in a way that might break downstream consumers. The mapIds are printed after the reducer in the revised version. Could this keep the format as-is, with the length appended? The shuffle sizes used to be available in the clienttrace log. Was that removed from the ShuffleHandler at some point?
[jira] [Updated] (MAPREDUCE-6433) launchTime may be negative
[ https://issues.apache.org/jira/browse/MAPREDUCE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6433: - Status: Patch Available (was: Reopened) > launchTime may be negative > -- > > Key: MAPREDUCE-6433 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6433 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: jobhistoryserver, mrv2 >Affects Versions: 2.4.1 >Reporter: Allen Wittenauer >Assignee: zhihai xu > Labels: release-blocker > Fix For: 3.0.0-alpha1, 2.8.0 > > Attachments: MAPREDUCE-6433.000.patch, MAPREDUCE-6433.001.patch, > MAPREDUCE-6433-branch-2.7.001.patch, REPRODUCE.patch > > > Under extremely rare conditions (.0017% in our sample size), launchTime in > the jhist files may be set to -1.
[jira] [Updated] (MAPREDUCE-6433) launchTime may be negative
[ https://issues.apache.org/jira/browse/MAPREDUCE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6433: - Target Version/s: 2.7.4
[jira] [Updated] (MAPREDUCE-6433) launchTime may be negative
[ https://issues.apache.org/jira/browse/MAPREDUCE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6433: - Labels: release-blocker (was: )
[jira] [Updated] (MAPREDUCE-6433) launchTime may be negative
[ https://issues.apache.org/jira/browse/MAPREDUCE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6433: - Attachment: MAPREDUCE-6433-branch-2.7.001.patch Backport to branch-2.7. This doesn't seem major, but it's easy to include in the upcoming 2.7.4 release.
[jira] [Reopened] (MAPREDUCE-6433) launchTime may be negative
[ https://issues.apache.org/jira/browse/MAPREDUCE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas reopened MAPREDUCE-6433: --
[jira] [Updated] (MAPREDUCE-6883) AuditLogger and TestAuditLogger are dead code
[ https://issues.apache.org/jira/browse/MAPREDUCE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6883: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha3 Status: Resolved (was: Patch Available) +1 I committed this. Thanks Vrushali. > AuditLogger and TestAuditLogger are dead code > - > > Key: MAPREDUCE-6883 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6883 > Project: Hadoop Map/Reduce > Issue Type: Improvement > Components: client >Affects Versions: 2.8.0 >Reporter: Daniel Templeton >Assignee: Vrushali C >Priority: Minor > Labels: newbie > Fix For: 3.0.0-alpha3 > > Attachments: MAPREDUCE-6883.001.patch, MAPREDUCE-6883.002.patch, > MAPREDUCE-6883.003.patch > > > The {{AuditLogger}} and {{TestAuditLogger}} classes appear to be dead code. > I can't find anything that uses or references {{AuditLogger}}. No one has > touched the code since 2011. I think it's safe to remove.
[jira] [Updated] (MAPREDUCE-6883) AuditLogger and TestAuditLogger are dead code
[ https://issues.apache.org/jira/browse/MAPREDUCE-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6883: - Attachment: MAPREDUCE-6883.003.patch
[jira] [Updated] (MAPREDUCE-6628) Potential memory leak in CryptoOutputStream
[ https://issues.apache.org/jira/browse/MAPREDUCE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6628: - Fix Version/s: 2.9.0 > Potential memory leak in CryptoOutputStream > --- > > Key: MAPREDUCE-6628 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6628 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: security >Affects Versions: 2.6.4 >Reporter: Mariappan Asokan >Assignee: Mariappan Asokan > Fix For: 2.9.0, 3.0.0-alpha2 > > Attachments: MAPREDUCE-6628.001.patch, MAPREDUCE-6628.002.patch, > MAPREDUCE-6628.003.patch, MAPREDUCE-6628.004.patch, MAPREDUCE-6628.005.patch, > MAPREDUCE-6628.006.patch, MAPREDUCE-6628.007.patch, MAPREDUCE-6628.008.patch, > MAPREDUCE-6628.009.patch > > > There is a potential memory leak in {{CryptoOutputStream.java}}. It > allocates two direct byte buffers ({{inBuffer}} and {{outBuffer}}) that get > freed when the {{close()}} method is called. Most of the time, {{close()}} > is called. However, when writing to the intermediate Map output file or > the spill files in {{MapTask}}, {{close()}} is never called, since calling it > would close the underlying stream, which is not desirable. There is a single > underlying physical stream that contains multiple logical streams, one per > partition of Map output. > By default the amount of memory allocated per byte buffer is 128 KB, so > the total memory allocated is 256 KB. This may not sound like much. However, if > the number of partitions (or number of reducers) is large (in the hundreds) > and/or there are spill files created in {{MapTask}}, this can grow into a few > hundred MB. > I can think of two ways to address this issue: > h2. Possible Fix - 1 > According to JDK documentation: > {quote} > The contents of direct buffers may reside outside of the normal > garbage-collected heap, and so their impact upon the memory footprint of an > application might not be obvious. It is therefore recommended that direct > buffers be allocated primarily for large, long-lived buffers that are subject > to the underlying system's native I/O operations. In general it is best to > allocate direct buffers only when they yield a measureable gain in program > performance. > {quote} > It is not clear to me whether there is any benefit of allocating direct byte > buffers in {{CryptoOutputStream.java}}. In fact, there is a slight CPU > overhead in moving data from {{outBuffer}} to a temporary byte array, as per > the following code in {{CryptoOutputStream.java}}: > {code} > /* > * If underlying stream supports {@link ByteBuffer} write in future, needs > * refine here. > */ > final byte[] tmp = getTmpBuf(); > outBuffer.get(tmp, 0, len); > out.write(tmp, 0, len); > {code} > Even if the underlying stream supports direct byte buffer IO (or direct IO in > OS parlance), it is not clear whether it will yield any measurable > performance gain. > The fix would be to allocate a ByteBuffer on the heap for {{inBuffer}} and wrap a > byte array in a {{ByteBuffer}} for {{outBuffer}}. By the way, the > {{inBuffer}} and {{outBuffer}} have to be {{ByteBuffer}}, as demanded by the > {{encrypt()}} method in {{Encryptor}}. > h2. Possible Fix - 2 > Assuming that we want to keep the buffers as direct byte buffers, we can > create a new constructor for {{CryptoOutputStream}} and pass a boolean flag > {{ownOutputStream}} to indicate whether the underlying stream will be owned > by {{CryptoOutputStream}}. If it is true, then calling the {{close()}} method > will close the underlying stream. Otherwise, when {{close()}} is called only > the direct byte buffers will be freed and the underlying stream will not be > closed. > The scope of changes for this fix will be somewhat wider. We need to modify > {{MapTask.java}}, {{CryptoUtils.java}}, and {{CryptoFSDataOutputStream.java}} > as well to pass the ownership flag mentioned above. > I can post a patch for either of the above. I welcome any other ideas from > developers to fix this issue.
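For context on Possible Fix - 1, the difference between a direct buffer and a heap buffer wrapping a {{byte[]}} can be shown with plain {{java.nio}}. This is an illustrative sketch, not code from any of the attached patches:

```java
import java.nio.ByteBuffer;

public class BufferChoice {
    static final int BUF_SIZE = 128 * 1024; // the 128 KB default cited above

    public static void main(String[] args) {
        // Direct buffer: lives outside the GC heap; if close() never runs
        // and the buffer is never released, this is the leaked memory.
        ByteBuffer direct = ByteBuffer.allocateDirect(BUF_SIZE);

        // Fix-1 shape: a heap buffer wrapping an ordinary byte[] is
        // reclaimed like any object, and its backing array can go straight
        // to OutputStream.write() without the temp-array copy.
        byte[] backing = new byte[BUF_SIZE];
        ByteBuffer heap = ByteBuffer.wrap(backing);

        System.out.println(direct.isDirect());                          // true
        System.out.println(heap.isDirect());                            // false
        System.out.println(heap.hasArray() && heap.array() == backing); // true
    }
}
```

Because the wrapped buffer exposes its backing array, the copy into {{getTmpBuf()}} quoted above would become unnecessary in the heap-buffer variant.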
[jira] [Updated] (MAPREDUCE-6628) Potential memory leak in CryptoOutputStream
[ https://issues.apache.org/jira/browse/MAPREDUCE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6628: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 3.0.0-alpha2 Status: Resolved (was: Patch Available) +1 I committed this. Thanks [~masokan]
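A minimal sketch of the Possible Fix - 2 ownership flag, using a simple {{FilterOutputStream}} stand-in for {{CryptoOutputStream}} (all names here are hypothetical, not the committed API):

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

public class OwnershipDemo {
    // Stand-in for CryptoOutputStream: frees its buffers on close(), but
    // closes the wrapped stream only when it owns it.
    static class OwnedCloseStream extends FilterOutputStream {
        private final boolean ownOutputStream;
        private ByteBuffer inBuffer = ByteBuffer.allocateDirect(128);
        private ByteBuffer outBuffer = ByteBuffer.allocateDirect(128);

        OwnedCloseStream(OutputStream out, boolean ownOutputStream) {
            super(out);
            this.ownOutputStream = ownOutputStream;
        }

        @Override
        public void close() throws IOException {
            inBuffer = null;   // drop the direct buffers in either case...
            outBuffer = null;
            if (ownOutputStream) {
                out.close();   // ...but close the underlying stream only if owned
            } else {
                flush();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        final boolean[] closed = {false};
        OutputStream base = new ByteArrayOutputStream() {
            @Override public void close() { closed[0] = true; }
        };
        new OwnedCloseStream(base, false).close();
        System.out.println(closed[0]); // false: spill stream stays open
        new OwnedCloseStream(base, true).close();
        System.out.println(closed[0]); // true: owned stream is closed
    }
}
```

This is the semantics the wider change to {{MapTask.java}}, {{CryptoUtils.java}}, and {{CryptoFSDataOutputStream.java}} would have to thread through.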
[jira] [Commented] (MAPREDUCE-6628) Potential memory leak in CryptoOutputStream
[ https://issues.apache.org/jira/browse/MAPREDUCE-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468199#comment-15468199 ] Chris Douglas commented on MAPREDUCE-6628: -- [~masokan] thank you for your patience with this. The unit test looks useful for debugging, but it doesn't actually verify the fix. As written, it's also expensive to run (starts a cluster) and relies on a platform-dependent scan of {{/proc/self/status}}, rather than using {{java.lang.management}} APIs. That said, unit testing this corner of MapReduce is not straightforward, and your posted results demonstrate both the issue and the fix. We can commit this without a MR test. Would it be possible to write a short unit test for {{CryptoOutputStream}} verifying the new {{closeOutputStream}} semantics? This should be very straightforward in Mockito, just checking that {{close}} behaves as expected when the flag is passed. It's unfortunate that we're switching behavior based on object reference equality, to check whether the stream was wrapped. As designed, I don't see a cleaner way to improve this without refactoring the crypto implementation.
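On the suggestion to use {{java.lang.management}} rather than scanning {{/proc/self/status}}: the platform {{BufferPoolMXBean}}s report direct-buffer usage portably. A sketch of that approach (illustrative only, not the Mockito test requested above):

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

public class DirectBufferUsage {
    public static void main(String[] args) {
        // Allocate one direct buffer so the "direct" pool shows some usage.
        ByteBuffer held = ByteBuffer.allocateDirect(1 << 20);

        // The platform exposes one BufferPoolMXBean per pool ("direct",
        // "mapped"); getMemoryUsed() reports bytes currently held by it.
        List<BufferPoolMXBean> pools =
            ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.println(pool.getName() + ": " + pool.getMemoryUsed() + " bytes");
        }
        if (held == null) throw new IllegalStateException(); // keep buffer reachable
    }
}
```

A leak check built on this stays within the JVM's own instrumentation, so it runs the same on any OS.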
[jira] [Updated] (MAPREDUCE-6767) TestSlive fails after a common change
[ https://issues.apache.org/jira/browse/MAPREDUCE-6767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6767: - Status: Patch Available (was: Open) > TestSlive fails after a common change > - > > Key: MAPREDUCE-6767 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6767 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Kihwal Lee >Assignee: Daniel Templeton > Attachments: MAPREDUCE-6767.001.patch > > > It looks like this was broken after HADOOP-12726.
[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15320699#comment-15320699 ] Chris Douglas commented on MAPREDUCE-6240: -- +1 lgtm bq. would creating IOException inside catch block will be better? The suppressed exceptions are the interesting part. The code is easier to read as-is (IMO), but either way is fine. > Hadoop client displays confusing error message > -- > > Key: MAPREDUCE-6240 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 2.7.0 >Reporter: Mohammad Kamrul Islam >Assignee: Gera Shegalov > Attachments: MAPREDUCE-6240-gera.001.patch, > MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, > MAPREDUCE-6240.003.patch, MAPREDUCE-6240.004.patch, MAPREDUCE-6240.1.patch > > > Hadoop client often throws exception with "java.io.IOException: Cannot > initialize Cluster. Please check your configuration for > mapreduce.framework.name and the correspond server addresses". > This is a misleading and generic message for any cluster initialization > problem. It takes a lot of debugging hours to identify the root cause. The > correct error message could resolve this problem quickly. > In one such instance, Oozie log showed the following exception while the > root cause was CNF that Hadoop client didn't return in the exception. > {noformat} > JA009: Cannot initialize Cluster. Please check your configuration for > mapreduce.framework.name and the correspond server addresses. > at > org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412) > at > org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228) > at > org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) > at org.apache.oozie.command.XCommand.call(XCommand.java:281) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323) > at > org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252) > at > org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:744) > Caused by: java.io.IOException: Cannot initialize Cluster. Please check your > configuration for mapreduce.framework.name and the correspond server > addresses. > at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120) > at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:82) > at org.apache.hadoop.mapreduce.Cluster.(Cluster.java:75) > at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470) > at org.apache.hadoop.mapred.JobClient.(JobClient.java:449) > at > org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372) > at > org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) > at > org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185) > at > org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927) > ... 10 more > {noformat}
[jira] [Updated] (MAPREDUCE-6423) MapOutput Sampler
[ https://issues.apache.org/jira/browse/MAPREDUCE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6423: - Status: Open (was: Patch Available) MapOutput Sampler - Key: MAPREDUCE-6423 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6423 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Ram Manohar Bheemana Assignee: Ram Manohar Bheemana Priority: Minor Attachments: MapOutputSampler.java Need a sampler based on the MapOutput Keys. Current InputSampler implementation has a major drawback which is input and output of a mapper should be same, generally this isn't the case. approach: 1. Create a Sampler which samples the data based on the input. 2. Run a small map reduce in uber task mode using the original job mapper and identity reducer to generate required MapOutputSample keys 3. Optionally, we can input the input file to be sample. For example inputs files A, B; we should be able to specify to use only file A for sampling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6423) MapOutput Sampler
[ https://issues.apache.org/jira/browse/MAPREDUCE-6423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707436#comment-14707436 ] Chris Douglas commented on MAPREDUCE-6423: -- Thanks for taking a look at this. That the sampler only works on input data was always a weakness for jobs requiring their output be totally ordered. Could you generate a patch? The contribution wiki is [here|http://wiki.apache.org/hadoop/HowToContribute]. It might be easier for others to use if the Mapper was integrated with the InputSampler, but a separate tool is still an improvement. MapOutput Sampler - Key: MAPREDUCE-6423 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6423 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Ram Manohar Bheemana Assignee: Ram Manohar Bheemana Priority: Minor Attachments: MapOutputSampler.java Need a sampler based on the MapOutput Keys. Current InputSampler implementation has a major drawback which is input and output of a mapper should be same, generally this isn't the case. approach: 1. Create a Sampler which samples the data based on the input. 2. Run a small map reduce in uber task mode using the original job mapper and identity reducer to generate required MapOutputSample keys 3. Optionally, we can input the input file to be sample. For example inputs files A, B; we should be able to specify to use only file A for sampling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
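The boundary-selection step the proposed sampler would feed into a total-order partition can be sketched without MapReduce at all. The class and method names below are hypothetical, and the index arithmetic only approximates what {{InputSampler.writePartitionFile}} does with its samples:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MapOutputBoundaries {
    // Hypothetical helper: from sampled map-output keys, derive the
    // numReduces-1 ordered split points a TotalOrderPartitioner would consume.
    static List<String> splitPoints(List<String> sampledKeys, int numReduces) {
        List<String> sorted = new ArrayList<>(sampledKeys);
        Collections.sort(sorted);  // boundaries must respect key order
        List<String> points = new ArrayList<>();
        double step = sorted.size() / (double) numReduces;
        for (int i = 1; i < numReduces; i++) {
            // pick evenly spaced samples as partition boundaries
            points.add(sorted.get((int) Math.round(step * i)));
        }
        return points;
    }

    public static void main(String[] args) {
        List<String> samples = List.of("h", "c", "a", "f", "b", "e", "g", "d");
        System.out.println(splitPoints(samples, 4)); // [c, e, g]
    }
}
```

The point of sampling map *output* is that these boundaries are computed over the keys the partitioner will actually see, which the existing input-based sampler cannot guarantee.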
[jira] [Commented] (MAPREDUCE-6434) Add support for PartialFileOutputCommiter when checkpointing is an option during preemption
[ https://issues.apache.org/jira/browse/MAPREDUCE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707130#comment-14707130 ] Chris Douglas commented on MAPREDUCE-6434: -- Offhand, I'd guess adding {{TaskType.REDUCE.equals(context.getTaskAttemptID().getTaskType())}} to the expression would prevent it from affecting more than reducers, but I haven't looked into it. Could you test with a map-only job, where {{context.getReducerClass()}} is undefined or not on the classpath? Add support for PartialFileOutputCommiter when checkpointing is an option during preemption --- Key: MAPREDUCE-6434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6434 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Augusto Souza Assignee: Augusto Souza Attachments: MAPREDUCE-6434.001.patch, MAPREDUCE-6434.002.patch, MAPREDUCE-6434.003.patch, MAPREDUCE-6434.004.patch, MAPREDUCE-6434.005.patch Finish up some renaming work related to the annotation @Preemptable (it should be @Checkpointable now) and help in the splitting of patch in MAPREDUCE-5269 that is too large for being reviewed or accepted by Jenkins CI scripts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6434) Add support for PartialFileOutputCommiter when checkpointing is an option during preemption
[ https://issues.apache.org/jira/browse/MAPREDUCE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707498#comment-14707498 ] Chris Douglas commented on MAPREDUCE-6434: -- Agreed, the NPE is usually not a problem since the default should be defined in mapred-defaults, though {{JobContextImpl::getReducerClass}} can return null. At least two cases shouldn't cause a problem for map-only jobs: # The base {{mapreduce.Reducer}} is {{\@Checkpointable}}, so it would instantiate a {{PartialFileOutputCommitter}} # A {{Reducer}} in the config shouldn't cause a map-only job to fail if it's not on the classpath (this may not be true in the current code, but this shouldn't add another case) We also don't want to do anything surprising for setup/cleanup tasks. Add support for PartialFileOutputCommiter when checkpointing is an option during preemption --- Key: MAPREDUCE-6434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6434 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Augusto Souza Assignee: Augusto Souza Attachments: MAPREDUCE-6434.001.patch, MAPREDUCE-6434.002.patch, MAPREDUCE-6434.003.patch, MAPREDUCE-6434.004.patch, MAPREDUCE-6434.005.patch, MAPREDUCE-6434.006.patch Finish up some renaming work related to the annotation @Preemptable (it should be @Checkpointable now) and help in the splitting of patch in MAPREDUCE-5269 that is too large for being reviewed or accepted by Jenkins CI scripts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6434) Add support for PartialFileOutputCommiter when checkpointing is an option during preemption
[ https://issues.apache.org/jira/browse/MAPREDUCE-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14706018#comment-14706018 ] Chris Douglas commented on MAPREDUCE-6434: -- Thanks for updating the patch, [~augustorsouza]. Could you check this change?
{noformat}
-    committer = new FileOutputCommitter(output, context);
+    try {
+      if (context.getConfiguration().getBoolean(MRJobConfig.TASK_PREEMPTION,
+          false) && context.getReducerClass()
+              .isAnnotationPresent(Checkpointable.class)) {
+        committer = new PartialFileOutputCommitter(output, context);
+      } else {
+        committer = new FileOutputCommitter(output, context);
+      }
+    } catch (ClassNotFoundException c) {
+      throw new RuntimeException(
+          "Internal error: reducer class is not defined", c);
+    }
{noformat}
Since preemption in MAPREDUCE-5269 only supports reduce tasks, even if preemption is enabled for map-only jobs, the reduce class can be undefined. Add support for PartialFileOutputCommiter when checkpointing is an option during preemption --- Key: MAPREDUCE-6434 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6434 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Augusto Souza Assignee: Augusto Souza Attachments: MAPREDUCE-6434.001.patch, MAPREDUCE-6434.002.patch, MAPREDUCE-6434.003.patch, MAPREDUCE-6434.004.patch, MAPREDUCE-6434.005.patch Finish up some renaming work related to the annotation @Preemptable (it should be @Checkpointable now) and help in the splitting of patch in MAPREDUCE-5269 that is too large for being reviewed or accepted by Jenkins CI scripts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
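The task-type guard suggested earlier in this thread can be sketched without Hadoop on the classpath. Everything below is a stand-in: the committers are reduced to strings, {{Checkpointable}} is a local annotation, and the method name is invented for illustration:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class CommitterChoice {
    @Retention(RetentionPolicy.RUNTIME)
    @interface Checkpointable {}  // local stand-in for the MR annotation

    @Checkpointable
    static class CheckpointableReducer {}

    enum TaskType { MAP, REDUCE }

    // Sketch of the guarded selection: consult the reducer class only for
    // reduce attempts, so a map-only job never has to resolve a reducer
    // that may be undefined or missing from the classpath.
    static String pickCommitter(boolean preemptionEnabled, TaskType type,
                                Class<?> reducerClass) {
        if (preemptionEnabled
                && TaskType.REDUCE == type
                && reducerClass != null
                && reducerClass.isAnnotationPresent(Checkpointable.class)) {
            return "PartialFileOutputCommitter";
        }
        return "FileOutputCommitter";
    }

    public static void main(String[] args) {
        System.out.println(pickCommitter(true, TaskType.REDUCE,
                CheckpointableReducer.class)); // PartialFileOutputCommitter
        // map attempt: the reducer class is never consulted
        System.out.println(pickCommitter(true, TaskType.MAP, null)); // FileOutputCommitter
    }
}
```

Because the {{REDUCE}} check short-circuits first, the map-only case shown in the review comments cannot trip over a null or unloadable reducer class.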
[jira] [Updated] (MAPREDUCE-2454) Allow external sorter plugin for MR
[ https://issues.apache.org/jira/browse/MAPREDUCE-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-2454: - Assignee: Mariappan Asokan (was: Bharat Jha) Allow external sorter plugin for MR --- Key: MAPREDUCE-2454 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2454 Project: Hadoop Map/Reduce Issue Type: New Feature Affects Versions: 2.0.0-alpha, 3.0.0, 2.0.2-alpha Reporter: Mariappan Asokan Assignee: Mariappan Asokan Priority: Minor Labels: features, performance, plugin, sort Fix For: 2.0.3-alpha Attachments: HadoopSortPlugin.pdf, HadoopSortPlugin.pdf, KeyValueIterator.java, MR-2454-trunkPatchPreview.gz, MapOutputSorter.java, MapOutputSorterAbstract.java, ReduceInputSorter.java, mapreduce-2454-modified-code.patch, mapreduce-2454-modified-test.patch, mapreduce-2454-new-test.patch, mapreduce-2454-protection-change.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mapreduce-2454.patch, mr-2454-on-mr-279-build82.patch.gz Define interfaces and some abstract classes in the Hadoop framework to facilitate external sorter plugins both on the Map and Reduce sides. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5817) mappers get rescheduled on node transition even after all reducers are completed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693807#comment-14693807 ] Chris Douglas commented on MAPREDUCE-5817: -- bq. The current patch skips re-running mappers only if all reducers are complete. So I don't think reducers will fail beyond that point? Did I understand your question right? I see; sorry, I hadn't read the rest of the JIRA carefully. That's a fairly narrow window, isn't it? We may not need an extra state, if we kill all running maps when the last reducer completes. The condition this adds prevents new maps from being scheduled while cleanup/commit code is running. Minor: could {{allReducersComplete()}} call {{getCompletedReduces()}}? +1 on the patch mappers get rescheduled on node transition even after all reducers are completed Key: MAPREDUCE-5817 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Attachments: MAPREDUCE-5817.001.patch, mapreduce-5817.patch We're seeing a behavior where a job runs long after all reducers were already finished. We found that the job was rescheduling and running a number of mappers beyond the point of reducer completion. In one situation, the job ran for some 9 more hours after all reducers completed! This happens because whenever a node transition (to an unusable state) comes into the app master, it just reschedules all mappers that already ran on the node in all cases. Therefore, if any node transition has a potential to extend the job period. Once this window opens, another node transition can prolong it, and this can happen indefinitely in theory. If there is some instability in the pool (unhealthy, etc.) for a duration, then any big job is severely vulnerable to this problem. 
If all reducers have been completed, JobImpl.actOnUnusableNode() should not reschedule mapper tasks. If all reducers are completed, the mapper outputs are no longer needed, and there is no need to reschedule mapper tasks as they would not be consumed anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
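The guard discussed above can be sketched as a small decision function; the class, method, and parameter names are hypothetical, not the actual {{JobImpl.actOnUnusableNode()}} code:

```java
public class MapReschedulePolicy {
    enum TaskType { MAP, REDUCE }

    // On an unusable-node transition, re-run a completed map only while some
    // reducer still needs to fetch its output.
    static boolean shouldReschedule(TaskType type,
                                    int completedReduces, int totalReduces) {
        if (type != TaskType.MAP) {
            return false; // only map output stored on the lost node matters here
        }
        // all reducers done => the map output can never be consumed again
        return completedReduces < totalReduces;
    }

    public static void main(String[] args) {
        System.out.println(shouldReschedule(TaskType.MAP, 3, 10));  // true
        System.out.println(shouldReschedule(TaskType.MAP, 10, 10)); // false
    }
}
```

This is the narrow window the review notes: once the last reducer completes, rescheduling maps only prolongs the job without producing anything a consumer will read.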
[jira] [Commented] (MAPREDUCE-5817) mappers get rescheduled on node transition even after all reducers are completed
[ https://issues.apache.org/jira/browse/MAPREDUCE-5817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692298#comment-14692298 ] Chris Douglas commented on MAPREDUCE-5817: -- Does this work if the reducer fails subsequently? Presumably reexecution will be triggered by fetch failures? mappers get rescheduled on node transition even after all reducers are completed Key: MAPREDUCE-5817 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5817 Project: Hadoop Map/Reduce Issue Type: Bug Components: applicationmaster Affects Versions: 2.3.0 Reporter: Sangjin Lee Assignee: Sangjin Lee Labels: BB2015-05-TBR Attachments: MAPREDUCE-5817.001.patch, mapreduce-5817.patch We're seeing a behavior where a job runs long after all reducers were already finished. We found that the job was rescheduling and running a number of mappers beyond the point of reducer completion. In one situation, the job ran for some 9 more hours after all reducers completed! This happens because whenever a node transition (to an unusable state) comes into the app master, it just reschedules all mappers that already ran on the node in all cases. Therefore, if any node transition has a potential to extend the job period. Once this window opens, another node transition can prolong it, and this can happen indefinitely in theory. If there is some instability in the pool (unhealthy, etc.) for a duration, then any big job is severely vulnerable to this problem. If all reducers have been completed, JobImpl.actOnUnusableNode() should not reschedule mapper tasks. If all reducers are completed, the mapper outputs are no longer needed, and there is no need to reschedule mapper tasks as they would not be consumed anyway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14660599#comment-14660599 ] Chris Douglas commented on MAPREDUCE-6240: -- bq. how about refactoring it and removing this class org.apache.hadoop.io.MultipleIOException We'd have to audit where it's used. If there could be systems that expect and handle it, we'd have to deprecate it first, but I think it makes sense to remove it in trunk. Separate issue, of course. Hadoop client displays confusing error message -- Key: MAPREDUCE-6240 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.7.0 Reporter: Mohammad Kamrul Islam Assignee: Gera Shegalov Attachments: MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, MAPREDUCE-6240.003.patch, MAPREDUCE-6240.1.patch Hadoop client often throws exception with java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. This is a misleading and generic message for any cluster initialization problem. It takes a lot of debugging hours to identify the root cause. The correct error message could resolve this problem quickly. In one such instance, Oozie log showed the following exception while the root cause was CNF that Hadoop client didn't return in the exception. {noformat} JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. 
at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412) at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979) at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) at org.apache.oozie.command.XCommand.call(XCommand.java:281) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. 
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120) at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:82) at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:75) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:449) at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372) at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379) at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927) ... 10 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14658589#comment-14658589 ] Chris Douglas commented on MAPREDUCE-6240: -- bq. Any reason why we dint just throw a IOException with all these exception added as suppressed exceptions.? Neat! I didn't know this was added to 1.7. I like this approach Hadoop client displays confusing error message -- Key: MAPREDUCE-6240 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.7.0 Reporter: Mohammad Kamrul Islam Assignee: Gera Shegalov Attachments: MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, MAPREDUCE-6240.003.patch, MAPREDUCE-6240.1.patch Hadoop client often throws exception with java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. This is a misleading and generic message for any cluster initialization problem. It takes a lot of debugging hours to identify the root cause. The correct error message could resolve this problem quickly. In one such instance, Oozie log showed the following exception while the root cause was CNF that Hadoop client didn't return in the exception. {noformat} JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. 
at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412) at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979) at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) at org.apache.oozie.command.XCommand.call(XCommand.java:281) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. 
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120) at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:82) at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:75) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:449) at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372) at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379) at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927) ... 10 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
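The suppressed-exception shape approved in this thread is plain JDK 7+ {{Throwable.addSuppressed}}. The provider loop below is a self-contained stand-in for {{Cluster.initialize}} iterating over {{ClientProtocolProvider}} implementations, not the actual Hadoop code:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ClusterInitSketch {
    // Stand-in for trying one ClientProtocolProvider; always fails here.
    static Object tryProvider(String name) throws Exception {
        throw new Exception("cannot connect via " + name);
    }

    static Object initialize(List<String> providers) throws IOException {
        List<Throwable> failures = new ArrayList<>();
        for (String p : providers) {
            try {
                return tryProvider(p);
            } catch (Exception e) {
                failures.add(e); // keep every per-provider failure
            }
        }
        IOException ex = new IOException(
            "Cannot initialize Cluster. Please check your configuration for "
            + "mapreduce.framework.name and the correspond server addresses.");
        for (Throwable t : failures) {
            ex.addSuppressed(t); // JDK 7+: root causes travel with the summary
        }
        throw ex;
    }

    public static void main(String[] args) {
        try {
            initialize(List.of("local", "yarn"));
        } catch (IOException e) {
            System.out.println("suppressed=" + e.getSuppressed().length); // suppressed=2
        }
    }
}
```

A stack trace printed from this exception includes each suppressed failure, so the CNF root cause described in the report would no longer be swallowed by the generic message.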
[jira] [Updated] (MAPREDUCE-6427) Fix typo in JobHistoryEventHandler
[ https://issues.apache.org/jira/browse/MAPREDUCE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6427: - Summary: Fix typo in JobHistoryEventHandler (was: Fix Typo in JobHistoryEventHandler#processEventForTimelineServer) Fix typo in JobHistoryEventHandler -- Key: MAPREDUCE-6427 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6427 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Ray Chiang Priority: Minor Fix For: 2.8.0 Attachments: MAPREDUCE-6427.001.branch-2.patch, MAPREDUCE-6427.001.patch JobHistoryEventHandler#processEventForTimelineServer {code}tEvent.addEventInfo(WORKLFOW_ID, jse.getWorkflowId());{code} *should be like below.* {code}tEvent.addEventInfo(WORKFLOW_ID, jse.getWorkflowId()); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6427) Fix Typo in JobHistoryEventHandler#processEventForTimelineServer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6427: - Resolution: Fixed Hadoop Flags: Incompatible change,Reviewed (was: Incompatible change) Fix Version/s: 2.8.0 Release Note: There is a typo in the event string WORKFLOW_ID (as WORKLFOW_ID). The branch-2 change will publish both event strings for compatibility with consumers, but the misspelled metric will be removed in trunk. (was: There is a typo in the event string WORKFLOW_ID (as WORKLFOW_ID). The branch-2 change will publish both event strings. The trunk solution will be an incompatible change going forward, with only the correctly spelled string.) Status: Resolved (was: Patch Available) +1 I committed this. Thanks Ray Fix Typo in JobHistoryEventHandler#processEventForTimelineServer Key: MAPREDUCE-6427 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6427 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Ray Chiang Priority: Minor Fix For: 2.8.0 Attachments: MAPREDUCE-6427.001.branch-2.patch, MAPREDUCE-6427.001.patch JobHistoryEventHandler#processEventForTimelineServer {code}tEvent.addEventInfo(WORKLFOW_ID, jse.getWorkflowId());{code} *should be like below.* {code}tEvent.addEventInfo(WORKFLOW_ID, jse.getWorkflowId()); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6427) Fix Typo in JobHistoryEventHandler#processEventForTimelineServer
[ https://issues.apache.org/jira/browse/MAPREDUCE-6427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619698#comment-14619698 ] Chris Douglas commented on MAPREDUCE-6427: -- bq. Initial version. branch-2 and trunk versions appear identical. The branch-2 version should publish both the old metric _and_ the correctly spelled label. Trunk can replace it outright. Please also add a release note. Fix Typo in JobHistoryEventHandler#processEventForTimelineServer Key: MAPREDUCE-6427 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6427 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Brahma Reddy Battula Assignee: Ray Chiang Priority: Minor Attachments: MAPREDUCE-6427.001.patch JobHistoryEventHandler#processEventForTimelineServer {code}tEvent.addEventInfo(WORKLFOW_ID, jse.getWorkflowId());{code} *should be like below.* {code}tEvent.addEventInfo(WORKFLOW_ID, jse.getWorkflowId()); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
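The branch-2 compatibility approach from the release note — publish both the correct key and the historical misspelling — can be sketched with a plain map standing in for the timeline event's info fields (the class and method here are illustrative, not the actual {{JobHistoryEventHandler}} code):

```java
import java.util.HashMap;
import java.util.Map;

public class WorkflowEventSketch {
    static final String WORKFLOW_ID = "WORKFLOW_ID";
    static final String WORKLFOW_ID = "WORKLFOW_ID"; // legacy typo, removed in trunk

    // branch-2 shim: emit the correctly spelled key and, temporarily, the
    // misspelled one so existing timeline consumers keep working.
    static Map<String, Object> eventInfo(String workflowId) {
        Map<String, Object> info = new HashMap<>();
        info.put(WORKFLOW_ID, workflowId);
        info.put(WORKLFOW_ID, workflowId); // drop once consumers migrate
        return info;
    }

    public static void main(String[] args) {
        System.out.println(eventInfo("wf-1").size()); // 2
    }
}
```

Trunk then drops the second {{put}} outright, which is why the fix is flagged as an incompatible change there.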
[jira] [Updated] (MAPREDUCE-6038) A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial
[ https://issues.apache.org/jira/browse/MAPREDUCE-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6038: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) +1 I committed this. Thanks Tsuyoshi A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial --- Key: MAPREDUCE-6038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6038 Project: Hadoop Map/Reduce Issue Type: Bug Environment: java version 1.8.0_11 hostspot 64-bit Reporter: Pei Ma Assignee: Tsuyoshi Ozawa Priority: Minor Labels: BB2015-05-TBR Fix For: 2.8.0 Attachments: MAPREDUCE-6038.1.patch, MAPREDUCE-6038.2.patch As a beginner, when I learned about the basic of the mr, I found that I cound't run the WordCount2 using the command bin/hadoop jar wc.jar WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output in the Tutorial. The VM throwed the NullPoniterException at the line 47. In the line 45, the returned default value of conf.getBoolean is true. That is to say when wordcount.skip.patterns is not set ,the WordCount2 will continue to execute getCacheFiles.. Then patternsURIs gets the null value. When the -skip option dosen't exist, wordcount.skip.patterns will not be set. Then a NullPointerException come out. At all, the block after the if-statement in line no. 45 shoudn't be executed when the -skip option dosen't exist in command. Maybe the line 45 should like that if (conf.getBoolean(wordcount.skip.patterns, false)) { .Just change the boolean. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
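The one-character nature of the fix — the default passed to {{getBoolean}} — is easy to demonstrate with a minimal stand-in for Hadoop's {{Configuration.getBoolean(name, defaultValue)}}:

```java
import java.util.HashMap;
import java.util.Map;

public class SkipPatternsSketch {
    // Minimal stand-in for Configuration.getBoolean(name, defaultValue).
    static boolean getBoolean(Map<String, String> conf, String name, boolean dflt) {
        String v = conf.get(name);
        return v == null ? dflt : Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // -skip not given: key unset
        // The tutorial's bug: a default of true entered the skip-file block and
        // dereferenced a null cache-file list. With false, the block is skipped.
        System.out.println(getBoolean(conf, "wordcount.skip.patterns", false)); // false
        conf.put("wordcount.skip.patterns", "true"); // set only when -skip is passed
        System.out.println(getBoolean(conf, "wordcount.skip.patterns", false)); // true
    }
}
```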
[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609224#comment-14609224 ] Chris Douglas commented on MAPREDUCE-6240: -- Minor (not blocking for commit): - This could just add the {{IOException}} to the list instead of throw/catch - The message doesn't need to append {{t}}:
{noformat}
+        ioExceptions.add(new IOException("Failed to initialize protocol: "
+            + t, t));
{noformat}
- Should this continue to catch only {{Exception}}, instead of {{Throwable}}? - If the cause of the exception is an {{IOException}}, this discards the caught exception? Is this a special/common case? It's a little odd to wrap all of this as an {{IOException}}, but I don't think it's worth adding another composite type. Hadoop client displays confusing error message -- Key: MAPREDUCE-6240 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 2.7.0 Reporter: Mohammad Kamrul Islam Assignee: Gera Shegalov Attachments: MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, MAPREDUCE-6240.003.patch, MAPREDUCE-6240.1.patch Hadoop client often throws exception with java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. This is a misleading and generic message for any cluster initialization problem. It takes a lot of debugging hours to identify the root cause. The correct error message could resolve this problem quickly. In one such instance, Oozie log showed the following exception while the root cause was CNF that Hadoop client didn't return in the exception. {noformat} JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. 
at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412) at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979) at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) at org.apache.oozie.command.XCommand.call(XCommand.java:281) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. 
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120) at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:82) at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:75) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:449) at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372) at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379) at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927) ... 10 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2094) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-2094: - Attachment: M2094.patch Rewrote exception message. [~nielsbasjes], I know you think this undersells the severity of the bug. I'll rewrite the description to limit the scope of this fix, if you still want to litigate the point in another JIRA. org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour. --- Key: MAPREDUCE-2094 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Reporter: Niels Basjes Assignee: Niels Basjes Attachments: M2094.patch, MAPREDUCE-2094-2011-05-19.patch, MAPREDUCE-2094-20140727-svn-fixed-spaces.patch, MAPREDUCE-2094-20140727-svn.patch, MAPREDUCE-2094-20140727.patch, MAPREDUCE-2094-2015-05-05-2328.patch, MAPREDUCE-2094-FileInputFormat-docs-v2.patch When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near 1GiB file would be processed around 36 times in its entirety, producing garbage results and taking up far more CPU time than needed. It took a while to figure out, and what we found is that the default implementation of the isSplitable method in [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup ] is simply {{return true;}}. This is a very unsafe default and contradicts the JavaDoc of the method, which states: "Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be." The actual implementation effectively does: "Is the given filename splitable? Always true, even if the file is stream compressed using an unsplittable compression codec."
For our situation (where we always have Gzipped input) we took the easy way out and simply implemented an isSplitable in our class that does {{return false;}}. Now there are essentially 3 ways I can think of for fixing this (in order of what I would find preferable):
# Implement something that looks at the used compression of the file (i.e. migrate the implementation from TextInputFormat to FileInputFormat). This would make the method do what the JavaDoc describes.
# Force developers to think about it and make this method abstract.
# Use a safe default (i.e. {{return false;}})
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
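Option 1 above can be sketched without the Hadoop APIs. The real fix asks CompressionCodecFactory for the file's codec and checks whether it is a SplittableCompressionCodec; the standalone sketch below stands in for that lookup with a simple extension table, which is an assumption of this illustration only.

```java
import java.util.Map;

// Self-contained sketch of option 1: decide splittability from the file's
// compression instead of unconditionally returning true. The extension map
// is a stand-in for Hadoop's CompressionCodecFactory lookup.
public class SplitCheck {
    // Stream-compressed formats (gzip) cannot be split; block-oriented
    // codecs (bzip2) can.
    private static final Map<String, Boolean> SPLITTABLE = Map.of(
        ".gz", false,   // gzip: whole-stream compression, not splittable
        ".bz2", true    // bzip2: splittable codec
    );

    public static boolean isSplitable(String filename) {
        for (Map.Entry<String, Boolean> e : SPLITTABLE.entrySet()) {
            if (filename.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return true; // uncompressed files are safe to split
    }

    public static void main(String[] args) {
        System.out.println(isSplitable("input.gz"));  // false
        System.out.println(isSplitable("input.txt")); // true
    }
}
```

With this shape, a 1GiB gzip file yields a single split instead of being handed, in its entirety, to every split's record reader.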
[jira] [Updated] (MAPREDUCE-2094) LineRecordReader should not seek into non-splittable, compressed streams.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-2094: - Summary: LineRecordReader should not seek into non-splittable, compressed streams. (was: org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.) LineRecordReader should not seek into non-splittable, compressed streams. - Key: MAPREDUCE-2094 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Reporter: Niels Basjes Assignee: Niels Basjes Attachments: M2094.patch, MAPREDUCE-2094-2011-05-19.patch, MAPREDUCE-2094-20140727-svn-fixed-spaces.patch, MAPREDUCE-2094-20140727-svn.patch, MAPREDUCE-2094-20140727.patch, MAPREDUCE-2094-2015-05-05-2328.patch, MAPREDUCE-2094-FileInputFormat-docs-v2.patch When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near 1GiB file would be processed around 36 times in its entirety, producing garbage results and taking up far more CPU time than needed. It took a while to figure out, and what we found is that the default implementation of the isSplitable method in [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup ] is simply {{return true;}}. This is a very unsafe default and contradicts the JavaDoc of the method, which states: "Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be." The actual implementation effectively does: "Is the given filename splitable? Always true, even if the file is stream compressed using an unsplittable compression codec."
For our situation (where we always have Gzipped input) we took the easy way out and simply implemented an isSplitable in our class that does {{return false;}}. Now there are essentially 3 ways I can think of for fixing this (in order of what I would find preferable):
# Implement something that looks at the used compression of the file (i.e. migrate the implementation from TextInputFormat to FileInputFormat). This would make the method do what the JavaDoc describes.
# Force developers to think about it and make this method abstract.
# Use a safe default (i.e. {{return false;}})
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4882) Error in estimating the length of the output file in Spill Phase
[ https://issues.apache.org/jira/browse/MAPREDUCE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-4882: - Resolution: Duplicate Fix Version/s: 2.6.0 Target Version/s: (was: ) Status: Resolved (was: Patch Available) Fixed in MAPREDUCE-6063. Sorry Jerry; didn't see this. Error in estimating the length of the output file in Spill Phase Key: MAPREDUCE-4882 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4882 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 0.20.2, 1.0.3 Environment: Any Environment Reporter: Lijie Xu Assignee: Jerry Chen Labels: BB2015-05-TBR, patch Fix For: 2.6.0 Attachments: MAPREDUCE-4882.patch Original Estimate: 1h Remaining Estimate: 1h The sortAndSpill() method in MapTask.java has an error in estimating the length of the output file. The long size should be (bufvoid - bufstart) + bufend, not (bufvoid - bufend) + bufstart, when bufend < bufstart. Here is the original code in MapTask.java. private void sortAndSpill() throws IOException, ClassNotFoundException, InterruptedException { //approximate the length of the output file to be the length of the //buffer + header lengths for the partitions long size = (bufend >= bufstart ? bufend - bufstart : (bufvoid - bufend) + bufstart) + partitions * APPROX_HEADER_LENGTH; FSDataOutputStream out = null; -- I ran a test on TeraSort. A snippet from the mapper's log is as follows:
MapTask: Spilling map output: record full = true
MapTask: bufstart = 157286200; bufend = 10485460; bufvoid = 199229440
MapTask: kvstart = 262142; kvend = 131069; length = 655360
MapTask: Finished spill 3
In this case, Spill Bytes should be (199229440 - 157286200) + 10485460 = 52428700 (52 MB) because the number of spilled records is 524287 and each record costs 100B. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
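The wrap-around arithmetic can be checked standalone. The sketch below reproduces both the corrected and the original (buggy) expressions (the helper names are mine, not MapTask's) and plugs in the numbers from the log snippet above:

```java
// Spill-size estimate for MapTask's circular collection buffer.
// When bufend has wrapped past the end of the buffer (bufend < bufstart),
// the live data is the tail [bufstart, bufvoid) plus the head [0, bufend).
public class SpillSize {
    // Corrected expression: (bufvoid - bufstart) + bufend on wrap.
    static long fixed(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart ? bufend - bufstart
                                  : (bufvoid - bufstart) + bufend;
    }

    // Original (buggy) expression: (bufvoid - bufend) + bufstart on wrap.
    static long buggy(long bufstart, long bufend, long bufvoid) {
        return bufend >= bufstart ? bufend - bufstart
                                  : (bufvoid - bufend) + bufstart;
    }

    public static void main(String[] args) {
        // Values from the mapper log: 524287 spilled records at 100 B each.
        System.out.println(fixed(157286200L, 10485460L, 199229440L)); // 52428700
        System.out.println(buggy(157286200L, 10485460L, 199229440L)); // 346030180
    }
}
```

The corrected expression matches the 52428700 bytes (~52 MB) expected from the record count, while the original overestimates by almost 300 MB.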
[jira] [Updated] (MAPREDUCE-2094) LineRecordReader should not seek into non-splittable, compressed streams.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-2094: - Attachment: M2094-1.patch Ran test-patch locally, all OK except a spurious whitespace and a release audit warning (fixed) LineRecordReader should not seek into non-splittable, compressed streams. - Key: MAPREDUCE-2094 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Reporter: Niels Basjes Assignee: Niels Basjes Attachments: M2094-1.patch, M2094.patch, MAPREDUCE-2094-2011-05-19.patch, MAPREDUCE-2094-20140727-svn-fixed-spaces.patch, MAPREDUCE-2094-20140727-svn.patch, MAPREDUCE-2094-20140727.patch, MAPREDUCE-2094-2015-05-05-2328.patch, MAPREDUCE-2094-FileInputFormat-docs-v2.patch When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near 1GiB file would be processed around 36 times in its entirety, producing garbage results and taking up far more CPU time than needed. It took a while to figure out, and what we found is that the default implementation of the isSplitable method in [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup ] is simply {{return true;}}. This is a very unsafe default and contradicts the JavaDoc of the method, which states: "Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be." The actual implementation effectively does: "Is the given filename splitable? Always true, even if the file is stream compressed using an unsplittable compression codec."
For our situation (where we always have Gzipped input) we took the easy way out and simply implemented an isSplitable in our class that does {{return false;}}. Now there are essentially 3 ways I can think of for fixing this (in order of what I would find preferable):
# Implement something that looks at the used compression of the file (i.e. migrate the implementation from TextInputFormat to FileInputFormat). This would make the method do what the JavaDoc describes.
# Force developers to think about it and make this method abstract.
# Use a safe default (i.e. {{return false;}})
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-2094) LineRecordReader should not seek into non-splittable, compressed streams.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-2094: - Resolution: Fixed Fix Version/s: 2.8.0 Release Note: (was: Throw an Exception in the most common error scenario present in many FileInputFormat derivatives that do not override isSplitable. ) Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 I committed this to trunk and branch-2. Thanks Niels LineRecordReader should not seek into non-splittable, compressed streams. - Key: MAPREDUCE-2094 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Reporter: Niels Basjes Assignee: Niels Basjes Fix For: 2.8.0 Attachments: M2094-1.patch, M2094.patch, MAPREDUCE-2094-2011-05-19.patch, MAPREDUCE-2094-20140727-svn-fixed-spaces.patch, MAPREDUCE-2094-20140727-svn.patch, MAPREDUCE-2094-20140727.patch, MAPREDUCE-2094-2015-05-05-2328.patch, MAPREDUCE-2094-FileInputFormat-docs-v2.patch When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near 1GiB file would be processed around 36 times in its entirety, producing garbage results and taking up far more CPU time than needed. It took a while to figure out, and what we found is that the default implementation of the isSplitable method in [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup ] is simply {{return true;}}. This is a very unsafe default and contradicts the JavaDoc of the method, which states: "Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be." The actual implementation effectively does: "Is the given filename splitable? Always true, even if the file is stream compressed using an unsplittable compression codec."
For our situation (where we always have Gzipped input) we took the easy way out and simply implemented an isSplitable in our class that does {{return false;}}. Now there are essentially 3 ways I can think of for fixing this (in order of what I would find preferable):
# Implement something that looks at the used compression of the file (i.e. migrate the implementation from TextInputFormat to FileInputFormat). This would make the method do what the JavaDoc describes.
# Force developers to think about it and make this method abstract.
# Use a safe default (i.e. {{return false;}})
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy
[ https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-4469: - Status: Open (was: Patch Available) branch-1 is not getting new fixes, but it looks like {{ProcfsBasedProcessTree}} in trunk could benefit from this same set of optimizations. I'll leave the issue open, in case someone has the time and inclination to rebase it. Resource calculation in child tasks is CPU-heavy Key: MAPREDUCE-4469 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469 Project: Hadoop Map/Reduce Issue Type: Bug Components: performance, task Affects Versions: 1.0.3 Reporter: Todd Lipcon Assignee: Ahmed Radwan Labels: BB2015-05-TBR Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, MAPREDUCE-4469_rev3.patch, MAPREDUCE-4469_rev4.patch, MAPREDUCE-4469_rev5.patch In doing some benchmarking on a hadoop-1 derived codebase, I noticed that each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed that it's spending a lot of time looping through all the files in /proc to calculate resource usage. As a test, I added a flag to disable use of the ResourceCalculatorPlugin within the tasks. On a CPU-bound 500G-sort workload, this improved total job runtime by about 10% (map slot-seconds by 14%, reduce slot seconds by 8%) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-3936) Clients should not enforce counter limits
[ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-3936: - Status: Open (was: Patch Available) Patch no longer applies. In the context of YARN-2928, presumably the timeline server can handle unlimited counters. The limit can already be set very high... is this still relevant for MR? If nobody plans to work on this, please close as later or won't fix Clients should not enforce counter limits -- Key: MAPREDUCE-3936 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3936 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv1 Reporter: Tom White Assignee: Tom White Labels: BB2015-05-TBR Attachments: MAPREDUCE-3936.patch, MAPREDUCE-3936.patch The code for enforcing counter limits (from MAPREDUCE-1943) creates a static JobConf instance to load the limits, which may throw an exception if the client limit is set to be lower than the limit on the cluster (perhaps because the cluster limit was raised from the default). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6038) A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial
[ https://issues.apache.org/jira/browse/MAPREDUCE-6038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6038: - Attachment: MAPREDUCE-6038.2.patch Rebased patch A boolean may be set error in the Word Count v2.0 in MapReduce Tutorial --- Key: MAPREDUCE-6038 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6038 Project: Hadoop Map/Reduce Issue Type: Bug Environment: java version 1.8.0_11 HotSpot 64-bit Reporter: Pei Ma Assignee: Tsuyoshi Ozawa Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-6038.1.patch, MAPREDUCE-6038.2.patch As a beginner, when I learned about the basics of MR, I found that I couldn't run WordCount2 using the command bin/hadoop jar wc.jar WordCount2 /user/joe/wordcount/input /user/joe/wordcount/output from the Tutorial. The VM threw a NullPointerException at line 47. At line 45, the returned default value of conf.getBoolean is true. That is to say, when wordcount.skip.patterns is not set, WordCount2 will continue to execute getCacheFiles, and patternsURIs gets the null value. When the -skip option doesn't exist, wordcount.skip.patterns will not be set, and a NullPointerException comes out. In short, the block after the if-statement at line 45 shouldn't be executed when the -skip option doesn't exist in the command. Maybe line 45 should read if (conf.getBoolean("wordcount.skip.patterns", false)) {. Just change the default boolean. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
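The failure mode is purely a default-value question, which a stdlib-only sketch can demonstrate. The tiny getBoolean helper below is a stand-in for Hadoop's Configuration.getBoolean(name, defaultValue), not its real implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for Configuration.getBoolean(name, defaultValue):
// returns the default when the key was never set.
public class SkipDefault {
    static boolean getBoolean(Map<String, String> conf, String name, boolean dflt) {
        String v = conf.get(name);
        return v == null ? dflt : Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // no -skip: key never set
        // Tutorial's code defaults to true, so the skip branch runs and then
        // dereferences the null cache-file list -> NullPointerException.
        System.out.println(getBoolean(conf, "wordcount.skip.patterns", true));  // true
        // Proposed fix defaults to false, so the branch is skipped entirely.
        System.out.println(getBoolean(conf, "wordcount.skip.patterns", false)); // false
    }
}
```

With the default flipped to false, the getCacheFiles block only runs when -skip actually set the property.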
[jira] [Updated] (MAPREDUCE-4473) tasktracker rank on machines.jsp?type=active
[ https://issues.apache.org/jira/browse/MAPREDUCE-4473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-4473: - Resolution: Won't Fix Target Version/s: 1.0.3, 1.0.2, 1.0.1, 1.0.0 (was: 1.0.0, 1.0.1, 1.0.2, 1.0.3) Status: Resolved (was: Patch Available) Closing as branch-1 is unlikely to be released tasktracker rank on machines.jsp?type=active Key: MAPREDUCE-4473 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4473 Project: Hadoop Map/Reduce Issue Type: Improvement Components: tasktracker Affects Versions: 0.20.2, 0.21.0, 0.22.0, 0.23.0, 0.23.1, 1.0.0, 1.0.1, 1.0.2, 1.0.3 Reporter: jian fan Priority: Minor Labels: BB2015-05-TBR, tasktracker Attachments: MAPREDUCE-4473.patch Sometimes we need to simply judge which tasktracker is down from the machines.jsp?type=active page -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5150) Backport 2009 terasort (MAPREDUCE-639) to branch-1
[ https://issues.apache.org/jira/browse/MAPREDUCE-5150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-5150: - Resolution: Won't Fix Status: Resolved (was: Patch Available) Closing as WONTFIX, since branch-1 is unlikely to be released. Backport 2009 terasort (MAPREDUCE-639) to branch-1 -- Key: MAPREDUCE-5150 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5150 Project: Hadoop Map/Reduce Issue Type: Improvement Components: examples Affects Versions: 1.2.0 Reporter: Gera Shegalov Priority: Minor Labels: BB2015-05-TBR Attachments: MAPREDUCE-5150-branch-1.patch Users evaluate the performance of Hadoop clusters using different benchmarks such as TeraSort. However, the terasort version in branch-1 is outdated. It works on a teragen dataset that cannot exceed 4 billion unique keys, and it does not have the fast non-sampling partitioner SimplePartitioner either. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6023) Fix SuppressWarnings from unchecked to rawtypes in O.A.H.mapreduce.lib.input.TaggedInputSplit
[ https://issues.apache.org/jira/browse/MAPREDUCE-6023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6023: - Status: Open (was: Patch Available) This is creating new javac warnings, not correcting the errant usage. Fix SuppressWarnings from unchecked to rawtypes in O.A.H.mapreduce.lib.input.TaggedInputSplit - Key: MAPREDUCE-6023 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6023 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Junping Du Assignee: Abhilash Srimat Tirumala Pallerlamudi Priority: Minor Labels: BB2015-05-TBR, newbie Attachments: MAPREDUCE-6023.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-2094) org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531171#comment-14531171 ] Chris Douglas commented on MAPREDUCE-2094: -- [Given|http://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201405.mbox/%3CCADoiZqoBKme-HYoM%3DhRxPEs1w2qdevo0%3DaoihqiWT4vS8D42Yg%40mail.gmail.com%3E] [discussion|https://mail-archives.apache.org/mod_mbox/hadoop-common-dev/201406.mbox/%3ccadoizqoqkpn_7b9w75dcrvjxz1sqbkryqbrwlw1rwo26a4e...@mail.gmail.com%3E] on the dev list, the following error message:
{noformat}
+ throw new IOException(
+     "Implementation bug in the used FileInputFormat: " +
+     "The isSplitable method returned 'true' on a file that " +
+     "was compressed with a non splittable compression codec. " +
+     "If you get this right after upgrading Hadoop then know " +
+     "that you have been looking at reports based on " +
+     "corrupt data for a long time !!! (see: MAPREDUCE-2094)");
{noformat}
is a little over the top. Please just report the error detected, e.g. {{"Cannot seek in " + codec.getClass().getSimpleName() + " compressed stream"}} org.apache.hadoop.mapreduce.lib.input.FileInputFormat: isSplitable implements unsafe default behaviour that is different from the documented behaviour. --- Key: MAPREDUCE-2094 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2094 Project: Hadoop Map/Reduce Issue Type: Bug Components: task Reporter: Niels Basjes Assignee: Niels Basjes Labels: BB2015-05-TBR Attachments: MAPREDUCE-2094-2011-05-19.patch, MAPREDUCE-2094-20140727-svn-fixed-spaces.patch, MAPREDUCE-2094-20140727-svn.patch, MAPREDUCE-2094-20140727.patch, MAPREDUCE-2094-2015-05-05-2328.patch, MAPREDUCE-2094-FileInputFormat-docs-v2.patch When implementing a custom derivative of FileInputFormat we ran into the effect that a large Gzipped input file would be processed several times. A near 1GiB file would be processed around 36 times in its entirety, producing garbage results and taking up far more CPU time than needed.
It took a while to figure out, and what we found is that the default implementation of the isSplitable method in [org.apache.hadoop.mapreduce.lib.input.FileInputFormat | http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.java?view=markup ] is simply {{return true;}}. This is a very unsafe default and contradicts the JavaDoc of the method, which states: "Is the given filename splitable? Usually, true, but if the file is stream compressed, it will not be." The actual implementation effectively does: "Is the given filename splitable? Always true, even if the file is stream compressed using an unsplittable compression codec." For our situation (where we always have Gzipped input) we took the easy way out and simply implemented an isSplitable in our class that does {{return false;}}. Now there are essentially 3 ways I can think of for fixing this (in order of what I would find preferable):
# Implement something that looks at the used compression of the file (i.e. migrate the implementation from TextInputFormat to FileInputFormat). This would make the method do what the JavaDoc describes.
# Force developers to think about it and make this method abstract.
# Use a safe default (i.e. {{return false;}})
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6220) Provide option to suppress stdout of MapReduce task
[ https://issues.apache.org/jira/browse/MAPREDUCE-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512014#comment-14512014 ] Chris Douglas commented on MAPREDUCE-6220: -- bq. I think YangHao may have been thinking of putting this option in there for production clusters. I understood the intent, but the value in the patch is taken from the user configuration. Agree on closing this as WONTFIX. Provide option to suppress stdout of MapReduce task --- Key: MAPREDUCE-6220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6220 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Reporter: Yang Hao Assignee: Yang Hao Attachments: MAPREDUCE-6220.patch, MAPREDUCE-6220.v2.patch System.out is an ugly way to print logs, and many times it would do harm to the Hadoop cluster. So we can provide an option to forbid it -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6220) Provide option to suppress stdout of MapReduce task
[ https://issues.apache.org/jira/browse/MAPREDUCE-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510090#comment-14510090 ] Chris Douglas commented on MAPREDUCE-6220: -- The patch won't work on Windows. This seems unlikely to solve the problem... users printing lots of messages to stdout/stderr won't think to suppress the output using an esoteric config knob before they submit. Until YARN limits disk usage, corralling users to run on partitions separated from HDFS, etc. will at least limit the damage done by containers. Provide option to suppress stdout of MapReduce task --- Key: MAPREDUCE-6220 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6220 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Reporter: Yang Hao Assignee: Yang Hao Attachments: MAPREDUCE-6220.patch, MAPREDUCE-6220.v2.patch System.out is an ugly way to print logs, and many times it would do harm to the Hadoop cluster. So we can provide an option to forbid it -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6240) Hadoop client displays confusing error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6240: - Status: Open (was: Patch Available) bq. The intuition behind chaining is the following. We would not try provider 2 if provider 1 worked, in other words: our provider 2 failures are indirectly caused by provider 1 failures. Wait, this is synthetically stitching exceptions together from a retry loop? That would be very confusing to debug. Have you looked at an approach like [MultipleIOException|https://git1-us-west.apache.org/repos/asf?p=hadoop.git;a=blob;f=hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/MultipleIOException.java;h=5e584c9cd0705471a826932d782eec409b5bae37;hb=HEAD]? There's also a spurious change to {{AbstractFileSystem}} in the latest patch. Hadoop client displays confusing error message -- Key: MAPREDUCE-6240 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, MAPREDUCE-6240.1.patch The Hadoop client often throws an exception with java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. This is a misleading and generic message for any cluster initialization problem, and it takes a lot of debugging hours to identify the root cause. A correct error message could resolve this problem quickly. In one such instance, the Oozie log showed the following exception, while the root cause was a ClassNotFoundException that the Hadoop client didn't return in the exception. {noformat} JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. 
at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412) at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979) at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) at org.apache.oozie.command.XCommand.call(XCommand.java:281) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. 
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120) at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:82) at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:75) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:449) at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372) at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379) at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927) ... 10 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
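One stdlib-only way to surface every provider's failure, instead of only the generic "Cannot initialize Cluster" message, is to attach each failure as a suppressed exception. Hadoop's MultipleIOException (suggested in the comment above) serves a similar aggregating role; the helper below is a sketch of the idea, not the actual Cluster code:

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;

public class FirstSuccessful {
    // Try each provider in turn; if all fail, throw one IOException that
    // carries every individual failure as a suppressed exception, so the
    // real root cause (e.g. a ClassNotFoundException) stays visible.
    static <T> T create(List<Callable<T>> providers) throws IOException {
        IOException agg = new IOException(
            "Cannot initialize Cluster. Please check your configuration for "
            + "mapreduce.framework.name and the correspond server addresses.");
        for (Callable<T> p : providers) {
            try {
                return p.call();
            } catch (Exception e) {
                agg.addSuppressed(e); // keep each provider's failure
            }
        }
        throw agg;
    }

    public static void main(String[] args) {
        // Two failing stand-in providers (hypothetical names for illustration).
        List<Callable<Object>> providers = List.of(
            () -> { throw new ClassNotFoundException("YarnClientProtocolProvider"); },
            () -> { throw new IOException("local framework not configured"); });
        try {
            create(providers);
        } catch (IOException e) {
            System.out.println(e.getSuppressed().length); // 2
        }
    }
}
```

Printing such an exception's stack trace shows each suppressed failure alongside the summary message, rather than a synthetic cause chain that implies one provider's failure caused another's.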
[jira] [Commented] (MAPREDUCE-6240) Hadoop client displays confusing error message
[ https://issues.apache.org/jira/browse/MAPREDUCE-6240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306365#comment-14306365 ] Chris Douglas commented on MAPREDUCE-6240: -- I see. Would it be possible to construct a test that's both more direct for the fix and doesn't require a change to common? Possibly initialize two providers that throw different exceptions, then verify that both messages are in the output. Just looking at the patch so I don't know the context, but wouldn't the old code pass, also? Hadoop client displays confusing error message -- Key: MAPREDUCE-6240 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6240 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.001.patch, MAPREDUCE-6240-gera.002.patch, MAPREDUCE-6240.1.patch The Hadoop client often throws an exception with java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. This is a misleading and generic message for any cluster initialization problem, and it takes a lot of debugging hours to identify the root cause. A correct error message could resolve this problem quickly. In one such instance, the Oozie log showed the following exception, while the root cause was a ClassNotFoundException that the Hadoop client didn't return in the exception. {noformat} JA009: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. 
at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:412) at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:392) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:979) at org.apache.oozie.action.hadoop.JavaActionExecutor.start(JavaActionExecutor.java:1134) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:228) at org.apache.oozie.command.wf.ActionStartXCommand.execute(ActionStartXCommand.java:63) at org.apache.oozie.command.XCommand.call(XCommand.java:281) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:323) at org.apache.oozie.service.CallableQueueService$CompositeCallable.call(CallableQueueService.java:252) at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses. 
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120) at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:82) at org.apache.hadoop.mapreduce.Cluster.init(Cluster.java:75) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470) at org.apache.hadoop.mapred.JobClient.init(JobClient.java:449) at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:372) at org.apache.oozie.service.HadoopAccessorService$1.run(HadoopAccessorService.java:370) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) at org.apache.oozie.service.HadoopAccessorService.createJobClient(HadoopAccessorService.java:379) at org.apache.oozie.action.hadoop.JavaActionExecutor.createJobClient(JavaActionExecutor.java:1185) at org.apache.oozie.action.hadoop.JavaActionExecutor.submitLauncher(JavaActionExecutor.java:927) ... 10 more {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
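The test Chris suggests above — two providers that fail differently, with both messages verified in the output — hinges on the client collecting each provider's failure instead of discarding it behind the generic "Cannot initialize Cluster" message. A minimal sketch of that aggregation pattern; the `Provider` interface and `initClient` helper here are simplified stand-ins, not the actual Hadoop `ClientProtocolProvider` API:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ProviderInitDemo {
  // Hypothetical stand-in for one client-protocol provider attempt.
  interface Provider {
    Object create() throws IOException;
  }

  // Try each provider in turn; if all fail, surface every failure message
  // so the root cause (e.g. a ClassNotFoundException) is not lost.
  static Object initClient(List<Provider> providers) throws IOException {
    List<String> failures = new ArrayList<>();
    for (Provider p : providers) {
      try {
        Object client = p.create();
        if (client != null) {
          return client;
        }
      } catch (IOException e) {
        failures.add(e.getMessage());
      }
    }
    throw new IOException("Cannot initialize Cluster. Provider failures: " + failures);
  }

  public static void main(String[] args) {
    List<Provider> providers = new ArrayList<>();
    providers.add(() -> { throw new IOException("yarn: missing jar"); });
    providers.add(() -> { throw new IOException("local: bad config"); });
    try {
      initClient(providers);
    } catch (IOException e) {
      // Both root causes appear in the final error, as the test would verify.
      System.out.println(e.getMessage());
    }
  }
}
```

A unit test would then initialize two such providers and assert that both messages appear in the thrown exception, without any round trip through common.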
[jira] [Commented] (MAPREDUCE-6094) TestMRCJCFileInputFormat.testAddInputPath() fails on trunk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14149716#comment-14149716 ] Chris Douglas commented on MAPREDUCE-6094: -- +1 TestMRCJCFileInputFormat.testAddInputPath() fails on trunk -- Key: MAPREDUCE-6094 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6094 Project: Hadoop Map/Reduce Issue Type: Bug Components: test Reporter: Sangjin Lee Assignee: Akira AJISAKA Priority: Minor Attachments: MAPREDUCE-6094.patch {noformat} Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.624 sec FAILURE! - in org.apache.hadoop.mapreduce.lib.input.TestMRCJCFileInputFormat testAddInputPath(org.apache.hadoop.mapreduce.lib.input.TestMRCJCFileInputFormat) Time elapsed: 0.886 sec ERROR! java.io.IOException: No FileSystem for scheme: s3 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2583) at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2590) at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91) at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2629) at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2611) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370) at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169) at org.apache.hadoop.mapreduce.lib.input.TestMRCJCFileInputFormat.testAddInputPath(TestMRCJCFileInputFormat.java:55) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (MAPREDUCE-6103) Adding reservation APIs to resource manager delegate
[ https://issues.apache.org/jira/browse/MAPREDUCE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6103: - Comment: was deleted (was: bq. For example to run the sample pi job with queue sla and reservation **, you now can Read it as For example to run the sample pi job with queue sla and reservation *reservation_1411602647912_0001 *, you now can) Adding reservation APIs to resource manager delegate Key: MAPREDUCE-6103 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6103 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Subru Krishnan Assignee: Subru Krishnan Attachments: MR-6103.patch, MR-6103.patch YARN-1051 introduces the ReservationSystem and the corresponding APIs for create/update/delete ops. The MR resource manager delegate needs to be updated with the APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (MAPREDUCE-6103) Adding reservation APIs to resource manager delegate
[ https://issues.apache.org/jira/browse/MAPREDUCE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147130#comment-14147130 ] Chris Douglas edited comment on MAPREDUCE-6103 at 9/25/14 12:11 AM: Thanks [~chris.douglas] for taking a look. I realized that there was no easy way to specify the reservation id so added support for it in YARNRunner so now users can specify reservation id just like they currently do queue names. For example to run the sample _pi_ job with queue _sla_ and reservation *reservation_1411602647912_0001*, you now can {noformat} hadoop jar hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar pi \ -Dmapreduce.job.queuename=sla \ -Dmapreduce.job.reservation.id=reservation_1411602647912_0001 \ -Dyarn.app.mapreduce.am.resource.mb=1024 \ 3 10 {noformat} was (Author: subru): Thanks [~chris.douglas] for taking a look. I realized that there was no easy way to specify the reservation id so added support for it in YARNRunner so now users can specify reservation id just like they currently do queue names. For example to run the sample _pi_ job with queue _sla_ and reservation **, you now can hadoop jar hadoop-3.0.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-SNAPSHOT.jar pi -Dmapreduce.job.queuename=sla -Dmapreduce.job.reservation.id=reservation_1411602647912_0001 -Dyarn.app.mapreduce.am.resource.mb=1024 3 10 Adding reservation APIs to resource manager delegate Key: MAPREDUCE-6103 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6103 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Subru Krishnan Assignee: Subru Krishnan Attachments: MR-6103.patch, MR-6103.patch YARN-1051 introduces the ReservationSystem and the corresponding APIs for create/update/delete ops. The MR resource manager delegate needs to be updated with the APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6103) Adding reservation APIs to resource manager delegate
[ https://issues.apache.org/jira/browse/MAPREDUCE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14147195#comment-14147195 ] Chris Douglas commented on MAPREDUCE-6103: -- The {{YARNRunner}} changes look good, though discussing with [~subru] there are a few conditions where the input to {{ReservationId::parseReservationId()}} can be malformed, but the method returns {{null}}. It might be clearer if this method were to throw an exception on all input that can't be parsed into a {{ReservationId}}, including null. +1 overall, though Adding reservation APIs to resource manager delegate Key: MAPREDUCE-6103 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6103 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Subru Krishnan Assignee: Subru Krishnan Attachments: MR-6103.patch, MR-6103.patch YARN-1051 introduces the ReservationSystem and the corresponding APIs for create/update/delete ops. The MR resource manager delegate needs to be updated with the APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
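The strict-parse suggestion above can be sketched as follows: reject null and anything that does not match the expected shape by throwing, so callers never have to distinguish a null result from a parse failure. The `reservation_<timestamp>_<id>` shape check below is a simplified assumption for illustration, not the actual `ReservationId` implementation:

```java
public class StrictReservationParse {
  // Hypothetical strict parser: throws on ALL unparseable input, including
  // null, rather than returning null for some malformed cases.
  static long[] parseReservationId(String s) {
    if (s == null) {
      throw new IllegalArgumentException("reservation id is null");
    }
    String[] parts = s.split("_");
    if (parts.length != 3 || !"reservation".equals(parts[0])) {
      throw new IllegalArgumentException("malformed reservation id: " + s);
    }
    try {
      // Assumed shape: reservation_<cluster timestamp>_<sequence number>
      return new long[] { Long.parseLong(parts[1]), Long.parseLong(parts[2]) };
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException("malformed reservation id: " + s, e);
    }
  }

  public static void main(String[] args) {
    long[] id = parseReservationId("reservation_1411602647912_0001");
    System.out.println(id[0] + " " + id[1]);
  }
}
```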
[jira] [Commented] (MAPREDUCE-6103) Adding reservation APIs to resource manager delegate
[ https://issues.apache.org/jira/browse/MAPREDUCE-6103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144171#comment-14144171 ] Chris Douglas commented on MAPREDUCE-6103: -- +1 straightforward update to the branch Adding reservation APIs to resource manager delegate Key: MAPREDUCE-6103 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6103 Project: Hadoop Map/Reduce Issue Type: Improvement Reporter: Subru Krishnan Assignee: Subru Krishnan Attachments: MR-6103.patch YARN-1051 introduces the ReservationSystem and the corresponding APIs for create/update/delete ops. The MR resource manager delegate needs to be updated with the APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6063) In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < bufstart.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121968#comment-14121968 ] Chris Douglas commented on MAPREDUCE-6063: -- bq. This issue is also in MR1 (branch-1). I attached a patch MAPREDUCE-6063.branch-1.patch for branch-1. Done. Thanks again for the fix. In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < bufstart. --- Key: MAPREDUCE-6063 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6063 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Reporter: zhihai xu Assignee: zhihai xu Fix For: 3.0.0, 2.6.0 Attachments: MAPREDUCE-6063.000.patch, MAPREDUCE-6063.branch-1.patch In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < bufstart. We should change (bufvoid - bufend) + bufstart to (bufvoid - bufstart) + bufend. Should change {code} long size = (bufend >= bufstart ? bufend - bufstart : (bufvoid - bufend) + bufstart) + partitions * APPROX_HEADER_LENGTH; {code} to: {code} long size = (bufend >= bufstart ? bufend - bufstart : (bufvoid - bufstart) + bufend) + partitions * APPROX_HEADER_LENGTH; {code} It is because when wraparound happens (bufend < bufstart), the size should be bufvoid - bufstart (bigger one) + bufend (smaller one). You can find similar code implementation in MapTask.java: {code} mapOutputByteCounter.increment(valend >= keystart ? valend - keystart : (bufvoid - keystart) + valend); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-6063) In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < bufstart.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6063: - Resolution: Fixed Fix Version/s: 2.6.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 Good catch. I committed this to trunk and branch-2 In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < bufstart. --- Key: MAPREDUCE-6063 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6063 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv1, mrv2 Reporter: zhihai xu Assignee: zhihai xu Fix For: 3.0.0, 2.6.0 Attachments: MAPREDUCE-6063.000.patch In sortAndSpill of MapTask.java, size is calculated wrongly when bufend < bufstart. We should change (bufvoid - bufend) + bufstart to (bufvoid - bufstart) + bufend. Should change {code} long size = (bufend >= bufstart ? bufend - bufstart : (bufvoid - bufend) + bufstart) + partitions * APPROX_HEADER_LENGTH; {code} to: {code} long size = (bufend >= bufstart ? bufend - bufstart : (bufvoid - bufstart) + bufend) + partitions * APPROX_HEADER_LENGTH; {code} It is because when wraparound happens (bufend < bufstart), the size should be bufvoid - bufstart (bigger one) + bufend (smaller one). You can find similar code implementation in MapTask.java: {code} mapOutputByteCounter.increment(valend >= keystart ? valend - keystart : (bufvoid - keystart) + valend); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
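The wraparound arithmetic above is easy to get backwards. With a circular buffer of capacity `bufvoid` holding data from `bufstart` to `bufend`, the wrapped case occupies the tail `(bufvoid - bufstart)` plus the head `bufend`. A small sketch contrasting the corrected formula with the buggy one (the constants are illustrative, not MapTask's actual values):

```java
public class WrapSizeDemo {
  // Bytes of data in a circular buffer of capacity bufvoid,
  // running from bufstart to bufend.
  static long used(long bufstart, long bufend, long bufvoid) {
    return bufend >= bufstart
        ? bufend - bufstart                // contiguous case
        : (bufvoid - bufstart) + bufend;   // wrapped: tail after bufstart + head before bufend
  }

  public static void main(String[] args) {
    long bufvoid = 100;
    // Wrapped: data occupies [90, 100) and [0, 30) => 10 + 30 = 40 bytes.
    System.out.println(used(90, 30, bufvoid)); // 40
    // The buggy formula (bufvoid - bufend) + bufstart gives 70 + 90 = 160,
    // more than the buffer can even hold.
    System.out.println((bufvoid - 30) + 90);   // 160
  }
}
```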
[jira] [Updated] (MAPREDUCE-6051) Fix typos in log messages
[ https://issues.apache.org/jira/browse/MAPREDUCE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-6051: - Resolution: Fixed Fix Version/s: 2.6.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1 I committed this. Thanks, Ray Fix typos in log messages - Key: MAPREDUCE-6051 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6051 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.5.0 Reporter: Ray Chiang Assignee: Ray Chiang Priority: Trivial Labels: newbie Fix For: 3.0.0, 2.6.0 Attachments: MAPREDUCE-6051-01.patch There are a bunch of typos in log messages. HADOOP-10946 was initially created, but may have failed due to being in multiple components. Try fixing typos on a per-component basis. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5974) Allow map output collector fallback
[ https://issues.apache.org/jira/browse/MAPREDUCE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14099345#comment-14099345 ] Chris Douglas commented on MAPREDUCE-5974: -- Sorry, I didn't mean to hold this up. +0 Allow map output collector fallback --- Key: MAPREDUCE-5974 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5974 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Affects Versions: 2.6.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: mapreduce-5974.txt Currently we only allow specifying a single MapOutputCollector implementation class in a job. It would be nice to allow a comma-separated list of classes: we should try each collector implementation in the user-specified order until we find one that can be successfully instantiated and initted. This is useful for cases where a particular optimized collector implementation cannot operate on all key/value types, or requires native code. The cluster administrator can configure the cluster to try to use the optimized collector and fall back to the default collector. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5974) Allow map output collector fallback
[ https://issues.apache.org/jira/browse/MAPREDUCE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071923#comment-14071923 ] Chris Douglas commented on MAPREDUCE-5974: -- bq. Implementing this inside the native collector init() method itself might be messy – you'd have to essentially write a wrapper collector and have every method delegate to the real implementation. I would hope that the delegation would get devirtualized and inlined, but not certain about that. I hadn't considered that; I'm not sure either. I'm mostly ambivalent about the alternatives, assuming the majority of jobs will configure a single collector. There's a case to be made for throwing the original exception in that case, but it's not worth much hand-wringing. Allow map output collector fallback --- Key: MAPREDUCE-5974 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5974 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Affects Versions: 2.6.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: mapreduce-5974.txt Currently we only allow specifying a single MapOutputCollector implementation class in a job. It would be nice to allow a comma-separated list of classes: we should try each collector implementation in the user-specified order until we find one that can be successfully instantiated and initted. This is useful for cases where a particular optimized collector implementation cannot operate on all key/value types, or requires native code. The cluster administrator can configure the cluster to try to use the optimized collector and fall back to the default collector. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5974) Allow map output collector fallback
[ https://issues.apache.org/jira/browse/MAPREDUCE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14071174#comment-14071174 ] Chris Douglas commented on MAPREDUCE-5974: -- bq. Doing fallback as the records are emitted would be pretty neat, but may also be somewhat difficult. [snip] *nod* Fair enough, though if each MapTask is making independent decisions about the collector, they still need to agree on the format for the shuffle. Spilling one collector to disk and changing strategies should be compatible, assuming there isn't a different format for intermediate spills. But yeah, this is very abstract, given the use cases we have. If the goal is to support a fallback collector when native libs aren't available; given the dependency on intermediate format, should the swap be internal to the native collector, even in init? If the interface were like the serialization, then one might use the keytype, etc. to pick the most-appropriate collector. As failover, I'm struggling to come up with a case that's not covered by making this an internal detail of the native collector. Allow map output collector fallback --- Key: MAPREDUCE-5974 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5974 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Affects Versions: 2.6.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: mapreduce-5974.txt Currently we only allow specifying a single MapOutputCollector implementation class in a job. It would be nice to allow a comma-separated list of classes: we should try each collector implementation in the user-specified order until we find one that can be successfully instantiated and initted. This is useful for cases where a particular optimized collector implementation cannot operate on all key/value types, or requires native code. The cluster administrator can configure the cluster to try to use the optimized collector and fall back to the default collector. 
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5974) Allow map output collector fallback
[ https://issues.apache.org/jira/browse/MAPREDUCE-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065752#comment-14065752 ] Chris Douglas commented on MAPREDUCE-5974: -- Could this be equivalently implemented as a composite collector using the existing plugin architecture? Trading off collector implementations at runtime is a cool idea, but if the criteria are available to {{init}} then they're also available during submission (excluding availability of local dependencies or arch restrictions in heterogeneous clusters). Changing strategies during the collection phase based on the records emitted seems to have equivalent or better potential, and is covered by the composite strategy, also. Allow map output collector fallback --- Key: MAPREDUCE-5974 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5974 Project: Hadoop Map/Reduce Issue Type: Sub-task Components: task Affects Versions: 2.6.0 Reporter: Todd Lipcon Assignee: Todd Lipcon Attachments: mapreduce-5974.txt Currently we only allow specifying a single MapOutputCollector implementation class in a job. It would be nice to allow a comma-separated list of classes: we should try each collector implementation in the user-specified order until we find one that can be successfully instantiated and initted. This is useful for cases where a particular optimized collector implementation cannot operate on all key/value types, or requires native code. The cluster administrator can configure the cluster to try to use the optimized collector and fall back to the default collector. -- This message was sent by Atlassian JIRA (v6.2#6252)
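The composite-collector idea discussed above can be sketched as a wrapper that walks a configured list, keeps the first implementation whose init succeeds, and delegates to it. The `Collector` interface and `pickCollector` method below are simplified stand-ins for the real `MapOutputCollector` plugin API, not Hadoop code:

```java
import java.util.List;

public class FallbackCollectorDemo {
  // Simplified stand-in for the MapOutputCollector plugin interface.
  interface Collector {
    void init() throws Exception;
    void collect(Object key, Object value) throws Exception;
  }

  // Composite strategy: try each candidate in the configured order,
  // returning the first one that initializes successfully.
  static Collector pickCollector(List<Collector> candidates) {
    Exception last = null;
    for (Collector c : candidates) {
      try {
        c.init();
        return c;            // first working implementation wins
      } catch (Exception e) {
        last = e;            // remember the failure, fall through to the next
      }
    }
    throw new RuntimeException("no collector could be initialized", last);
  }

  public static void main(String[] args) throws Exception {
    Collector broken = new Collector() {
      public void init() throws Exception { throw new Exception("native libs missing"); }
      public void collect(Object k, Object v) {}
    };
    Collector fallback = new Collector() {
      public void init() {}
      public void collect(Object k, Object v) { System.out.println(k + "=" + v); }
    };
    Collector chosen = pickCollector(List.of(broken, fallback));
    chosen.collect("word", 1);
  }
}
```

Note this is exactly the trade-off raised in the thread: the delegation layer is simple, but every `collect` call goes through one extra virtual dispatch unless the JIT devirtualizes it.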
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14058364#comment-14058364 ] Chris Douglas commented on MAPREDUCE-5890: -- Yes; thanks [~asuresh] for your patience in seeing this through. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Fix For: fs-encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.13.patch, MAPREDUCE-5890.14.patch, MAPREDUCE-5890.15.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14056631#comment-14056631 ] Chris Douglas commented on MAPREDUCE-5890: -- I was thinking {{o.a.h.mapred}}, with other internal classes. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.13.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055613#comment-14055613 ] Chris Douglas commented on MAPREDUCE-5890: -- Yes, I'm OK with the current patch. This approach won't scale to another feature, but it can be preserved in a refactoring. My only remaining ask (fine to add during commit) is that {{CryptoUtils}} be annotated with {{@Private}} and {{@Unstable}}, so it's clearly marked as an implementation detail. If it could be package-private that would be even better, though I haven't checked to see if there's anything else in the {{o.a.h.mapreduce.task.crypto}} package. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14055718#comment-14055718 ] Chris Douglas commented on MAPREDUCE-5890: -- Sorry, I meant that if {{o.a.h.mapreduce.task.crypto}} only has {{CryptoUtils}} in it, then maybe the new package isn't necessary. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.13.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053251#comment-14053251 ] Chris Douglas commented on MAPREDUCE-5890: -- OK... untangling the abstractions can be deferred. The current patch spreads the feature across the code in a way that's not ideal to maintain, but it addresses all the functional feedback by moving the IV inline. Thanks [~asuresh] for all the iterations on this. Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-2841) Task level native optimization
[ https://issues.apache.org/jira/browse/MAPREDUCE-2841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047804#comment-14047804 ] Chris Douglas commented on MAPREDUCE-2841: -- If [~clockfly] is close to a patch, that would make the scope concrete. It sounds like there are more than zero changes to the framework (i.e., the MAPREDUCE-2454 API is insufficient), but fewer than a full replacement of the {{Task}} code with C\+\+. Would it be difficult to produce and post a patch to ground the discussion? Task level native optimization -- Key: MAPREDUCE-2841 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2841 Project: Hadoop Map/Reduce Issue Type: Improvement Components: task Environment: x86-64 Linux/Unix Reporter: Binglin Chang Assignee: Sean Zhong Attachments: DESIGN.html, MAPREDUCE-2841.v1.patch, MAPREDUCE-2841.v2.patch, dualpivot-0.patch, dualpivotv20-0.patch, fb-shuffle.patch I'm recently working on native optimization for MapTask based on JNI. The basic idea is that, add a NativeMapOutputCollector to handle k/v pairs emitted by mapper, therefore sort, spill, IFile serialization can all be done in native code, preliminary test(on Xeon E5410, jdk6u24) showed promising results: 1. Sort is about 3x-10x as fast as java(only binary string compare is supported) 2. IFile serialization speed is about 3x of java, about 500MB/s, if hardware CRC32C is used, things can get much faster(1G/ 3. Merge code is not completed yet, so the test use enough io.sort.mb to prevent mid-spill This leads to a total speed up of 2x~3x for the whole MapTask, if IdentityMapper(mapper does nothing) is used There are limitations of course, currently only Text and BytesWritable is supported, and I have not think through many things right now, such as how to support map side combine. I had some discussion with somebody familiar with hive, it seems that these limitations won't be much problem for Hive to benefit from those optimizations, at least. 
Advices or discussions about improving compatibility are most welcome:) Currently NativeMapOutputCollector has a static method called canEnable(), which checks if key/value type, comparator type, combiner are all compatible, then MapTask can choose to enable NativeMapOutputCollector. This is only a preliminary test, more work need to be done. I expect better final results, and I believe similar optimization can be adopt to reduce task and shuffle too. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047206#comment-14047206 ] Chris Douglas commented on MAPREDUCE-5890: -- The current patch still injects the IV and length into the stream, then fixes up the offsets. If the IV were part of the {{IFile}} format, then this would not be necessary. If this format were ever changed, then someone would need to go back and fix all this arithmetic or take its framing as a requirement for any intermediate data format. Am I missing why it's easier to wrap/unwrap streams? Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.10.patch, MAPREDUCE-5890.11.patch, MAPREDUCE-5890.12.patch, MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient, it is required that while at rest it should be encrypted. HADOOP-10150 HDFS-6134 bring encryption at rest for data in filesystem using Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14044144#comment-14044144 ] Chris Douglas commented on MAPREDUCE-5890: -- bq. I am trying to trade off that complexity in software with an admin prerequisite to install one or few disks/partitions that selective users can chose to use via their job-configuration. This would work also, but (Alejandro/Arun, correct me if this is mistaken) encrypted intermediate data is probably motivated by compliance regimes that require it. An audit would need to verify that every job used the encrypted local dirs, that those mounts were configured to encrypt when the job ran, etc. One would also need to do capacity planning for encrypted vs unencrypted space across nodes, possibly even federating jobs. It's workable, but kind of ad hoc. In contrast, verifying that the MR job set this switch is straightforward and has no ops overhead. I have no idea whether it's common to combine these workloads, but this would make it easier. It's not so inconsistent to add this to MapReduce... frameworks are currently responsible for intra-application security, particularly RPC. If there's a general mechanism then this should use it. If that layer were developed, we'd want MapReduce to use it instead of its own, custom encryption. Today, the alternative is to develop that general-purpose layer. To reduce the overhead, this could use the plugin mechanism in MAPREDUCE-2454 because this no longer requires any changes to the {{ShuffleHandler}} or index formats. I haven't looked at the latest patch, but if the {{IFile}} format omits the 16 byte IV for each spill, then the only overhead it's adding is for the checks in the config (most of which can be pulled into the buffer init and cached). Has this been tested in a cluster? Would the perf hit be simple to measure? 
Support for encrypting Intermediate data and spills in local filesystem --- Key: MAPREDUCE-5890 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5890 Project: Hadoop Map/Reduce Issue Type: New Feature Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Arun Suresh Labels: encryption Attachments: MAPREDUCE-5890.3.patch, MAPREDUCE-5890.4.patch, MAPREDUCE-5890.5.patch, MAPREDUCE-5890.6.patch, MAPREDUCE-5890.7.patch, MAPREDUCE-5890.8.patch, MAPREDUCE-5890.9.patch, org.apache.hadoop.mapred.TestMRIntermediateDataEncryption-output.txt, syslog.tar.gz For some sensitive data, encryption while in flight (network) is not sufficient; it must also be encrypted while at rest. HADOOP-10150 and HDFS-6134 bring encryption at rest for filesystem data using the Hadoop FileSystem API. MapReduce intermediate data and spills should also be encrypted while at rest. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041955#comment-14041955 ] Chris Douglas commented on MAPREDUCE-5890: -- Thanks for updating the patch, Arun. Adding seeks for serving map output would be regrettable. A few nits: * unused, private static field {{counter}} added to {{Fetcher}} * the unit test should use JUnit4 annotations rather than extending {{TestCase}} * {noformat} + InputStream is = input; + is = CryptoUtils.wrap(jobConf, iv, is, offset, compressedLength); {noformat} is equivalently {{InputStream is = CryptoUtils.wrap(jobConf, iv, input, offset, compressedLength);}} * While not terribly expensive, there are a lot of redundant lookups for the encrypted shuffle config parameter. * There are many counterexamples, but running a MR job is a heavy way to test this. * To be sure I understand the IV logic: it's injected in the stream as a prefix to the segment during a merge, but is part of the index record during a spill. Is that accurate? Adding a few comments calling this out would be appreciated, particularly since it's hard to spot in the merge. * Has this been tested on spills with intermediate merges? With more than a single reduce? Looking at the patch, it seems to create the stream with the IV but not reset the IV for each segment (apologies, I haven't tried applying it, so I might just be misreading the context). * Since the IV size is hard-coded in {{CryptoUtils}} to 16 bytes (and part of the {{IndexRecord}} format), it should probably fail if {{CryptoCodec::getAlgorithmBlockSize}} returns anything else. Much of the logic in here is internal to MapReduce, so it would be unfair to ask that this create better abstractions than what exists, but the IV handling is pretty ad hoc. 
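The per-segment IV concern above ("it creates the stream with the IV, it doesn't reset the IV for each segment") can be made concrete with a toy stream layout. This is a sketch only, not the patch's {{CryptoUtils}}: each segment in a merged file carries its own 16-byte IV prefix, so a reader must pick up a fresh IV at every segment boundary instead of reusing the first one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.SecureRandom;

// Toy layout: [16-byte IV][payload length][payload] repeated per segment.
public class SegmentIvDemo {
    static final int IV_LEN = 16;

    static void writeSegment(DataOutputStream out, byte[] payload) throws IOException {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);  // fresh IV for every segment
        out.write(iv);
        out.writeInt(payload.length);
        out.write(payload);                // stands in for the encrypted bytes
    }

    static byte[] readSegment(DataInputStream in) throws IOException {
        byte[] iv = new byte[IV_LEN];
        in.readFully(iv);                  // re-initialize the IV per segment
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        return payload;
    }

    // Write all payloads as IV-prefixed segments, then read them back.
    static byte[][] roundTrip(byte[][] payloads) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (byte[] p : payloads) writeSegment(out, p);
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
            byte[][] result = new byte[payloads.length][];
            for (int i = 0; i < result.length; i++) result[i] = readSegment(in);
            return result;
        } catch (IOException impossible) {  // in-memory streams do not throw
            throw new AssertionError(impossible);
        }
    }

    public static void main(String[] args) {
        byte[][] back = roundTrip(new byte[][] { "segment-0".getBytes(), "segment-1".getBytes() });
        System.out.println(new String(back[0]) + ", " + new String(back[1]));
    }
}
```

A reader that consumed only the first IV and then streamed straight through would misinterpret every subsequent segment's 16-byte prefix as data, which is exactly the failure mode the review question is probing.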
Other improvements under consideration (particularly native implementations and other frameworks building on the {{ShuffleHandler}}) may rely on this code, as may older versions of MapReduce, which will fail unless two versions of the {{ShuffleHandler}} are deployed. To make it backwards compatible, the IV can be part of each {{IFile}} segment (requiring no changes to the {{ShuffleHandler}} or the {{SpillRecord}}/{{IndexRecord}} format), or the IVs can be added to the end of the {{SpillRecord}}. In the latter case, the {{Fetcher}} will need to request the alternate interpretation by including a header; old versions will get the existing interpretation of the {{SpillRecord}}.
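The second compatibility option above (append the IVs to the end of the {{SpillRecord}}) works because an old reader consumes only the leading fixed-size index entries and never looks past them. A hedged sketch, with an invented entry layout that is not the real {{SpillRecord}} format:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative record file: N fixed-size index entries, then N trailing IVs.
// Legacy readers stop after the entries, so the IV block is invisible to them.
public class TrailingIvDemo {
    static final int IV_LEN = 16;
    static final int FIELDS = 3;  // e.g. offset, rawLength, partLength per entry

    static byte[] write(long[][] entries, byte[][] ivs) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (long[] e : entries)
                for (long field : e) out.writeLong(field);
            for (byte[] iv : ivs) out.write(iv);  // optional trailing block
            return buf.toByteArray();
        } catch (IOException impossible) {        // in-memory streams do not throw
            throw new AssertionError(impossible);
        }
    }

    // "Old" reader: parses only the fixed-size prefix it knows about.
    static long[][] readLegacy(byte[] data, int numEntries) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            long[][] entries = new long[numEntries][FIELDS];
            for (long[] e : entries)
                for (int i = 0; i < FIELDS; i++) e[i] = in.readLong();
            return entries;                       // trailing IVs simply ignored
        } catch (IOException impossible) {
            throw new AssertionError(impossible);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(new long[][] {{0L, 100L, 64L}}, new byte[][] { new byte[IV_LEN] });
        System.out.println(readLegacy(data, 1)[0][1]);
    }
}
```

An IV-aware reader would additionally seek past the entry block and read the trailing IVs; the header-based negotiation mentioned above decides which interpretation the {{Fetcher}} requests.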
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14042451#comment-14042451 ] Chris Douglas commented on MAPREDUCE-5890: -- The repeated config lookup and unit test are not blockers, but they're places where the patch could be improved. bq. The ShuffleHandler is a private class of MapReduce; if other frameworks use it, it is at their own risk. Every version of the patch has broken compatibility with existing versions of _MapReduce_. Other frameworks may rely on functionality we don't guarantee, but breaking them is avoidable. bq. Regarding adding new abstractions, I’m OK if they are small and non-intrusive. I just don’t want to send Arun chasing a wild goose and when he finally does we backtrack because the changes are too pervasive in the core of MapReduce Adding a new file just to pass 16 bytes to the {{ShuffleHandler}} will harm performance; breaking backwards compatibility is not OK, and not necessary for this feature. Aside from those, I've asked for some formatting fixes and that the code not return an IV that doesn't match the hard-coded 16-byte size. These are reasonable, limited requests and bug fixes, and I've suggested two possible implementations that would address them. These would be blockers during the merge, too.
[jira] [Commented] (MAPREDUCE-5890) Support for encrypting Intermediate data and spills in local filesystem
[ https://issues.apache.org/jira/browse/MAPREDUCE-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14040237#comment-14040237 ] Chris Douglas commented on MAPREDUCE-5890: -- bq. Given that the current abstraction does not provide a clean cut to hide this within the IFile without a significant refactoring throughout the code, I think it is the least evil. It's expedient, but this code is already difficult to follow. Arun, would you mind making an attempt at refactoring? The current code doesn't have an existing abstraction for this, but writing a separate file for every spill just to store a few bytes of IV doesn't seem like a reasonable tradeoff in either performance or complexity. Adding a metadata block to the {{IFile}} segment or adding the IV to the spill index (to be added in the header, as in the current patch) would both work. A couple of nits: * In {{OnDiskMapOutput}}, the {{disk}} field can stay final, since the only assignment is in the constructor * Minor indentation/braces issue in {{MapTask}}: {noformat} + if (CryptoUtils.isShuffleEncrypted(job)) + CryptoUtils.deleteIVFile(rfs, filename[i]); {noformat} Minor nit: please leave old patches attached to avoid orphaning the discussion around them.
[jira] [Commented] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
[ https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14029625#comment-14029625 ] Chris Douglas commented on MAPREDUCE-5912: -- bq. If in the future we want to revisit the idea of map outputs going somewhere different than the local file system, then I think we'd need a different patch. I think we'd want to make sure that the map output's Path instance contains an explicit scheme, so that the code here doesn't need to assume local vs. default vs. something else. Agreed. MAPREDUCE-5269 changed all {{Path}} instances returned from {{YARNOutputFiles}} to be fully qualified, but the two changes were separated. +1 for committing the workaround until HADOOP-10663 is ready. Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196 --- Key: MAPREDUCE-5912 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5912 Project: Hadoop Map/Reduce Issue Type: Bug Components: client Affects Versions: 3.0.0 Reporter: Remus Rusanu Assignee: Remus Rusanu Fix For: 3.0.0 Attachments: MAPREDUCE-5912.1.patch {code} @@ -1098,8 +1120,8 @@ private long calculateOutputSize() throws IOException { if (isMapTask() && conf.getNumReduceTasks() > 0) { try { Path mapOutput = mapOutputFile.getOutputFile(); -FileSystem localFS = FileSystem.getLocal(conf); -return localFS.getFileStatus(mapOutput).getLen(); +FileSystem fs = mapOutput.getFileSystem(conf); +return fs.getFileStatus(mapOutput).getLen(); } catch (IOException e) { LOG.warn("Could not find output size ", e); } {code} causes Windows local output files to be routed through HDFS: {code} 2014-06-02 00:14:53,891 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.IllegalArgumentException: Pathname /c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out from
c:/Hadoop/Data/Hadoop/local/usercache/HadoopUser/appcache/application_1401693085139_0001/output/attempt_1401693085139_0001_m_00_0/file.out is not a valid DFS filename. at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187) at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1024) at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1020) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1020) at org.apache.hadoop.mapred.Task.calculateOutputSize(Task.java:1124) at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:1102) at org.apache.hadoop.mapred.Task.done(Task.java:1048) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
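The failure mode in the stack trace above follows from dispatch by URI scheme: {{Path.getFileSystem}} resolves the filesystem from the path's scheme, so an unqualified local path falls back to the *default* filesystem (HDFS here) rather than the local one. This toy resolver mimics that dispatch; it is not Hadoop's FileSystem API.

```java
import java.net.URI;

// Resolve "which filesystem handles this path?" by URI scheme, falling
// back to the default filesystem when the path carries no scheme.
public class SchemeResolver {
    static String resolveFs(String path, String defaultFs) {
        String scheme = URI.create(path).getScheme();
        return scheme != null ? scheme : defaultFs;  // no scheme -> default fs
    }

    public static void main(String[] args) {
        // fully qualified: dispatched correctly
        System.out.println(resolveFs("file:/tmp/out/file.out", "hdfs"));  // "file"
        // unqualified local path: wrongly routed to the default fs
        System.out.println(resolveFs("/tmp/out/file.out", "hdfs"));       // "hdfs"
    }
}
```

This is why the comment above argues for fully qualified {{Path}} instances: with an explicit scheme, the same lookup is unambiguous regardless of what the default filesystem happens to be.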
[jira] [Commented] (MAPREDUCE-5912) Task.calculateOutputSize does not handle Windows files after MAPREDUCE-5196
[ https://issues.apache.org/jira/browse/MAPREDUCE-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018246#comment-14018246 ] Chris Douglas commented on MAPREDUCE-5912: -- As you identified in HADOOP-10663, returning the default filesystem for local paths is not correct.
[jira] [Updated] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-5821: - Resolution: Fixed Fix Version/s: 2.4.1 2.5.0 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I committed this. Thanks, Todd IFile merge allocates new byte array for every value Key: MAPREDUCE-5821 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5821 Project: Hadoop Map/Reduce Issue Type: Bug Components: performance, task Affects Versions: 2.4.1 Reporter: Todd Lipcon Assignee: Todd Lipcon Fix For: 3.0.0, 2.5.0, 2.4.1 Attachments: after-patch.png, before-patch.png, mapreduce-5821.txt, mapreduce-5821.txt I wrote a standalone benchmark of the MapOutputBuffer and found that it did a lot of allocations during the merge phase. After looking at an allocation profile, I found that IFile.Reader.nextRawValue() would always allocate a new byte array for every value, so the allocation rate goes way up during the merge phase of the mapper. I imagine this also affects the reducer input, though I didn't profile that. -- This message was sent by Atlassian JIRA (v6.2#6252)
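The fix for the allocation pattern described above (a fresh byte array for every value in {{IFile.Reader.nextRawValue()}}) is the standard buffer-reuse idiom: keep one growable buffer and hand it out again whenever it is already large enough. A minimal sketch, with invented names rather than {{IFile.Reader}}'s real fields:

```java
// Reuse one growable byte[] across reads instead of allocating per value.
public class ReusableValueBuffer {
    private byte[] buf = new byte[0];

    // Returns a buffer at least `len` bytes long, reusing the previous
    // allocation whenever it already has enough capacity. Doubling on
    // growth amortizes the number of reallocations across reads.
    byte[] acquire(int len) {
        if (buf.length < len) {
            buf = new byte[Math.max(len, buf.length << 1)];
        }
        return buf;
    }

    public static void main(String[] args) {
        ReusableValueBuffer r = new ReusableValueBuffer();
        byte[] a = r.acquire(32);
        // second acquire within capacity returns the SAME array: no allocation
        System.out.println(a == r.acquire(16));
    }
}
```

Callers must treat the returned buffer as valid only until the next {{acquire}}, which matches how merge code consumes each raw value before advancing.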
[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13967652#comment-13967652 ] Chris Douglas commented on MAPREDUCE-5821: -- +1. This looks like the intended behavior from HADOOP-5494. Good catch.
[jira] [Commented] (MAPREDUCE-5821) IFile merge allocates new byte array for every value
[ https://issues.apache.org/jira/browse/MAPREDUCE-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13963651#comment-13963651 ] Chris Douglas commented on MAPREDUCE-5821: -- Sure, I can take a look later this week if it can wait.
[jira] [Commented] (MAPREDUCE-5717) Task pings are interpreted as task progress
[ https://issues.apache.org/jira/browse/MAPREDUCE-5717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868942#comment-13868942 ] Chris Douglas commented on MAPREDUCE-5717: -- +1. Thanks for catching this, Jason. Task pings are interpreted as task progress --- Key: MAPREDUCE-5717 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5717 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 3.0.0 Reporter: Jason Lowe Assignee: Jason Lowe Attachments: MAPREDUCE-5717.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-5196: - Resolution: Fixed Fix Version/s: 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I committed this. Thanks, Carlo. CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing -- Key: MAPREDUCE-5196 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5196 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mr-am, mrv2 Reporter: Carlo Curino Assignee: Carlo Curino Fix For: 3.0.0 Attachments: MAPREDUCE-5196.1.patch, MAPREDUCE-5196.2.patch, MAPREDUCE-5196.3.patch, MAPREDUCE-5196.patch, MAPREDUCE-5196.patch This JIRA tracks a checkpoint-based AM preemption policy. The policy handles propagation of the preemption requests received from the RM to the appropriate tasks, and bookkeeping of checkpoints. Actual checkpointing of the task state is handled in upcoming JIRAs. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-5196: - Status: Open (was: Patch Available)
[jira] [Updated] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-5196: - Attachment: MAPREDUCE-5196.3.patch
[jira] [Updated] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-5196: - Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-5196: - Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-5196: - Status: Open (was: Patch Available)
[jira] [Commented] (MAPREDUCE-5196) CheckpointAMPreemptionPolicy implements preemption in MR AM via checkpointing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13855069#comment-13855069 ] Chris Douglas commented on MAPREDUCE-5196: -- Failed test is due to YARN-1463
[jira] [Updated] (MAPREDUCE-5189) Basic AM changes to support preemption requests (per YARN-45)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-5189: - Status: Patch Available (was: Open) Basic AM changes to support preemption requests (per YARN-45) - Key: MAPREDUCE-5189 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5189 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mr-am, mrv2 Reporter: Carlo Curino Assignee: Carlo Curino Attachments: MAPREDUCE-5189.1.patch, MAPREDUCE-5189.2.patch, MAPREDUCE-5189.3.patch, MAPREDUCE-5189.4.patch, MAPREDUCE-5189.patch, MAPREDUCE-5189.patch This JIRA tracks the minimum amount of changes necessary in the mapreduce AM to receive preemption requests (per YARN-45) and invoke a local policy that manages preemption. (advanced policies and mechanisms will be tracked separately) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (MAPREDUCE-5189) Basic AM changes to support preemption requests (per YARN-45)
[ https://issues.apache.org/jira/browse/MAPREDUCE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Douglas updated MAPREDUCE-5189: - Resolution: Fixed Fix Version/s: 3.0.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available)