[jira] [Commented] (MAPREDUCE-6704) Container fail to launch for mapred application
[ https://issues.apache.org/jira/browse/MAPREDUCE-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596876#comment-15596876 ]

Bibin A Chundatt commented on MAPREDUCE-6704:
---------------------------------------------

[~rkanter]
{quote}
Given the difficulty people seem to be having and that there doesn't seem to be a single fix that works for everyone for some reason, perhaps we should revisit that decision?
{quote}
IMHO we have to rethink that decision.

Summarizing the discussion and the proposed solutions:
# Add HADOOP_MAPRED_HOME=HADOOP_COMMON_HOME in opts. But it is not mandatory that MAPRED_HOME=HADOOP_COMMON_HOME.
# Add HADOOP_MAPRED_HOME to YARN. Since we want to keep YARN and MAPRED separate, this initial solution was not accepted.
# Add documentation on configuring yarn.nodemanager.env-whitelist in the NodeManager to run mapred applications:
{noformat}
yarn.nodemanager.env-whitelist
JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME
{noformat}

Waiting for a +1 on any one of the above solutions, or input on any other approach.

> Container fail to launch for mapred application
> -----------------------------------------------
>
> Key: MAPREDUCE-6704
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6704
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Blocker
> Attachments: 0001-MAPREDUCE-6704.patch, 0001-YARN-5026.patch
>
>
> Container fail to launch for mapred application.
> As part of the launch script, the {{HADOOP_MAPRED_HOME}} default value is not set.
> After https://github.com/apache/hadoop/commit/9d4d30243b0fc9630da51a2c17b543ef671d035c, {{HADOOP_MAPRED_HOME}} cannot be obtained from {{builder.environment()}} since {{DefaultContainerExecutor#buildCommandExecutor}} sets inherit to false.
> {noformat}
> 16/05/02 09:16:05 INFO mapreduce.Job: Job job_1462155939310_0004 failed with state FAILED due to: Application application_1462155939310_0004 failed 2 times due to AM Container for appattempt_1462155939310_0004_02 exited with exitCode: 1
> Failing this attempt.Diagnostics: Exception from container-launch.
> Container id: container_1462155939310_0004_02_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:946)
>     at org.apache.hadoop.util.Shell.run(Shell.java:850)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1144)
>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:385)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:281)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:89)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Container exited with a non-zero exit code 1. Last 4096 bytes of stderr :
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; support was removed in 8.0
> Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> Container exited with a non-zero exit code 1. Last 4096 bytes of stderr :
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; support was removed in 8.0
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
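The issue description above says that `HADOOP_MAPRED_HOME` is no longer available from `builder.environment()` once inheritance is disabled, and the third proposed solution is to list it in `yarn.nodemanager.env-whitelist`. The following is a minimal sketch of that whitelist idea using only the JDK. The class `EnvWhitelistSketch` and the method `childEnv` are hypothetical names for illustration, not NodeManager code, and the way YARN actually applies the whitelist may differ.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: hypothetical helper, not the NodeManager implementation. */
final class EnvWhitelistSketch {

  /**
   * Builds a child environment that starts empty (as when environment
   * inheritance is disabled) and copies over only the whitelisted
   * variables that are actually set in the parent environment.
   */
  public static Map<String, String> childEnv(Map<String, String> parentEnv,
                                             String whitelistCsv) {
    Map<String, String> child = new LinkedHashMap<>();
    Arrays.stream(whitelistCsv.split(","))
        .map(String::trim)
        .filter(name -> !name.isEmpty())
        .filter(parentEnv::containsKey)
        .forEach(name -> child.put(name, parentEnv.get(name)));
    return child;
  }

  public static void main(String[] args) {
    // Whitelist value taken from the comment above.
    String whitelist = "JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,"
        + "HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,"
        + "HADOOP_MAPRED_HOME";
    // Print whichever whitelisted variables are set in this process.
    childEnv(System.getenv(), whitelist)
        .forEach((name, value) -> System.out.println(name + "=" + value));
  }
}
```

In this sketch a variable that is missing from the whitelist never reaches the child, which mirrors the reported symptom: without `HADOOP_MAPRED_HOME` in the container environment, the launch cannot locate `MRAppMaster`.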
[jira] [Commented] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection
[ https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596842#comment-15596842 ]

Hudson commented on MAPREDUCE-6728:
-----------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10662 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10662/])
MAPREDUCE-6728. Give fetchers hint when ShuffleHandler rejects a (rkanter: rev d4725bfcb2d300219d65395a78f957afbf37b201)
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MapHost.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java

> Give fetchers hint when ShuffleHandler rejects a shuffling connection
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-6728
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6728
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Attachments: mapreduce6728.001.patch, mapreduce6728.002.patch, mapreduce6728.003.patch, mapreduce6728.004.patch, mapreduce6728.005.patch, mapreduce6728.006.patch, mapreduce6728.prelim.patch
>
>
> If # of open shuffle connection to a node goes over the max, ShuffleHandler closes the connection immediately without giving fetchers any hint of the reason, which causes fetchers to fail due to exceptions
> java.net.SocketException: Unexpected end of file from server
>     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:772)
>     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
>     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>     at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:430)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:395)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:266)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:323)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> OR
> java.net.SocketException: Connection reset
>     at java.net.SocketInputStream.read(SocketInputStream.java:196)
>     at java.net.SocketInputStream.read(SocketInputStream.java:122)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>     at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
>     at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>     at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:430)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:395)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:266)
>     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java
> Such failures are counted as fetcher failures
[jira] [Commented] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection
[ https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596824#comment-15596824 ]

Robert Kanter commented on MAPREDUCE-6728:
------------------------------------------

Looks like it doesn't compile against branch-2. Can you take a look and upload a modified patch?
{noformat}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hadoop-mapreduce-client-shuffle: Compilation failure
[ERROR] /Users/rkanter/dev/hadoop-git/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java:[1268,17] cannot find symbol
[ERROR] symbol: method headers()
[ERROR] location: variable response of type org.jboss.netty.handler.codec.http.HttpResponse
[ERROR] -> [Help 1]
{noformat}

> Give fetchers hint when ShuffleHandler rejects a shuffling connection
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-6728
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6728
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Attachments: mapreduce6728.001.patch, mapreduce6728.002.patch, mapreduce6728.003.patch, mapreduce6728.004.patch, mapreduce6728.005.patch, mapreduce6728.006.patch, mapreduce6728.prelim.patch
[jira] [Commented] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection
[ https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596800#comment-15596800 ]

Robert Kanter commented on MAPREDUCE-6728:
------------------------------------------

+1

> Give fetchers hint when ShuffleHandler rejects a shuffling connection
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-6728
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6728
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Attachments: mapreduce6728.001.patch, mapreduce6728.002.patch, mapreduce6728.003.patch, mapreduce6728.004.patch, mapreduce6728.005.patch, mapreduce6728.006.patch, mapreduce6728.prelim.patch
[jira] [Commented] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection
[ https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596716#comment-15596716 ]

Hadoop QA commented on MAPREDUCE-6728:
--------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 54s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 28s {color} | {color:red} hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 2 new + 277 unchanged - 3 fixed = 279 total (was 280) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 28s {color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 17s {color} | {color:green} hadoop-mapreduce-client-shuffle in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 58s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12834778/mapreduce6728.006.patch |
| JIRA Issue | MAPREDUCE-6728 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 6e090c3051f6 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2543852 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6770/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client.txt |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6770/testReport/ |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core
[jira] [Updated] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection
[ https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haibo Chen updated MAPREDUCE-6728:
----------------------------------
    Attachment: mapreduce6728.006.patch

Thanks for your reviews, [~rkanter]. Uploading a new patch to address the issues you have raised.

> Give fetchers hint when ShuffleHandler rejects a shuffling connection
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-6728
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6728
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Attachments: mapreduce6728.001.patch, mapreduce6728.002.patch, mapreduce6728.003.patch, mapreduce6728.004.patch, mapreduce6728.005.patch, mapreduce6728.006.patch, mapreduce6728.prelim.patch
[jira] [Commented] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection
[ https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596602#comment-15596602 ]

Robert Kanter commented on MAPREDUCE-6728:
------------------------------------------

Thanks for the patch [~haibochen] and the reviews [~templedf]. A few minor things:
- {code:java}for (TaskAttemptID left: remaining) {code} still has wrong spacing.
- I think it might be helpful to put the hostname in the {{TryAgainLaterException}} message.
- {{Fetcher.FETCH_RETRY_DELAY_DEFAULT}} has a comment to point you to {{ShuffleHandler.FETCH_RETRY_DELAY}}. I think we should add a comment pointing back, so that if someone changes {{ShuffleHandler.FETCH_RETRY_DELAY}}, they'll know to change {{Fetcher.FETCH_RETRY_DELAY_DEFAULT}} as well.

> Give fetchers hint when ShuffleHandler rejects a shuffling connection
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-6728
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6728
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Attachments: mapreduce6728.001.patch, mapreduce6728.002.patch, mapreduce6728.003.patch, mapreduce6728.004.patch, mapreduce6728.005.patch, mapreduce6728.prelim.patch
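The review above asks for the hostname in the `TryAgainLaterException` message and discusses a retry delay shared between `ShuffleHandler` and `Fetcher`. A minimal sketch of how a fetcher might turn a rejection hint into a typed, host-aware exception follows. Everything here is an assumption for illustration: the class `FetchRetrySketch`, the stand-in exception `RetryLaterException`, the use of HTTP status 429, a delay header carrying milliseconds, and the default of 1000 ms are not taken from the patch.

```java
import java.io.IOException;

/** Illustrative only: hypothetical names, not the code from the patch. */
final class FetchRetrySketch {

  // Assumed status code the server uses to signal "too many connections".
  static final int TOO_MANY_REQUESTS = 429;
  // Assumed fallback delay; a real client would keep this in sync with the
  // server-side constant, as the review comment suggests.
  static final long RETRY_DELAY_DEFAULT_MS = 1000L;

  /** Stand-in for the TryAgainLaterException mentioned in the review. */
  static final class RetryLaterException extends IOException {
    final String host;
    final long backoffMs;

    RetryLaterException(String host, long backoffMs) {
      // Including the host makes the failure easier to trace in logs.
      super("Host " + host + " asked to retry after " + backoffMs + " ms");
      this.host = host;
      this.backoffMs = backoffMs;
    }
  }

  /**
   * Throws RetryLaterException when the response signals a rejection;
   * returns normally for any other status. The delay header value is
   * optional and falls back to the default when absent or unparsable.
   */
  public static void verifyResponse(String host, int status, String delayHeader)
      throws RetryLaterException {
    if (status != TOO_MANY_REQUESTS) {
      return;
    }
    long backoffMs = RETRY_DELAY_DEFAULT_MS;
    if (delayHeader != null) {
      try {
        backoffMs = Long.parseLong(delayHeader.trim());
      } catch (NumberFormatException e) {
        // Keep the default delay when the header is malformed.
      }
    }
    throw new RetryLaterException(host, backoffMs);
  }
}
```

A caller could catch the exception, wait for `backoffMs`, and retry the same host without counting the attempt as a fetch failure, which is the behavior the issue description is asking for.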
[jira] [Commented] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection
[ https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596254#comment-15596254 ]

Daniel Templeton commented on MAPREDUCE-6728:
---------------------------------------------

Thanks, [~haibochen]. Latest patch looks good to me. +1 (non-binding)

> Give fetchers hint when ShuffleHandler rejects a shuffling connection
> ---------------------------------------------------------------------
>
> Key: MAPREDUCE-6728
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6728
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mrv2
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Attachments: mapreduce6728.001.patch, mapreduce6728.002.patch, mapreduce6728.003.patch, mapreduce6728.004.patch, mapreduce6728.005.patch, mapreduce6728.prelim.patch
[jira] [Commented] (MAPREDUCE-6704) Container fail to launch for mapred application
[ https://issues.apache.org/jira/browse/MAPREDUCE-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596201#comment-15596201 ]

Andrew Wang commented on MAPREDUCE-6704:

Folks, is there any progress we can make on this JIRA? That this doesn't work out of the box anymore has been very surprising to our users. I'd like to get it fixed for alpha2 if possible.

> Container fail to launch for mapred application
> ---
>
> Key: MAPREDUCE-6704
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6704
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Bibin A Chundatt
> Assignee: Bibin A Chundatt
> Priority: Blocker
> Attachments: 0001-MAPREDUCE-6704.patch, 0001-YARN-5026.patch
>
> Containers fail to launch for mapred applications.
> As part of the launch script, the default value of {{HADOOP_MAPRED_HOME}} is not set. After
> https://github.com/apache/hadoop/commit/9d4d30243b0fc9630da51a2c17b543ef671d035c
> {{HADOOP_MAPRED_HOME}} can no longer be obtained from {{builder.environment()}},
> since {{DefaultContainerExecutor#buildCommandExecutor}} sets inherit to false.
> {noformat}
> 16/05/02 09:16:05 INFO mapreduce.Job: Job job_1462155939310_0004 failed with state FAILED due to: Application application_1462155939310_0004 failed 2 times due to AM Container for appattempt_1462155939310_0004_02 exited with exitCode: 1
> Failing this attempt.Diagnostics: Exception from container-launch.
> Container id: container_1462155939310_0004_02_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:946)
>     at org.apache.hadoop.util.Shell.run(Shell.java:850)
>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1144)
>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:385)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:281)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:89)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Container exited with a non-zero exit code 1. Last 4096 bytes of stderr :
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; support was removed in 8.0
> Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> Container exited with a non-zero exit code 1. Last 4096 bytes of stderr :
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; support was removed in 8.0
> {noformat}
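The effect described in this issue can be illustrated with a minimal sketch. It assumes that "sets inherit to false" means the child process starts from an empty environment and that only explicitly listed variables are copied in, in the spirit of the yarn.nodemanager.env-whitelist proposal made in this thread. The class and method names are hypothetical and this is not the NodeManager code; only ProcessBuilder.environment() is a real JDK call.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the NodeManager's actual launch code.
class EnvWhitelistSketch {

  // Builds the environment a child process would see when nothing is
  // inherited by default and only whitelisted variables are copied over.
  static Map<String, String> childEnvironment(Map<String, String> parentEnv,
                                              List<String> whitelist) {
    ProcessBuilder builder = new ProcessBuilder("true"); // never started here
    Map<String, String> env = builder.environment();
    env.clear(); // assumed meaning of "inherit = false"
    for (String name : whitelist) {
      String value = parentEnv.get(name);
      if (value != null) {
        env.put(name, value);
      }
    }
    return new HashMap<>(env);
  }

  public static void main(String[] args) {
    Map<String, String> parent = new HashMap<>();
    parent.put("JAVA_HOME", "/opt/java");          // example values only
    parent.put("HADOOP_MAPRED_HOME", "/opt/hadoop");

    // HADOOP_MAPRED_HOME is absent unless it is whitelisted.
    System.out.println(childEnvironment(parent, Arrays.asList("JAVA_HOME")));
    System.out.println(childEnvironment(parent,
        Arrays.asList("JAVA_HOME", "HADOOP_MAPRED_HOME")));
  }
}
```

With the first whitelist the child has no HADOOP_MAPRED_HOME, so a classpath derived from it cannot locate MRAppMaster; this is consistent with the "Could not find or load main class" error quoted in the issue.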
[jira] [Commented] (MAPREDUCE-6772) Add MR Job Configurations for Containers reuse
[ https://issues.apache.org/jira/browse/MAPREDUCE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596078#comment-15596078 ]

Naganarasimha G R commented on MAPREDUCE-6772:

[~devaraj.k], sorry, I have one last comment. I missed mentioning that the new configurations also need to be captured in {{mapred-default.xml}}. I hope that can be done in a patch that follows the proper patch name pattern.

> Add MR Job Configurations for Containers reuse
> ---
>
> Key: MAPREDUCE-6772
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6772
> Project: Hadoop Map/Reduce
> Issue Type: Sub-task
> Components: applicationmaster, mrv2
> Reporter: Devaraj K
> Assignee: Devaraj K
> Attachments: MAPREDUCE-6772-v0.patch, MAPREDUCE-6772-v1.patch, MR-6749-MAPREDUCE-6772.003.patch
>
> This task adds configurations required for MR AM Container reuse feature.
[jira] [Commented] (MAPREDUCE-6772) Add MR Job Configurations for Containers reuse
[ https://issues.apache.org/jira/browse/MAPREDUCE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596028#comment-15596028 ]

Naganarasimha G R commented on MAPREDUCE-6772:

[~devaraj.k], overall the latest patch LGTM, and I am committing it to the branch *"MR-6749"*. As this is the first patch it's fine, but further patches need to follow the naming convention as per https://wiki.apache.org/hadoop/HowToContribute #Naming your patch, i.e. [JIRA-ID]-[branch-name].[patch-number].patch, which would look like {{MAPREDUCE-6772-MR-6749.03.patch}}. This helps Jenkins run the patch against the right branch.
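As a small illustration of the branch patch naming mentioned in this comment, the sketch below reproduces the example name {{MAPREDUCE-6772-MR-6749.03.patch}} from its parts. The helper name is hypothetical, and the two-digit zero padding is an assumption taken from that one example, not a documented rule.

```java
// Hypothetical helper; only mirrors the example name given in the comment.
class PatchNameSketch {

  // Joins JIRA id, branch name and patch number as "ID-BRANCH.NN.patch".
  // The "%02d" padding is assumed from the ".03." in the example.
  static String branchPatchName(String jiraId, String branch, int patchNumber) {
    return String.format("%s-%s.%02d.patch", jiraId, branch, patchNumber);
  }

  public static void main(String[] args) {
    System.out.println(branchPatchName("MAPREDUCE-6772", "MR-6749", 3));
  }
}
```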
[jira] [Comment Edited] (MAPREDUCE-6797) Job history server scans can become blocked on a single, slow entry
[ https://issues.apache.org/jira/browse/MAPREDUCE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594632#comment-15594632 ]

Prabhu Joseph edited comment on MAPREDUCE-6797 at 10/21/16 9:37 AM:

[~kasha] It is possible for multiple threads to call addIfAbsent simultaneously and process the same HistoryFileInfo, but that will not cause any issue even after the synchronized block is removed, because the operations inside it are thread-safe and idempotent.

was (Author: prabhu joseph):
[~kasha] Multiple threads calling addIfAbsent simultaneously is possible and that won't face any issue after removing synchronized block as the operations inside Synchronized block are thread safe and also idempotent.

> Job history server scans can become blocked on a single, slow entry
> ---
>
> Key: MAPREDUCE-6797
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6797
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver
> Affects Versions: 2.4.0, 2.8.0
> Reporter: Prabhu Joseph
> Assignee: Prabhu Joseph
> Priority: Critical
> Fix For: 2.9.0
> Attachments: 0001-MAPREDUCE-6797.patch, jstack
>
> There is one more piece of code in HistoryFileManager where the synchronized block on HistoryFileInfo needs to be removed. The JobHistoryServer contention issue was hit in our environment, where the attached stack trace shows HistoryFileManager$JobListCache.addIfAbsent unnecessarily waiting to lock a HistoryFileInfo.
> The synchronized keyword on isMovePending and didMoveFail has already been removed by MAPREDUCE-6684.
> {code}
> HistoryFileInfo firstValue = cache.get(key);
> synchronized(firstValue) {   // ---> Synchronized is not needed here
>   if (firstValue.isMovePending()) {
>     if(firstValue.didMoveFail() &&
>         firstValue.jobIndexInfo.getFinishTime() <= cutoff) {
>       cache.remove(key);
>       //Now lets try to delete it
>       try {
>         firstValue.delete();
>       } catch (IOException e) {
>         LOG.error("Error while trying to delete history files" +
>             " that could not be moved to done.", e);
>       }
>     } else {
>       LOG.warn("Waiting to remove " + key
>           + " from JobListCache because it is not in done yet.");
>     }
>   } else {
>     cache.remove(key);
>   }
> }
> {code}
> {code}
> Note: stacktrace is from hadoop-2.4.0 version and the problem exists in latest hadoop as well
> "2144820863@qtp-313351300-38156" daemon prio=10 tid=0x01e13800 nid=0xf133 waiting for monitor entry [0x7f7c1d8dd000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>     at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$JobListCache.addIfAbsent(HistoryFileManager.java:226)
>     - waiting to lock <0x00040145c4d8> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo)
>     at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:825)
>     at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.access$200(HistoryFileManager.java:82)
>     at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir.scanIfNeeded(HistoryFileManager.java:280)
>     - locked <0x000400375388> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir)
>     at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:792)
>     at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getAllFileInfo(HistoryFileManager.java:920)
>     at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getAllPartialJobs(CachedHistoryStorage.java:156)
>     at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getAllJobs(JobHistory.java:235)
> {code}
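The claim that the eviction logic stays safe without the monitor can be illustrated with a minimal, self-contained sketch. It assumes the cache is a concurrent map whose remove() is atomic and that the entry's state is readable without locking; the class Info and both method names are hypothetical stand-ins, not the HistoryFileManager source.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the HistoryFileManager source.
class LockFreeEvictionSketch {

  // Stand-in for HistoryFileInfo: state is read through a volatile field,
  // so readers do not need the object's monitor.
  static final class Info {
    private volatile boolean movePending;
    Info(boolean movePending) { this.movePending = movePending; }
    boolean isMovePending() { return movePending; }
  }

  // Tries to evict the oldest entry without synchronizing on it.
  // Returns true only for the caller whose remove() took the entry out.
  static boolean evictOldest(ConcurrentSkipListMap<String, Info> cache) {
    Map.Entry<String, Info> first = cache.firstEntry();
    if (first == null || first.getValue().isMovePending()) {
      return false; // nothing to evict, or the entry is not in done yet
    }
    // remove() is atomic; a second caller with the same key just gets null.
    return cache.remove(first.getKey()) != null;
  }

  // Calls evictOldest once from each of several threads and counts successes.
  static int evictConcurrently(ConcurrentSkipListMap<String, Info> cache, int threads) {
    AtomicInteger removed = new AtomicInteger();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int i = 0; i < threads; i++) {
      pool.execute(() -> {
        if (evictOldest(cache)) {
          removed.incrementAndGet();
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
        throw new IllegalStateException("eviction threads did not finish");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return removed.get();
  }

  public static void main(String[] args) {
    ConcurrentSkipListMap<String, Info> cache = new ConcurrentSkipListMap<>();
    cache.put("job_0001", new Info(false));
    int removed = evictConcurrently(cache, 8);
    System.out.println("successful removals: " + removed
        + ", cache empty: " + cache.isEmpty());
  }
}
```

With a single evictable entry, exactly one of the eight callers removes it and the others see either an empty cache or a null from remove(), so repeating the operation is harmless. Whether the real delete() path is equally idempotent is the point under discussion in this issue and is not demonstrated here.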
[jira] [Commented] (MAPREDUCE-1211) Online aggregation and continuous query support
[ https://issues.apache.org/jira/browse/MAPREDUCE-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594386#comment-15594386 ]

Reynold Xin commented on MAPREDUCE-1211:

This seems useful.

> Online aggregation and continuous query support
> ---
>
> Key: MAPREDUCE-1211
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1211
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Components: task
> Reporter: Tyson Condie
> Priority: Minor
>
> The purpose of this post is to propose a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well. We have built a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see "early returns" from a job as it is being computed. Our Hadoop Online Prototype (HOP) also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing. HOP retains the fault tolerance properties of Hadoop, and can run unmodified user-defined MapReduce programs.
> For more information on the HOP design, please see our technical report.
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-136.html
> Further details are discussed in the following blog posts.
> http://databeta.wordpress.com/2009/10/18/mapreduce-online/
> http://radar.oreilly.com/2009/10/pipelining-and-real-time-analytics-with-mapreduce-online.html
> http://dbmsmusings.blogspot.com/2009/10/analysis-of-mapreduce-online-paper.html
> The HOP code has been published at the following location.
> http://code.google.com/p/hop/
[jira] [Updated] (MAPREDUCE-6799) Document mapreduce.jobhistory.webapp.https.address in mapred-default.xml
[ https://issues.apache.org/jira/browse/MAPREDUCE-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated MAPREDUCE-6799:

    Labels: newbie supportability (was: newbie)

> Document mapreduce.jobhistory.webapp.https.address in mapred-default.xml
> ---
>
> Key: MAPREDUCE-6799
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6799
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: jobhistoryserver
> Reporter: Akira Ajisaka
> Priority: Minor
> Labels: newbie, supportability
>
> The default port number is 19890 but it is not documented.
> {code:title=JHAdminConfig.java}
>   public static final String MR_HISTORY_WEBAPP_HTTPS_ADDRESS =
>       MR_HISTORY_PREFIX + "webapp.https.address";
>   public static final int DEFAULT_MR_HISTORY_WEBAPP_HTTPS_PORT = 19890;
> {code}
[jira] [Created] (MAPREDUCE-6799) Document mapreduce.jobhistory.webapp.https.address in mapred-default.xml
Akira Ajisaka created MAPREDUCE-6799:

    Summary: Document mapreduce.jobhistory.webapp.https.address in mapred-default.xml
    Key: MAPREDUCE-6799
    URL: https://issues.apache.org/jira/browse/MAPREDUCE-6799
    Project: Hadoop Map/Reduce
    Issue Type: Improvement
    Components: jobhistoryserver
    Reporter: Akira Ajisaka
    Priority: Minor

The default port number is 19890 but it is not documented.
{code:title=JHAdminConfig.java}
  public static final String MR_HISTORY_WEBAPP_HTTPS_ADDRESS =
      MR_HISTORY_PREFIX + "webapp.https.address";
  public static final int DEFAULT_MR_HISTORY_WEBAPP_HTTPS_PORT = 19890;
{code}
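To show why the undocumented default matters, here is a minimal sketch of how a lookup falls back to it when the property is not set. It uses java.util.Properties as a stand-in for the Hadoop configuration; the prefix value is inferred from the property name in the issue title, and the "0.0.0.0" bind host and the example hostname are assumptions, since the issue only states the port.

```java
import java.util.Properties;

// Hypothetical stand-in for the configuration lookup; not JHAdminConfig itself.
class HistoryHttpsAddressSketch {

  // Prefix inferred from "mapreduce.jobhistory.webapp.https.address".
  static final String MR_HISTORY_PREFIX = "mapreduce.jobhistory.";
  static final String MR_HISTORY_WEBAPP_HTTPS_ADDRESS =
      MR_HISTORY_PREFIX + "webapp.https.address";
  static final int DEFAULT_MR_HISTORY_WEBAPP_HTTPS_PORT = 19890;

  // The bind host is an assumption; only the port comes from the issue.
  static final String ASSUMED_DEFAULT_ADDRESS =
      "0.0.0.0:" + DEFAULT_MR_HISTORY_WEBAPP_HTTPS_PORT;

  // Returns the configured address, or the default when the key is absent.
  static String httpsAddress(Properties conf) {
    return conf.getProperty(MR_HISTORY_WEBAPP_HTTPS_ADDRESS, ASSUMED_DEFAULT_ADDRESS);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(httpsAddress(conf)); // falls back to the default port

    conf.setProperty(MR_HISTORY_WEBAPP_HTTPS_ADDRESS, "jhs.example.com:19891");
    System.out.println(httpsAddress(conf)); // explicit setting wins
  }
}
```

An operator reading mapred-default.xml has no way to learn the fallback port unless the property is listed there, which is what this issue asks for.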
[jira] [Updated] (MAPREDUCE-6797) Job history server scans can become blocked on a single, slow entry
[ https://issues.apache.org/jira/browse/MAPREDUCE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated MAPREDUCE-6797:

    Attachment: 0001-MAPREDUCE-6797.patch
[jira] [Updated] (MAPREDUCE-6797) Job history server scans can become blocked on a single, slow entry
[ https://issues.apache.org/jira/browse/MAPREDUCE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph updated MAPREDUCE-6797:

    Fix Version/s: 2.9.0
    Status: Patch Available (was: Open)