[jira] [Commented] (MAPREDUCE-6704) Container fail to launch for mapred application

2016-10-21 Thread Bibin A Chundatt (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596876#comment-15596876
 ] 

Bibin A Chundatt commented on MAPREDUCE-6704:
-

[~rkanter]
{quote}
Given the difficulty people seem to be having and that there doesn't seem to be 
a single fix that works for everyone for some reason, perhaps we should revisit 
that decision? 
{quote}
IMHO we should rethink that decision.


Summarizing the discussion and the proposed solutions:

# Add HADOOP_MAPRED_HOME=HADOOP_COMMON_HOME in the opts. However, it is not 
mandatory that HADOOP_MAPRED_HOME equals HADOOP_COMMON_HOME.
# Add HADOOP_MAPRED_HOME to YARN. This initial solution was not accepted, since 
we want to keep YARN and MAPRED separate.
# Add documentation on configuring yarn.nodemanager.env-whitelist in the 
NodeManager so that mapred applications can run:
{noformat}

yarn.nodemanager.env-whitelist
JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME

{noformat}
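
To make option 3 concrete, here is a minimal sketch of how an environment whitelist could behave: variables named in the whitelist are copied from the NodeManager's environment into the container's environment unless the container already sets them. The class name, method name, and paths are hypothetical, and this is a simplified model rather than the NodeManager's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EnvWhitelistSketch {

    /**
     * Returns the container environment after applying a comma-separated
     * whitelist: each whitelisted variable that is set for the NodeManager
     * and not already set for the container is copied over.
     */
    static Map<String, String> applyWhitelist(Map<String, String> nodeManagerEnv,
                                              Map<String, String> containerEnv,
                                              String whitelist) {
        Map<String, String> result = new LinkedHashMap<>(containerEnv);
        for (String entry : whitelist.split(",")) {
            String name = entry.trim();
            String value = nodeManagerEnv.get(name);
            if (!name.isEmpty() && value != null) {
                // Values set explicitly for the container take precedence.
                result.putIfAbsent(name, value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> nmEnv = new LinkedHashMap<>();
        nmEnv.put("JAVA_HOME", "/opt/java");            // illustrative paths
        nmEnv.put("HADOOP_MAPRED_HOME", "/opt/hadoop");
        nmEnv.put("UNRELATED_VAR", "stays-behind");     // not whitelisted

        Map<String, String> containerEnv = new LinkedHashMap<>();
        String whitelist = "JAVA_HOME,HADOOP_MAPRED_HOME";
        System.out.println(applyWhitelist(nmEnv, containerEnv, whitelist));
    }
}
```

Under this model, leaving HADOOP_MAPRED_HOME out of the whitelist means the container never sees it, which matches the failure described in this issue.
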
Waiting for a +1 on any one of the above solutions, or input on any other approach.


> Container fail to launch for mapred application
> ---
>
> Key: MAPREDUCE-6704
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6704
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-MAPREDUCE-6704.patch, 0001-YARN-5026.patch
>
>
> Containers fail to launch for mapred applications.
> The launch script does not set a default value for {{HADOOP_MAPRED_HOME}}. 
> After 
> https://github.com/apache/hadoop/commit/9d4d30243b0fc9630da51a2c17b543ef671d035c
> {{HADOOP_MAPRED_HOME}} can no longer be obtained from {{builder.environment()}}, 
> since {{DefaultContainerExecutor#buildCommandExecutor}} sets inherit to false.
> {noformat}
> 16/05/02 09:16:05 INFO mapreduce.Job: Job job_1462155939310_0004 failed with state FAILED due to: Application application_1462155939310_0004 failed 2 times due to AM Container for appattempt_1462155939310_0004_02 exited with  exitCode: 1
> Failing this attempt.Diagnostics: Exception from container-launch.
> Container id: container_1462155939310_0004_02_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:946)
> at org.apache.hadoop.util.Shell.run(Shell.java:850)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1144)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:385)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:281)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:89)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Container exited with a non-zero exit code 1. Last 4096 bytes of stderr :
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; support was removed in 8.0
> Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> Container exited with a non-zero exit code 1. Last 4096 bytes of stderr :
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; support was removed in 8.0
> {noformat}
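
The effect of "inherit = false" described above can be illustrated with the JDK's own `ProcessBuilder`, whose `environment()` map starts as a copy of the parent's environment. The sketch below is a simplified model, not Hadoop's `Shell` or `DefaultContainerExecutor` code; the class and method names are hypothetical and the process is never started.

```java
import java.util.HashMap;
import java.util.Map;

public class InheritEnvSketch {

    /**
     * Builds the environment a child process would receive. Clearing the
     * builder's map models "inherit = false": only explicitly supplied
     * variables remain visible to the child.
     */
    static Map<String, String> childEnvironment(boolean inherit,
                                                Map<String, String> explicit) {
        // The command name is illustrative; the process is never started.
        ProcessBuilder builder = new ProcessBuilder("launch_container.sh");
        Map<String, String> env = builder.environment();
        if (!inherit) {
            env.clear();
        }
        env.putAll(explicit);
        return env;
    }

    public static void main(String[] args) {
        Map<String, String> explicit = new HashMap<>();
        explicit.put("JAVA_HOME", "/opt/java"); // illustrative value

        Map<String, String> isolated = childEnvironment(false, explicit);
        // With inheritance off, a variable such as HADOOP_MAPRED_HOME is
        // absent unless it was passed explicitly.
        System.out.println("HADOOP_MAPRED_HOME visible: "
                + isolated.containsKey("HADOOP_MAPRED_HOME"));
    }
}
```
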



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection

2016-10-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596842#comment-15596842
 ] 

Hudson commented on MAPREDUCE-6728:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10662 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10662/])
MAPREDUCE-6728. Give fetchers hint when ShuffleHandler rejects a (rkanter: rev 
d4725bfcb2d300219d65395a78f957afbf37b201)
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Fetcher.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/task/reduce/TestFetcher.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/MapHost.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/ShuffleSchedulerImpl.java


> Give fetchers hint when ShuffleHandler rejects a shuffling connection
> -
>
> Key: MAPREDUCE-6728
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6728
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: mapreduce6728.001.patch, mapreduce6728.002.patch, 
> mapreduce6728.003.patch, mapreduce6728.004.patch, mapreduce6728.005.patch, 
> mapreduce6728.006.patch, mapreduce6728.prelim.patch
>
>
> If the number of open shuffle connections to a node exceeds the maximum, 
> ShuffleHandler closes the connection immediately without giving fetchers any 
> hint of the reason, which causes fetchers to fail with exceptions such as
> java.net.SocketException: Unexpected end of file from server
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:772)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>   at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>   at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:430)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:395)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:266)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:323)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
> OR 
> java.net.SocketException: Connection reset
>   at java.net.SocketInputStream.read(SocketInputStream.java:196)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:769)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>   at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>   at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:430)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.setupConnectionsWithRetry(Fetcher.java:395)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:266)
>   at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java
> Such failures are counted as fetcher failures
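
One way to picture the improvement requested here is a server that answers an over-limit connection with an explicit "busy, retry later" reply instead of silently closing it. The sketch below assumes a 429 status code and a one-second delay purely for illustration; the class, the `Reply` type, and the constants are hypothetical and are not taken from ShuffleHandler or the attached patches.

```java
public class ShuffleRejectSketch {

    static final int OK = 200;
    static final int TOO_MANY_REQUESTS = 429;  // assumed status code
    static final long RETRY_DELAY_MS = 1000L;  // assumed delay hint

    /** What the server tells the client about a new connection. */
    static final class Reply {
        final int status;
        final long retryDelayMs; // 0 when the connection is accepted

        Reply(int status, long retryDelayMs) {
            this.status = status;
            this.retryDelayMs = retryDelayMs;
        }
    }

    /**
     * Decides how to answer a new connection. openConnections counts the
     * new connection itself; maxConnections <= 0 is treated as "no limit"
     * (an assumption made for this sketch).
     */
    static Reply onNewConnection(int openConnections, int maxConnections) {
        if (maxConnections > 0 && openConnections > maxConnections) {
            // Give the client a reason and a hint instead of a bare close.
            return new Reply(TOO_MANY_REQUESTS, RETRY_DELAY_MS);
        }
        return new Reply(OK, 0L);
    }

    public static void main(String[] args) {
        Reply reply = onNewConnection(5, 4);
        System.out.println(reply.status + ", retry in " + reply.retryDelayMs + " ms");
    }
}
```

With an explicit reply like this, a client can tell a busy server apart from a genuine network failure and avoid counting the rejection as a fetch failure.
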






[jira] [Commented] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection

2016-10-21 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596824#comment-15596824
 ] 

Robert Kanter commented on MAPREDUCE-6728:
--

Looks like it doesn't compile against branch-2.  Can you take a look and upload 
a modified patch?
{noformat}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hadoop-mapreduce-client-shuffle: Compilation failure
[ERROR] /Users/rkanter/dev/hadoop-git/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java:[1268,17] cannot find symbol
[ERROR] symbol:   method headers()
[ERROR] location: variable response of type org.jboss.netty.handler.codec.http.HttpResponse
[ERROR] -> [Help 1]
{noformat}




[jira] [Commented] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection

2016-10-21 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596800#comment-15596800
 ] 

Robert Kanter commented on MAPREDUCE-6728:
--

+1




[jira] [Commented] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection

2016-10-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596716#comment-15596716
 ] 

Hadoop QA commented on MAPREDUCE-6728:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 10s {color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 53s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 8s {color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 54s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 54s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 28s {color} | {color:red} hadoop-mapreduce-project/hadoop-mapreduce-client: The patch generated 2 new + 277 unchanged - 3 fixed = 279 total (was 280) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 23s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 28s {color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 17s {color} | {color:green} hadoop-mapreduce-client-shuffle in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 23m 58s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12834778/mapreduce6728.006.patch |
| JIRA Issue | MAPREDUCE-6728 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux 6e090c3051f6 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 2543852 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6770/artifact/patchprocess/diff-checkstyle-hadoop-mapreduce-project_hadoop-mapreduce-client.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6770/testReport/ |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 

[jira] [Updated] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection

2016-10-21 Thread Haibo Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haibo Chen updated MAPREDUCE-6728:
--
Attachment: mapreduce6728.006.patch

Thanks for your reviews, [~rkanter]. Uploading a new patch to address the 
issues you have raised.




[jira] [Commented] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection

2016-10-21 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596602#comment-15596602
 ] 

Robert Kanter commented on MAPREDUCE-6728:
--

Thanks for the patch [~haibochen] and the reviews [~templedf].  A few minor 
things:
- {code:java}for (TaskAttemptID left: remaining) {code} still has wrong spacing.
- I think it might be helpful to put the hostname in the 
{{TryAgainLaterException}} message.
- {{Fetcher.FETCH_RETRY_DELAY_DEFAULT}} has a comment pointing to 
{{ShuffleHandler.FETCH_RETRY_DELAY}}.  I think we should add a comment pointing 
back, so that anyone who changes {{ShuffleHandler.FETCH_RETRY_DELAY}} knows to 
change {{Fetcher.FETCH_RETRY_DELAY_DEFAULT}} as well.
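
The last two suggestions could look roughly like the following on the client side. This is a sketch only: the nested exception is a stand-in for the patch's {{TryAgainLaterException}}, whose real constructor is not shown in this thread, and the constant name, default value, and millisecond unit are assumptions.

```java
import java.io.IOException;

public class FetcherRetrySketch {

    /**
     * Client-side fallback delay in milliseconds (assumed unit and value).
     * Keep in sync with the server-side delay constant; the server-side
     * declaration should carry a comment pointing back here.
     */
    static final long RETRY_DELAY_DEFAULT_MS = 1000L;

    /** Hypothetical stand-in that names the busy host in its message. */
    static class TryAgainLaterException extends IOException {
        final String host;
        final long backoffMs;

        TryAgainLaterException(String host, long backoffMs) {
            super("Shuffle server on " + host + " is busy; retry in "
                    + backoffMs + " ms");
            this.host = host;
            this.backoffMs = backoffMs;
        }
    }

    /** Parses a server-supplied delay, falling back to the default. */
    static long parseRetryDelayMs(String headerValue) {
        if (headerValue == null) {
            return RETRY_DELAY_DEFAULT_MS;
        }
        try {
            long delay = Long.parseLong(headerValue.trim());
            return delay >= 0 ? delay : RETRY_DELAY_DEFAULT_MS;
        } catch (NumberFormatException e) {
            return RETRY_DELAY_DEFAULT_MS;
        }
    }

    public static void main(String[] args) {
        long delay = parseRetryDelayMs("250");
        // "node7.example.com" is an illustrative host name.
        System.out.println(
                new TryAgainLaterException("node7.example.com", delay).getMessage());
    }
}
```

Including the host in the message makes it possible to see from a task log which node was rejecting connections.
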


> Give fetchers hint when ShuffleHandler rejects a shuffling connection
> -
>
> Key: MAPREDUCE-6728
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6728
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: mapreduce6728.001.patch, mapreduce6728.002.patch, 
> mapreduce6728.003.patch, mapreduce6728.004.patch, mapreduce6728.005.patch, 
> mapreduce6728.prelim.patch
>



[jira] [Commented] (MAPREDUCE-6728) Give fetchers hint when ShuffleHandler rejects a shuffling connection

2016-10-21 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596254#comment-15596254
 ] 

Daniel Templeton commented on MAPREDUCE-6728:
-

Thanks, [~haibochen].  Latest patch looks good to me.  +1 (non-binding)

> Give fetchers hint when ShuffleHandler rejects a shuffling connection
> -
>
> Key: MAPREDUCE-6728
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6728
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: Haibo Chen
>Assignee: Haibo Chen
> Attachments: mapreduce6728.001.patch, mapreduce6728.002.patch, 
> mapreduce6728.003.patch, mapreduce6728.004.patch, mapreduce6728.005.patch, 
> mapreduce6728.prelim.patch
>



[jira] [Commented] (MAPREDUCE-6704) Container fail to launch for mapred application

2016-10-21 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596201#comment-15596201
 ] 

Andrew Wang commented on MAPREDUCE-6704:


Folks, is there any progress we can make on this JIRA? That this doesn't work 
out of the box anymore has been very surprising to our users. I'd like to get 
it fixed for alpha2 if possible.

> Container fail to launch for mapred application
> ---
>
> Key: MAPREDUCE-6704
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6704
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Bibin A Chundatt
>Priority: Blocker
> Attachments: 0001-MAPREDUCE-6704.patch, 0001-YARN-5026.patch
>
>
> Containers fail to launch for MapReduce applications.
> The launch script does not set a default value for {{HADOOP_MAPRED_HOME}}. 
> After 
> https://github.com/apache/hadoop/commit/9d4d30243b0fc9630da51a2c17b543ef671d035c
> {{HADOOP_MAPRED_HOME}} can no longer be obtained from {{builder.environment()}} 
> since {{DefaultContainerExecutor#buildCommandExecutor}} sets inherit to false.
> {noformat}
> 16/05/02 09:16:05 INFO mapreduce.Job: Job job_1462155939310_0004 failed with 
> state FAILED due to: Application application_1462155939310_0004 failed 2 
> times due to AM Container for appattempt_1462155939310_0004_02 exited 
> with  exitCode: 1
> Failing this attempt.Diagnostics: Exception from container-launch.
> Container id: container_1462155939310_0004_02_01
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:946)
> at org.apache.hadoop.util.Shell.run(Shell.java:850)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1144)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:227)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:385)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:281)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:89)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Container exited with a non-zero exit code 1. Last 4096 bytes of stderr :
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; 
> support was removed in 8.0
> Error: Could not find or load main class 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> Container exited with a non-zero exit code 1. Last 4096 bytes of stderr :
> Java HotSpot(TM) 64-Bit Server VM warning: ignoring option UseSplitVerifier; 
> support was removed in 8.0
> {noformat}
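
To make the failure mode above concrete, here is a minimal sketch of what happens when a child process does not inherit the parent environment and only whitelisted variables are copied across. It is not the NodeManager code: the class name, the method names, and the example paths are all hypothetical, and the whitelist is modelled as a plain list of variable names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names here are hypothetical, not NodeManager code.
class EnvWhitelistSketch {

  // Copies only the whitelisted variables that the parent actually defines.
  static Map<String, String> childEnvironment(Map<String, String> parentEnv,
                                              List<String> whitelist) {
    Map<String, String> childEnv = new HashMap<>();
    for (String name : whitelist) {
      String value = parentEnv.get(name);
      if (value != null) {
        childEnv.put(name, value);
      }
    }
    return childEnv;
  }

  // Builds a ProcessBuilder whose environment is NOT inherited from this JVM:
  // the inherited map is cleared and replaced by the filtered one.
  static ProcessBuilder newBuilder(Map<String, String> childEnv, String... command) {
    ProcessBuilder builder = new ProcessBuilder(command);
    builder.environment().clear();
    builder.environment().putAll(childEnv);
    return builder;
  }

  public static void main(String[] args) {
    Map<String, String> parentEnv = new HashMap<>();
    parentEnv.put("JAVA_HOME", "/opt/java");             // example value
    parentEnv.put("HADOOP_MAPRED_HOME", "/opt/hadoop");  // example value

    List<String> withoutMapred = Arrays.asList("JAVA_HOME", "HADOOP_COMMON_HOME");
    List<String> withMapred =
        Arrays.asList("JAVA_HOME", "HADOOP_COMMON_HOME", "HADOOP_MAPRED_HOME");

    // The variable is visible to the child only when it is whitelisted.
    System.out.println(childEnvironment(parentEnv, withoutMapred)
        .containsKey("HADOOP_MAPRED_HOME"));
    System.out.println(childEnvironment(parentEnv, withMapred)
        .containsKey("HADOOP_MAPRED_HOME"));
  }
}
```

With the variable missing from the child environment, any classpath entry that is expressed relative to it cannot resolve, which is consistent with the "Could not find or load main class" error in the log above.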






[jira] [Commented] (MAPREDUCE-6772) Add MR Job Configurations for Containers reuse

2016-10-21 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596078#comment-15596078
 ] 

Naganarasimha G R commented on MAPREDUCE-6772:
--

[~devaraj.k], 
Sorry, I have one last comment. I missed mentioning that we need to capture 
these configurations in {{mapred-default.xml}}. I hope we can capture them in a 
patch that follows the proper patch name pattern. 

> Add MR Job Configurations for Containers reuse
> --
>
> Key: MAPREDUCE-6772
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6772
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: applicationmaster, mrv2
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: MAPREDUCE-6772-v0.patch, MAPREDUCE-6772-v1.patch, 
> MR-6749-MAPREDUCE-6772.003.patch
>
>
> This task adds configurations required for MR AM Container reuse feature.






[jira] [Commented] (MAPREDUCE-6772) Add MR Job Configurations for Containers reuse

2016-10-21 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596028#comment-15596028
 ] 

Naganarasimha G R commented on MAPREDUCE-6772:
--

[~devaraj.k],
Overall the latest patch LGTM; committing this patch to the branch *"MR-6749"*. 
As this is the first patch it's fine, but for further patches we need to follow 
the naming convention described at https://wiki.apache.org/hadoop/HowToContribute 
(section "Naming your patch"), i.e. issue ID, branch name, and patch number in 
that order, which would look like {{MAPREDUCE-6772-MR-6749.03.patch}}. This 
would help Jenkins run the patch against the right branch.
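
As a rough illustration of the naming check, here is a small sketch that tests whether a file name has the shape of the example above. The exact pattern is an assumption inferred from {{MAPREDUCE-6772-MR-6749.03.patch}} (issue ID, hyphen, branch name, dot, digits, ".patch"); it is not taken from the wiki page or from the Jenkins tooling, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Illustrative sketch only; the pattern is inferred from the single example
// in the comment above, not from the actual precommit tooling.
class PatchNameSketch {

  // True when fileName looks like <issueId>-<branch>.<digits>.patch
  static boolean followsBranchConvention(String fileName, String issueId, String branch) {
    Pattern expected = Pattern.compile(
        Pattern.quote(issueId + "-" + branch) + "\\.\\d+\\.patch");
    return expected.matcher(fileName).matches();
  }

  public static void main(String[] args) {
    String issueId = "MAPREDUCE-6772";
    String branch = "MR-6749";
    // Name suggested in the comment above.
    System.out.println(followsBranchConvention(
        "MAPREDUCE-6772-MR-6749.03.patch", issueId, branch));
    // Name of the attachment listed on the issue (branch and issue ID swapped).
    System.out.println(followsBranchConvention(
        "MR-6749-MAPREDUCE-6772.003.patch", issueId, branch));
  }
}
```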







[jira] [Comment Edited] (MAPREDUCE-6797) Job history server scans can become blocked on a single, slow entry

2016-10-21 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594632#comment-15594632
 ] 

Prabhu Joseph edited comment on MAPREDUCE-6797 at 10/21/16 9:37 AM:


[~kasha] Multiple threads can call addIfAbsent simultaneously and process the 
same HistoryFileInfo. That won't cause any issue even after removing the 
synchronized block, as the operations inside are thread safe and also 
idempotent. 


was (Author: prabhu joseph):
[~kasha] Multiple threads calling addIfAbsent simultaneously is possible and 
that won't face any issue after removing synchronized block as the operations 
inside Synchronized block are thread safe and also idempotent. 

> Job history server scans can become blocked on a single, slow entry
> ---
>
> Key: MAPREDUCE-6797
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6797
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 2.4.0, 2.8.0
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Critical
> Fix For: 2.9.0
>
> Attachments: 0001-MAPREDUCE-6797.patch, jstack
>
>
> There is one more piece of code in HistoryFileManager where the synchronized 
> block on HistoryFileInfo needs to be removed. The JobHistoryServer 
> contention issue was hit in our environment, where the stack trace (attached) 
> shows HistoryFileManager$JobListCache.addIfAbsent unnecessarily waiting to 
> lock on HistoryFileInfo.
> Synchronization on isMovePending and didMoveFail has already been removed by 
> MAPREDUCE-6684.
> {code}
> HistoryFileInfo firstValue = cache.get(key);
> synchronized(firstValue) {  // ---> Synchronized is not needed here
>   if (firstValue.isMovePending()) {
>     if (firstValue.didMoveFail() &&
>         firstValue.jobIndexInfo.getFinishTime() <= cutoff) {
>       cache.remove(key);
>       //Now lets try to delete it
>       try {
>         firstValue.delete();
>       } catch (IOException e) {
>         LOG.error("Error while trying to delete history files" +
>             " that could not be moved to done.", e);
>       }
>     } else {
>       LOG.warn("Waiting to remove " + key
>           + " from JobListCache because it is not in done yet.");
>     }
>   } else {
>     cache.remove(key);
>   }
> }
> {code}
> Note: the stack trace is from hadoop-2.4.0, and the problem exists in the 
> latest Hadoop as well.
> {code}
> "2144820863@qtp-313351300-38156" daemon prio=10 tid=0x01e13800 nid=0xf133 waiting for monitor entry [0x7f7c1d8dd000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
> at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$JobListCache.addIfAbsent(HistoryFileManager.java:226)
> - waiting to lock <0x00040145c4d8> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$HistoryFileInfo)
> at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:825)
> at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.access$200(HistoryFileManager.java:82)
> at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir.scanIfNeeded(HistoryFileManager.java:280)
> - locked <0x000400375388> (a org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager$UserLogDir)
> at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanIntermediateDirectory(HistoryFileManager.java:792)
> at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.getAllFileInfo(HistoryFileManager.java:920)
> at org.apache.hadoop.mapreduce.v2.hs.CachedHistoryStorage.getAllPartialJobs(CachedHistoryStorage.java:156)
> at org.apache.hadoop.mapreduce.v2.hs.JobHistory.getAllJobs(JobHistory.java:235)
> {code}
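
For readers following the proposal, here is a self-contained sketch of the same eviction logic with the synchronized block removed. It is not the attached patch. {{FileInfo}} is a hypothetical stand-in for HistoryFileInfo, the cache is assumed to be a concurrent map, and the null check is an addition of this sketch to cover the case where another thread has already removed the entry.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative sketch only; FileInfo is a hypothetical stand-in for
// HistoryFileInfo and exposes only what the quoted snippet uses.
class UnsynchronizedEvictionSketch {

  static class FileInfo {
    private final boolean movePending;
    private final boolean moveFailed;
    private final long finishTime;
    volatile boolean deleted;

    FileInfo(boolean movePending, boolean moveFailed, long finishTime) {
      this.movePending = movePending;
      this.moveFailed = moveFailed;
      this.finishTime = finishTime;
    }

    boolean isMovePending() { return movePending; }
    boolean didMoveFail() { return moveFailed; }
    long getFinishTime() { return finishTime; }
    void delete() throws IOException { deleted = true; }
  }

  // Returns true if the entry was dropped from the cache by this call.
  // No lock is taken on the value: remove() on a concurrent map is atomic,
  // and repeating the remove/delete from another thread changes nothing.
  static boolean evict(ConcurrentSkipListMap<String, FileInfo> cache,
                       String key, long cutoff) {
    FileInfo firstValue = cache.get(key);
    if (firstValue == null) {
      return false;  // another thread already removed it (sketch-only guard)
    }
    if (firstValue.isMovePending()) {
      if (firstValue.didMoveFail() && firstValue.getFinishTime() <= cutoff) {
        boolean removed = cache.remove(key) != null;
        try {
          firstValue.delete();
        } catch (IOException e) {
          System.err.println("Error while trying to delete history files: " + e);
        }
        return removed;
      }
      return false;  // still waiting for the move to done
    }
    return cache.remove(key) != null;
  }

  public static void main(String[] args) {
    ConcurrentSkipListMap<String, FileInfo> cache = new ConcurrentSkipListMap<>();
    cache.put("job_1", new FileInfo(false, false, 10L));
    System.out.println(evict(cache, "job_1", 100L));  // first call removes the entry
    System.out.println(evict(cache, "job_1", 100L));  // repeat call is a no-op
  }
}
```

Whether the real code is safe without the lock depends on the HistoryFileInfo methods themselves being safe to call concurrently, which is what the comment above asserts.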






[jira] [Commented] (MAPREDUCE-6797) Job history server scans can become blocked on a single, slow entry

2016-10-21 Thread Prabhu Joseph (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594632#comment-15594632
 ] 

Prabhu Joseph commented on MAPREDUCE-6797:
--

[~kasha] Multiple threads calling addIfAbsent simultaneously is possible and 
that won't face any issue after removing synchronized block as the operations 
inside Synchronized block are thread safe and also idempotent. 







[jira] [Commented] (MAPREDUCE-1211) Online aggregation and continuous query support

2016-10-21 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594386#comment-15594386
 ] 

Reynold Xin commented on MAPREDUCE-1211:


This seems useful.


> Online aggregation and continuous query support
> ---
>
> Key: MAPREDUCE-1211
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1211
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: task
>Reporter: Tyson Condie
>Priority: Minor
>
> The purpose of this post is to propose a modified MapReduce architecture that 
> allows data to be pipelined between operators. This extends the MapReduce 
> programming model beyond batch processing, and can reduce completion times 
> and improve system utilization for batch jobs as well. We have built a 
> modified version of the Hadoop MapReduce framework that supports online 
> aggregation, which allows users to see "early returns" from a job as it is 
> being computed. Our Hadoop Online Prototype (HOP) also supports continuous 
> queries, which enable MapReduce programs to be written for applications such 
> as event monitoring and stream processing. HOP retains the fault tolerance 
> properties of Hadoop, and can run unmodified user-defined MapReduce programs.
> For more information on the HOP design, please see our technical report.
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-136.html
> Further details are discussed in the following blog posts.
> http://databeta.wordpress.com/2009/10/18/mapreduce-online/
> http://radar.oreilly.com/2009/10/pipelining-and-real-time-analytics-with-mapreduce-online.html
> http://dbmsmusings.blogspot.com/2009/10/analysis-of-mapreduce-online-paper.html
> The HOP code has been published at the following location.
> http://code.google.com/p/hop/






[jira] [Updated] (MAPREDUCE-6799) Document mapreduce.jobhistory.webapp.https.address in mapred-default.xml

2016-10-21 Thread Akira Ajisaka (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka updated MAPREDUCE-6799:
-
Labels: newbie supportability  (was: newbie)

> Document mapreduce.jobhistory.webapp.https.address in mapred-default.xml
> 
>
> Key: MAPREDUCE-6799
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6799
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: jobhistoryserver
>Reporter: Akira Ajisaka
>Priority: Minor
>  Labels: newbie, supportability
>
> The default port number is 19890 but it is not documented.
> {code:title=JHAdminConfig.java}
>   public static final String MR_HISTORY_WEBAPP_HTTPS_ADDRESS =
>   MR_HISTORY_PREFIX + "webapp.https.address";
>   public static final int DEFAULT_MR_HISTORY_WEBAPP_HTTPS_PORT = 19890;
> {code}
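
As a small illustration of why the undocumented default matters, the sketch below resolves the port from a configuration-like lookup and falls back to 19890 when the key is absent. It uses {{java.util.Properties}} as a stand-in for the Hadoop configuration object. The prefix value is inferred from the property name in the issue title, and the {{0.0.0.0}} bind host and the helper names are assumptions of this sketch.

```java
import java.util.Properties;

// Illustrative sketch only; Properties stands in for the Hadoop configuration.
class HistoryHttpsAddressSketch {

  // Inferred from the property name mapreduce.jobhistory.webapp.https.address.
  static final String MR_HISTORY_PREFIX = "mapreduce.jobhistory.";
  static final String MR_HISTORY_WEBAPP_HTTPS_ADDRESS =
      MR_HISTORY_PREFIX + "webapp.https.address";
  static final int DEFAULT_MR_HISTORY_WEBAPP_HTTPS_PORT = 19890;

  // Assumed bind host, used here only to form a host:port default.
  static final String ASSUMED_DEFAULT_HOST = "0.0.0.0";

  // Returns the configured port, or the default when the key or port is missing.
  static int resolvePort(Properties conf) {
    String address = conf.getProperty(
        MR_HISTORY_WEBAPP_HTTPS_ADDRESS,
        ASSUMED_DEFAULT_HOST + ":" + DEFAULT_MR_HISTORY_WEBAPP_HTTPS_PORT);
    int colon = address.lastIndexOf(':');
    if (colon < 0) {
      return DEFAULT_MR_HISTORY_WEBAPP_HTTPS_PORT;
    }
    return Integer.parseInt(address.substring(colon + 1));
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(resolvePort(conf));  // key not set, default applies

    conf.setProperty(MR_HISTORY_WEBAPP_HTTPS_ADDRESS, "jhs.example.com:19999");
    System.out.println(resolvePort(conf));  // explicit port wins
  }
}
```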






[jira] [Created] (MAPREDUCE-6799) Document mapreduce.jobhistory.webapp.https.address in mapred-default.xml

2016-10-21 Thread Akira Ajisaka (JIRA)
Akira Ajisaka created MAPREDUCE-6799:


 Summary: Document mapreduce.jobhistory.webapp.https.address in 
mapred-default.xml
 Key: MAPREDUCE-6799
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6799
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Reporter: Akira Ajisaka
Priority: Minor


The default port number is 19890 but it is not documented.
{code:title=JHAdminConfig.java}
  public static final String MR_HISTORY_WEBAPP_HTTPS_ADDRESS =
  MR_HISTORY_PREFIX + "webapp.https.address";
  public static final int DEFAULT_MR_HISTORY_WEBAPP_HTTPS_PORT = 19890;
{code}






[jira] [Updated] (MAPREDUCE-6797) Job history server scans can become blocked on a single, slow entry

2016-10-21 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-6797:
-
Attachment: 0001-MAPREDUCE-6797.patch







[jira] [Updated] (MAPREDUCE-6797) Job history server scans can become blocked on a single, slow entry

2016-10-21 Thread Prabhu Joseph (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated MAPREDUCE-6797:
-
Fix Version/s: 2.9.0
   Status: Patch Available  (was: Open)



