[jira] [Commented] (MAPREDUCE-6144) DefaultSpeculator always add both MAP and REDUCE Speculative task even MAP_SPECULATIVE or REDUCE_SPECULATIVE is disabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224217#comment-14224217 ] zhihai xu commented on MAPREDUCE-6144: -- Hi [~xgong], Yes, you are right. mapContainerNeeds will be empty if MAP_SPECULATIVE is disabled, and if mapContainerNeeds is empty, maybeScheduleAMapSpeculation will return 0. The same holds for reduceContainerNeeds and maybeScheduleAReduceSpeculation, because SpeculatorEventDispatcher in MRAppMaster handles the TASK_CONTAINER_NEED_UPDATE SpeculatorEvent based on the task type and the MAP_SPECULATIVE and REDUCE_SPECULATIVE configurations. Thanks for the review. zhihai DefaultSpeculator always add both MAP and REDUCE Speculative task even MAP_SPECULATIVE or REDUCE_SPECULATIVE is disabled. - Key: MAPREDUCE-6144 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6144 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.5.1 Reporter: zhihai xu Assignee: zhihai xu Attachments: MAPREDUCE-6144.000.patch, MAPREDUCE-6144.001.patch DefaultSpeculator always adds both MAP and REDUCE speculative tasks even when MAP_SPECULATIVE or REDUCE_SPECULATIVE is disabled. If both MAP_SPECULATIVE and REDUCE_SPECULATIVE are disabled, DefaultSpeculator won't start. The issue happens when only one of MAP_SPECULATIVE and REDUCE_SPECULATIVE is enabled: both MAP and REDUCE speculative tasks are generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
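The flow described in the comment can be condensed into a short sketch. Everything here is illustrative, not Hadoop code: the class SpeculationSketch, onContainerNeedUpdate, and the simplified counting are stand-ins; only the names maybeScheduleAMapSpeculation, mapContainerNeeds, SpeculatorEventDispatcher, and TASK_CONTAINER_NEED_UPDATE come from the comment above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical condensation of the DefaultSpeculator flow described above.
public class SpeculationSketch {
    private final Map<String, AtomicInteger> mapContainerNeeds = new ConcurrentHashMap<>();
    private final boolean mapSpeculationEnabled;

    public SpeculationSketch(boolean mapSpeculationEnabled) {
        this.mapSpeculationEnabled = mapSpeculationEnabled;
    }

    // Stand-in for SpeculatorEventDispatcher: when speculation is disabled
    // for this task type, the TASK_CONTAINER_NEED_UPDATE event is dropped,
    // so the needs map stays empty.
    public void onContainerNeedUpdate(String taskId, int containersNeeded) {
        if (!mapSpeculationEnabled) {
            return;
        }
        mapContainerNeeds
            .computeIfAbsent(taskId, k -> new AtomicInteger())
            .set(containersNeeded);
    }

    // Stand-in for maybeScheduleAMapSpeculation: with no entries in
    // mapContainerNeeds, nothing is scheduled and 0 is returned.
    public int maybeScheduleAMapSpeculation() {
        int speculations = 0;
        for (AtomicInteger need : mapContainerNeeds.values()) {
            if (need.get() > 0) {
                speculations++;
            }
        }
        return speculations;
    }
}
```

With speculation disabled, the container-need update never reaches the map, so the scheduler has nothing to speculate on — which is why no extra guard is needed in DefaultSpeculator itself.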
[jira] [Updated] (MAPREDUCE-6144) DefaultSpeculator always add both MAP and REDUCE Speculative task even MAP_SPECULATIVE or REDUCE_SPECULATIVE is disabled.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated MAPREDUCE-6144: - Resolution: Not a Problem Status: Resolved (was: Patch Available)
[jira] [Commented] (MAPREDUCE-6168) Old MR client is still broken when receiving new counters from MR job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224654#comment-14224654 ] Junping Du commented on MAPREDUCE-6168: --- Hi [~zjshen], Thanks for reporting this. The stack trace below: {noformat} org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182) at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154) {noformat} shows that the client is still on 2.2 (https://github.com/apache/hadoop/blob/branch-2.2/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/counters/FrameworkCounterGroup.java), and we fixed the problem in 2.6. That means clients from 2.6 onward can be compatible with new counters in the future (forward compatibility), but it does not mean the existing broken compatibility can be recovered (unless we also fix the code in branch-2.2). [~zjshen], I would like to resolve this JIRA as Won't Fix. Do you agree? Old MR client is still broken when receiving new counters from MR job - Key: MAPREDUCE-6168 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6168 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen Assignee: Junping Du Priority: Blocker In the following scenario: 1. Either insecure or secure; 2. MR 2.2 with the new shuffle on the NM; 3. Submitting via the old client. 
We will see the following console exception: {code} 14/11/17 14:56:19 INFO mapreduce.Job: Job job_1416264695865_0003 completed successfully java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_REDUCES at java.lang.Enum.valueOf(Enum.java:236) at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148) at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182) at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154) at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370) at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511) at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756) at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289) at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306) at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72) at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} The problem was supposed to be fixed by MAPREDUCE-5831; however, it seems that we haven't covered all the problematic code paths.
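The failure mode in the trace above can be illustrated in isolation: Enum.valueOf throws IllegalArgumentException for a name the client's enum does not know. This is a minimal sketch, not Hadoop code; OldJobCounter is a hypothetical stand-in for the 2.2 client's JobCounter enum, which predates MB_MILLIS_REDUCES, and the catch-and-skip pattern is the forward-compatible behavior the MAPREDUCE-5831 style fix aims for.

```java
// Minimal illustration (not Hadoop code) of why the 2.2 client breaks.
public class CounterCompat {
    // Hypothetical stand-in for the old client's JobCounter enum, which
    // does not include MB_MILLIS_REDUCES.
    public enum OldJobCounter { MILLIS_MAPS, MILLIS_REDUCES }

    // Enum.valueOf throws IllegalArgumentException for an unknown name;
    // catching it and skipping the counter keeps the client compatible
    // with counters added by newer servers.
    public static OldJobCounter lookup(String name) {
        try {
            return OldJobCounter.valueOf(name);
        } catch (IllegalArgumentException e) {
            return null; // counter from a newer server: ignore instead of failing
        }
    }
}
```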
[jira] [Commented] (MAPREDUCE-5831) Old MR client is not compatible with new MR application
[ https://issues.apache.org/jira/browse/MAPREDUCE-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224663#comment-14224663 ] Junping Du commented on MAPREDUCE-5831: --- Hi [~zjshen], I think the problem has been fixed since 2.6. For 2.2, because it doesn't have the fix, the problem could still be there. Please see my comments in MAPREDUCE-6168. Old MR client is not compatible with new MR application --- Key: MAPREDUCE-5831 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5831 Project: Hadoop Map/Reduce Issue Type: Bug Components: client, mr-am Affects Versions: 2.2.0, 2.3.0 Reporter: Zhijie Shen Assignee: Junping Du Priority: Blocker Fix For: 2.6.0 Attachments: MAPREDUCE-5831-v2.patch, MAPREDUCE-5831-v3.patch, MAPREDUCE-5831.patch Recently, we saw the following scenario: 1. The user set up a cluster of Hadoop 2.3, which contains YARN 2.3 and MR 2.3. 2. The user ran the client on a machine where MR 2.2 is installed and on the classpath. Then, when the user submitted a simple wordcount job, he saw the following message: {code} 16:00:41,027 INFO main mapreduce.Job:1345 - map 100% reduce 100% 16:00:41,036 INFO main mapreduce.Job:1356 - Job job_1396468045458_0006 completed successfully 16:02:20,535 WARN main mapreduce.JobRunner:212 - Cannot start job [wordcountJob] java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_REDUCES at java.lang.Enum.valueOf(Enum.java:236) at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148) at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182) at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154) at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370) at 
org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511) at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756) at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289) . . . {code} The problem is that the wordcount job was running on one or more nodes of the YARN cluster, where MR 2.3 libs were installed, and JobCounter.MB_MILLIS_REDUCES is available in the counters. On the other side, due to the classpath setting, the client was likely running with MR 2.2 libs. After the client retrieved the counters from the MR AM, it tried to construct the Counter object with the received counter name. Unfortunately, that enum constant didn't exist in the client's classpath, so the "No enum constant" exception was thrown. JobCounter.MB_MILLIS_REDUCES was brought to MR2 via MAPREDUCE-5464 in Hadoop 2.3.
[jira] [Created] (MAPREDUCE-6173) Document the configuration of deploying MR over distributed cache with enabling wired encryption at the same time
Junping Du created MAPREDUCE-6173: - Summary: Document the configuration of deploying MR over distributed cache with enabling wired encryption at the same time Key: MAPREDUCE-6173 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6173 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distributed-cache, documentation Affects Versions: 2.6.0 Reporter: Junping Du Assignee: Junping Du Using the currently documented configuration (specified in http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistributedCacheDeploy.html) with a cluster that enables shuffle encryption (http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html) will cause the job to fail with the exception below: {noformat} 2014-10-10 02:17:16,600 WARN [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to tassapol-centos5nano1-3.cs1cloud.internal:13562 with 1 map outputs javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:174) at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1731) at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:241) at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:235) at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1206) at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:136) at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:593) at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:529) at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:925) at 
com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1170) at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1197) at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1181) at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:434) at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.setNewClient(AbstractDelegateHttpsURLConnection.java:81) at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.setNewClient(AbstractDelegateHttpsURLConnection.java:61) at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:584) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1193) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379) at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.verifyConnection(Fetcher.java:427) {noformat} This is because ssl-client.xml is not included in the MR tarball when we deploy it over the distributed cache. Putting ssl-client.xml on the CLASSPATH of the MR job resolves the problem, and we should document it.
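The workaround described above can be illustrated with a configuration sketch. This fragment is hypothetical: mapreduce.application.classpath is the standard knob for the task classpath, but the exact value depends on how the MR tarball is laid out in your deployment; the only point is that a directory containing ssl-client.xml must appear on the job's classpath.

```xml
<!-- Hypothetical mapred-site.xml fragment. $HADOOP_CONF_DIR is assumed to
     be the directory that holds ssl-client.xml on each node; appending it
     makes the file visible to jobs that run the MR framework from the
     distributed cache. -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_CONF_DIR,$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*</value>
</property>
```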
[jira] [Updated] (MAPREDUCE-6173) Document the configuration of deploying MR over distributed cache with enabling wired encryption at the same time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6173: -- Attachment: Screen Shot for MAPREDUCE-6173.png
[jira] [Updated] (MAPREDUCE-6160) Potential NullPointerException in MRClientProtocol interface implementation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated MAPREDUCE-6160: -- Attachment: MAPREDUCE-6160.3.patch Potential NullPointerException in MRClientProtocol interface implementation. Key: MAPREDUCE-6160 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6160 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Rohith Assignee: Rohith Attachments: MAPREDUCE-6160.1.patch, MAPREDUCE-6160.2.patch, MAPREDUCE-6160.3.patch, MAPREDUCE-6160.patch, MAPREDUCE-6160.patch In the implementation of MRClientProtocol, many methods can throw a NullPointerException. Instead of a NullPointerException, it is better to throw an IOException with a proper message. Both the HistoryClientService class and the MRClientService class have a #verifyAndGetJob() method that can return the job object as null, which affects the following methods: {code} getTaskReport(GetTaskReportRequest request) throws IOException; getTaskAttemptReport(GetTaskAttemptReportRequest request) throws IOException; getCounters(GetCountersRequest request) throws IOException; getTaskAttemptCompletionEvents(GetTaskAttemptCompletionEventsRequest request) throws IOException; getTaskReports(GetTaskReportsRequest request) throws IOException; getDiagnostics(GetDiagnosticsRequest request) throws IOException; {code}
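The direction discussed in this issue, failing fast with an IOException instead of returning null from #verifyAndGetJob(), can be sketched as follows. This is a minimal stand-in, not the HistoryClientService/MRClientService code; the JobLookup class, register() method, and Job placeholder are illustrative.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the fail-fast pattern the patch proposes.
public class JobLookup {
    public static class Job { } // placeholder for the real Job interface

    private final Map<String, Job> jobs = new HashMap<>();

    public void register(String jobId, Job job) {
        jobs.put(jobId, job);
    }

    // Instead of returning null (and letting getCounters, getTaskReport,
    // etc. hit a NullPointerException), throw an IOException that names
    // the unknown job id.
    public Job verifyAndGetJob(String jobId) throws IOException {
        Job job = jobs.get(jobId);
        if (job == null) {
            throw new IOException("Unknown Job " + jobId);
        }
        return job;
    }
}
```

Callers already declare `throws IOException`, so no signature changes are needed; only the error surfaced to the client improves.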
[jira] [Updated] (MAPREDUCE-6173) Document the configuration of deploying MR over distributed cache with enabling wired encryption at the same time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6173: -- Attachment: MAPREDUCE-6173.patch
[jira] [Updated] (MAPREDUCE-6173) Document the configuration of deploying MR over distributed cache with enabling wired encryption at the same time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Junping Du updated MAPREDUCE-6173: -- Status: Patch Available (was: Open)
[jira] [Commented] (MAPREDUCE-6173) Document the configuration of deploying MR over distributed cache with enabling wired encryption at the same time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224762#comment-14224762 ] Junping Du commented on MAPREDUCE-6173: --- Attaching the first patch and a screenshot for this.
[jira] [Commented] (MAPREDUCE-6160) Potential NullPointerException in MRClientProtocol interface implementation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224764#comment-14224764 ] Rohith commented on MAPREDUCE-6160: --- bq. just one nit. Should we just say Unknown Job + jobId for the error message? I changed the log message as suggested, and I updated the patch with the changed comment message for consistency. Please review.
[jira] [Updated] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Payne updated MAPREDUCE-6166: -- Attachment: MAPREDUCE-6166.v2.201411251627.txt Thank you very much [~jira.shegalov] and [~jlowe] for your comments and help in making this patch better. If it's okay, I would like to # update this patch with the {{final}} keyword on {{JobConf jobConf}} # create a separate JIRA for refactoring the code to inherit from a common class. Uploading a patch to cover #1. Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk --- Key: MAPREDUCE-6166 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6166 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.6.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: MAPREDUCE-6166.v1.201411221941.txt, MAPREDUCE-6166.v2.201411251627.txt In very large map/reduce jobs (5 maps, 2500 reducers), the intermediate map partition output gets corrupted on disk on the map side. If this corrupted map output is too large to shuffle in memory, the reducer streams it to disk without validating the checksum. In jobs this large, it could take hours before the reducer finally tries to read the corrupted file and fails. Since retries of the failed reduce attempt will also take hours, this delay in discovering the failure is multiplied greatly.
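The core idea behind the fix, validating the checksum while streaming to disk rather than when the reducer later reads the file back, can be shown in a self-contained sketch. This uses plain java.util.zip.CRC32 instead of Hadoop's IFile checksum machinery, and all names (ChecksummedCopy, shuffleToDisk) are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;

// Illustrative (non-Hadoop) sketch: verify a checksum as map output streams
// to disk, so corruption is caught at shuffle time instead of hours later.
public class ChecksummedCopy {
    // Update the checksum as bytes stream past, instead of writing the
    // whole file first and verifying only when it is read back.
    public static long copyAndChecksum(InputStream in, OutputStream out)
            throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            crc.update(buf, 0, n);
            out.write(buf, 0, n);
        }
        return crc.getValue();
    }

    // Fail the fetch immediately if the streamed bytes do not match the
    // checksum the map side advertised.
    public static void shuffleToDisk(byte[] mapOutput, long expectedCrc,
            OutputStream disk) throws IOException {
        long actual = copyAndChecksum(new ByteArrayInputStream(mapOutput), disk);
        if (actual != expectedCrc) {
            throw new IOException("Corrupt map output: crc " + actual
                    + " != expected " + expectedCrc);
        }
    }
}
```

Failing inside the fetch lets the reducer re-request that one map output instead of failing the whole reduce attempt hours later.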
[jira] [Updated] (MAPREDUCE-6160) Potential NullPointerException in MRClientProtocol interface implementation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated MAPREDUCE-6160: -- Status: Patch Available (was: Open) Potential NullPointerException in MRClientProtocol interface implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6173) Document the configuration of deploying MR over distributed cache with enabling wired encryption at the same time
[ https://issues.apache.org/jira/browse/MAPREDUCE-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224828#comment-14224828 ] Hadoop QA commented on MAPREDUCE-6173: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683586/MAPREDUCE-6173.patch against trunk revision 61a2510. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5049//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5049//console This message is automatically generated. 
Document the configuration of deploying MR over distributed cache with enabling wired encryption at the same time - Key: MAPREDUCE-6173 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6173 Project: Hadoop Map/Reduce Issue Type: Improvement Components: distributed-cache, documentation Affects Versions: 2.6.0 Reporter: Junping Du Assignee: Junping Du Attachments: MAPREDUCE-6173.patch, Screen Shot for MAPREDUCE-6173.png Using the currently documented configuration (specified in http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/DistributedCacheDeploy.html) with a cluster that enables shuffle encryption (http://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html) will cause the job to fail with the exception below: {noformat} 2014-10-10 02:17:16,600 WARN [fetcher#1] org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to tassapol-centos5nano1-3.cs1cloud.internal:13562 with 1 map outputs javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target at com.sun.net.ssl.internal.ssl.Alerts.getSSLException(Alerts.java:174) at com.sun.net.ssl.internal.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1731) at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:241) at com.sun.net.ssl.internal.ssl.Handshaker.fatalSE(Handshaker.java:235) at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1206) at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:136) at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:593) at com.sun.net.ssl.internal.ssl.Handshaker.process_record(Handshaker.java:529) at com.sun.net.ssl.internal.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:925) at 
com.sun.net.ssl.internal.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1170) at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1197) at com.sun.net.ssl.internal.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1181) at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:434) at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.setNewClient(AbstractDelegateHttpsURLConnection.java:81) at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.setNewClient(AbstractDelegateHttpsURLConnection.java:61) at sun.net.www.protocol.http.HttpURLConnection.writeRequests(HttpURLConnection.java:584) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1193) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379) at sun.net.www.protocol.https.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:318) at
[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224834#comment-14224834 ] Hadoop QA commented on MAPREDUCE-6166: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683592/MAPREDUCE-6166.v2.201411251627.txt against trunk revision 61a2510. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:red}-1 eclipse:eclipse{color}. The patch failed to build with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5050//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5050//console This message is automatically generated. Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6160) Potential NullPointerException in MRClientProtocol interface implementation.
[ https://issues.apache.org/jira/browse/MAPREDUCE-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224858#comment-14224858 ] Hadoop QA commented on MAPREDUCE-6160: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683587/MAPREDUCE-6160.3.patch against trunk revision 61a2510. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5048//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5048//console This message is automatically generated. Potential NullPointerException in MRClientProtocol interface implementation. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6168) Old MR client is still broken when receiving new counters from MR job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224857#comment-14224857 ] Karthik Kambatla commented on MAPREDUCE-6168: - I would call the issue fixed in 2.x only if the 2.2 client is able to talk to a 2.x server and vice versa. This doesn't seem to be the case for 2.6. Let me know if I am missing something here. Old MR client is still broken when receiving new counters from MR job - Key: MAPREDUCE-6168 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6168 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen Assignee: Junping Du Priority: Blocker In the following scenarios: 1. Either insecure or secure; 2. MR 2.2 with new shuffle on NM; 3. Submitting via old client. We will see the following console exception: {code} 14/11/17 14:56:19 INFO mapreduce.Job: Job job_1416264695865_0003 completed successfully java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_REDUCES at java.lang.Enum.valueOf(Enum.java:236) at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148) at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182) at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154) at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370) at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511) at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756) at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753) 
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289) at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306) at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72) at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} The problem was supposed to be fixed by MAPREDUCE-5831; however, it seems that we haven't covered all the problematic code paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
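The client-side fix direction referenced in this discussion (MAPREDUCE-5831) can be sketched generically: when a newer server sends counter names the local client's enum does not define, Enum.valueOf throws IllegalArgumentException exactly as in the stack trace above, so a tolerant lookup catches it and lets the caller skip the unknown counter. The enum below is a made-up stand-in, not the real JobCounter.

```java
// Illustrative sketch: a newer server may send counter names the older
// client's enum does not define. Enum.valueOf throws IllegalArgumentException
// for those, so a tolerant lookup returns null and the caller can skip the
// counter instead of failing the whole job-status call.
public class TolerantEnum {
    // Stand-in for an old client's limited view of the counters.
    enum JobCounter { NUM_KILLED_MAPS, NUM_KILLED_REDUCES }

    static JobCounter lookup(String name) {
        try {
            return JobCounter.valueOf(name);
        } catch (IllegalArgumentException e) {
            // Unknown counter from a newer release; ignore rather than fail.
            return null;
        }
    }
}
```

The back-compat limitation discussed above follows directly: only clients that carry this tolerant lookup benefit, which is why 2.2/2.3 clients remain broken without a backport.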
[jira] [Commented] (MAPREDUCE-6168) Old MR client is still broken when receiving new counters from MR job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224903#comment-14224903 ] Junping Du commented on MAPREDUCE-6168: --- Thanks for the comments [~kasha]! bq. I would call the issue fixed in 2.x only if the 2.2 client is able to talk to a 2.x server and vice versa. This doesn't seem to be the case for 2.6. Let me know if I am missing something here. If a 2.2 client cannot talk with 2.4 and 2.5 servers, how meaningful is it for a 2.2 client to talk with a 2.6 server? The problem exists on the client side: an exception gets thrown when receiving new counters from a server running a newer version. If we want to fix it on the server side, we need some mechanism to filter out the new counters after detecting the client's version. As far as I know, we have no version differentiation for the MR client, which makes a server-side fix harder. Even if we add a version detection mechanism after 2.6.x, we still cannot differentiate earlier release versions, so a server-side counter filter still cannot work. Thoughts? Old MR client is still broken when receiving new counters from MR job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6168) Old MR client is still broken when receiving new counters from MR job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224974#comment-14224974 ] Zhijie Shen commented on MAPREDUCE-6168: Now I can recall the context: MR Client 2.2 and 2.3 are not able to talk to the MR 2.4+ runtime (including the recently released 2.6), because more counters were introduced in MR 2.4. In MAPREDUCE-5831, we finally chose to fix the client, such that even in later versions, if we introduce more counters, MR Client 2.6+ is able to stay compatible with them. However, for MR 2.2 and 2.3, the problem is not fixed unless we back-port the fix to these two versions, or we fix the runtime not to emit the new counters to the old client. As to the latter option, it's going to be hard because we don't have the client version information in the communication protocol. I'm okay if we resolve this issue as won't fix. Old MR client is still broken when receiving new counters from MR job -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (MAPREDUCE-5568) JHS returns invalid string for reducer completion percentage if AM restarts with 0 reducer.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jian He updated MAPREDUCE-5568: --- Resolution: Fixed Fix Version/s: 2.7.0 Target Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk and branch-2, thanks [~minjikim] ! JHS returns invalid string for reducer completion percentage if AM restarts with 0 reducer. --- Key: MAPREDUCE-5568 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5568 Project: Hadoop Map/Reduce Issue Type: Bug Affects Versions: 2.4.1, 2.5.1 Reporter: Jian He Assignee: MinJi Kim Fix For: 2.7.0 Attachments: 5568.patch01, 5568.patch02, 5568.patch03, 5568.patch04 JobClient shows: {code} 13/10/05 16:26:09 INFO mapreduce.Job: map 100% reduce NaN% 13/10/05 16:26:09 INFO mapreduce.Job: Job job_1381015536254_0001 completed successfully 13/10/05 16:26:09 INFO mapreduce.Job: Counters: 26 File System Counters FILE: Number of bytes read=0 FILE: Number of bytes written=76741 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=48 HDFS: Number of bytes written=0 HDFS: Number of read operations=1 HDFS: Number of large read operations=0 HDFS: Number of write operations=0 {code} With the mapred job -status command, it shows: {code} Uber job : false Number of maps: 1 Number of reduces: 0 map() completion: 1.0 reduce() completion: NaN Job state: SUCCEEDED retired: false reason for failure: {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
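The NaN above comes from computing completed/total with zero reduce tasks (0/0 in floating point). The guard can be sketched as follows; this is a simplified illustration of the idea, not the actual CompletedJob code:

```java
// Simplified illustration of the fix: with zero tasks of a kind, report the
// phase as complete (1.0) instead of computing completed / total = 0/0 = NaN.
public class Progress {
    static float completion(int completedTasks, int totalTasks) {
        if (totalTasks == 0) {
            return 1.0f; // nothing to do counts as fully done
        }
        return (float) completedTasks / totalTasks;
    }
}
```

With this guard, a job with 0 reducers reports reduce() completion 1.0 rather than NaN in both the JHS and the CLI.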
[jira] [Commented] (MAPREDUCE-5568) JHS returns invalid string for reducer completion percentage if AM restarts with 0 reducer.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225213#comment-14225213 ] Hudson commented on MAPREDUCE-5568: --- FAILURE: Integrated in Hadoop-trunk-Commit #6603 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6603/]) MAPREDUCE-5568. Fixed CompletedJob in JHS to show progress percentage correctly in case the number of mappers or reducers is zero. Contributed by MinJi Kim (jianhe: rev 78f7cdbfd6e2b9fac51c369c748ae93d12ef065a) * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestJobHistoryEntities.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/CompletedJob.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/resources/job_1416424547277_0002-1416424775281-root-TeraGen-1416424785433-2-0-SUCCEEDED-default-1416424779349.jhist JHS returns invalid string for reducer completion percentage if AM restarts with 0 reducer. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5568) JHS returns invalid string for reducer completion percentage if AM restarts with 0 reducer.
[ https://issues.apache.org/jira/browse/MAPREDUCE-5568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225293#comment-14225293 ] MinJi Kim commented on MAPREDUCE-5568: -- Awesome. Thanks, @jianhe! JHS returns invalid string for reducer completion percentage if AM restarts with 0 reducer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5785) Derive heap size or mapreduce.*.memory.mb automatically
[ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225462#comment-14225462 ] Jian He commented on MAPREDUCE-5785: After this patch, a job somehow fails because it is not able to launch the task container: {{Error: Could not find or load main class null}}. (might be my own setup problem) Derive heap size or mapreduce.*.memory.mb automatically --- Key: MAPREDUCE-5785 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mr-am, task Reporter: Gera Shegalov Assignee: Gera Shegalov Fix For: 3.0.0 Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch, MAPREDUCE-5785.v03.patch, mr-5785-4.patch, mr-5785-5.patch, mr-5785-6.patch Currently users have to set 2 memory-related configs per job / per task type. One first chooses some container size mapreduce.\*.memory.mb and then a corresponding maximum Java heap size -Xmx smaller than mapreduce.\*.memory.mb. This makes sure that the JVM's overall footprint (native memory + Java heap) does not exceed mapreduce.\*.memory.mb. If one forgets to tune -Xmx, the MR-AM might be - allocating big containers whereas the JVM will only use the default -Xmx200m. - allocating small containers that will OOM because -Xmx is too high. With this JIRA, we propose to set -Xmx automatically based on an empirical ratio that can be adjusted. -Xmx is not changed automatically if provided by the user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
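The proposal in this issue can be sketched as below. The method name, the 0.8 ratio, and the use of a nullable Long for the user's -Xmx are illustrative assumptions for the sketch, not the committed defaults:

```java
// Illustrative sketch of deriving a heap size from the container size:
// if the user supplied -Xmx explicitly, keep it; otherwise apply a ratio
// so the Java heap leaves headroom for native memory inside the container.
public class HeapSizer {
    static long deriveHeapMb(long containerMb, double heapRatio, Long userXmxMb) {
        if (userXmxMb != null) {
            return userXmxMb; // never override an explicit user setting
        }
        return Math.max(1, (long) (containerMb * heapRatio));
    }
}
```

The design point is the one the description states: the derivation only kicks in when the user has not set -Xmx, so existing jobs keep their behavior.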
[jira] [Commented] (MAPREDUCE-5785) Derive heap size or mapreduce.*.memory.mb automatically
[ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225468#comment-14225468 ] Karthik Kambatla commented on MAPREDUCE-5785: - Actually, this is a bug with the patch itself. It might be best to revert it for now until we fix the issue. Reverting it. Derive heap size or mapreduce.*.memory.mb automatically -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5785) Derive heap size or mapreduce.*.memory.mb automatically
[ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225470#comment-14225470 ] Karthik Kambatla commented on MAPREDUCE-5785: - Reverted. Let me fix the bug and post another patch. Derive heap size or mapreduce.*.memory.mb automatically --- Key: MAPREDUCE-5785 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mr-am, task Reporter: Gera Shegalov Assignee: Gera Shegalov Fix For: 3.0.0 Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch, MAPREDUCE-5785.v03.patch, mr-5785-4.patch, mr-5785-5.patch, mr-5785-6.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-5785) Derive heap size or mapreduce.*.memory.mb automatically
[ https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225487#comment-14225487 ] Hudson commented on MAPREDUCE-5785: --- FAILURE: Integrated in Hadoop-trunk-Commit #6607 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6607/]) Revert MAPREDUCE-5785. Derive heap size or mapreduce.*.memory.mb automatically. (Gera Shegalov and Karthik Kambatla via kasha) (kasha: rev a655973e781caf662b360c96e0fa3f5a873cf676) * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TestMapReduceChildJVM.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/job/impl/TaskAttemptImpl.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/JobConf.java * hadoop-mapreduce-project/CHANGES.txt * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/MapReduceChildJVM.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java * hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java Derive heap size or mapreduce.*.memory.mb automatically --- Key: MAPREDUCE-5785 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785 Project: Hadoop Map/Reduce Issue Type: New Feature Components: mr-am, task Reporter: Gera Shegalov Assignee: Gera Shegalov Fix For: 3.0.0 Attachments: MAPREDUCE-5785.v01.patch, MAPREDUCE-5785.v02.patch, MAPREDUCE-5785.v03.patch, mr-5785-4.patch, mr-5785-5.patch, mr-5785-6.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6168) Old MR client is still broken when receiving new counters from MR job
[ https://issues.apache.org/jira/browse/MAPREDUCE-6168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225572#comment-14225572 ] Karthik Kambatla commented on MAPREDUCE-6168: - Thanks for the details, Junping and Zhijie. It is unfortunate we missed it in 2.2. I guess we can't do much at this point beyond calling it compatible for 2.6 onwards. I am fine with resolving it as Won't Fix too. Old MR client is still broken when receiving new counters from MR job - Key: MAPREDUCE-6168 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6168 Project: Hadoop Map/Reduce Issue Type: Bug Reporter: Zhijie Shen Assignee: Junping Du Priority: Blocker In the following scenarios: 1. Either insecure or secure; 2. MR 2.2 with new shuffle on NM; 3. Submitting via old client. We will see the following console exception: {code} 14/11/17 14:56:19 INFO mapreduce.Job: Job job_1416264695865_0003 completed successfully java.lang.IllegalArgumentException: No enum constant org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_REDUCES at java.lang.Enum.valueOf(Enum.java:236) at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148) at org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182) at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154) at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240) at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370) at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511) at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756) at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at 
org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753) at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361) at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289) at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:306) at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72) at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144) at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.main(RunJar.java:212) {code} The problem was supposed to be fixed by MAPREDUCE-5831; however, it seems that we haven't covered all the problematic code paths. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
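The failure above comes from {{Enum.valueOf}} throwing {{IllegalArgumentException}} when the old client's {{JobCounter}} enum does not contain a counter name (here {{MB_MILLIS_REDUCES}}) that a newer server sends back. A defensive lookup, sketched below with a toy enum, would skip unknown constants instead of failing the whole counter fetch; this is only a model of the compatibility problem, not the actual MAPREDUCE-5831 fix.

```java
import java.util.Optional;

public class CounterLookup {
    // Toy stand-in for the old client's JobCounter enum, which lacks
    // counters added in later releases (e.g. MB_MILLIS_REDUCES).
    enum JobCounter { TOTAL_LAUNCHED_MAPS, TOTAL_LAUNCHED_REDUCES }

    /** Return the constant if known, empty otherwise, instead of letting
     *  IllegalArgumentException propagate up to the job client. */
    static Optional<JobCounter> safeValueOf(String name) {
        try {
            return Optional.of(JobCounter.valueOf(name));
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // unknown counter from a newer server
        }
    }

    public static void main(String[] args) {
        System.out.println(safeValueOf("TOTAL_LAUNCHED_MAPS").isPresent()); // true
        System.out.println(safeValueOf("MB_MILLIS_REDUCES").isPresent());   // false
    }
}
```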
[jira] [Updated] (MAPREDUCE-5932) Provide an option to use a dedicated reduce-side shuffle log
[ https://issues.apache.org/jira/browse/MAPREDUCE-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated MAPREDUCE-5932: - Attachment: MAPREDUCE-5932.v05.patch bq. There's a lot of ways to do this, including tacking on yet another boolean (which isn't great for readability given there's now one for the AppMaster), passing an enum that can differentiate AM/map/task (not sure one exists to reuse), pass the Task/TaskId object and null means AM, etc. Hi [~jlowe], I opted for passing {{task}} as the easiest option. Provide an option to use a dedicated reduce-side shuffle log Key: MAPREDUCE-5932 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5932 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5932.v01.patch, MAPREDUCE-5932.v02.patch, MAPREDUCE-5932.v03.patch, MAPREDUCE-5932.v04.patch, MAPREDUCE-5932.v05.patch For reducers in large jobs, our users cannot easily spot the portions of the log associated with problems in their code. An example reducer with INFO-level logging generates ~3500 lines / ~700KiB of log per second. 95% of the log is the client side of the shuffle, {{org.apache.hadoop.mapreduce.task.reduce.*}}:
{code}
$ wc syslog
3642 48192 691013 syslog
$ grep task.reduce syslog | wc
3424 46534 659038
$ grep task.reduce.ShuffleScheduler syslog | wc
1521 17745 251458
$ grep task.reduce.Fetcher syslog | wc
1045 15340 223683
$ grep task.reduce.InMemoryMapOutput syslog | wc
400 4800 72060
$ grep task.reduce.MergeManagerImpl syslog | wc
432 8200 106555
{code}
Byte percentage breakdown:
{code}
Shuffle total:     95%
ShuffleScheduler:  36%
Fetcher:           32%
InMemoryMapOutput: 10%
MergeManagerImpl:  15%
{code}
While this information is actually often useful for devops debugging shuffle performance issues, the job users are often lost. We propose to have a dedicated syslog.shuffle file.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
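One way to route the noisy shuffle client loggers into their own file can be sketched as a log4j configuration fragment. This is only an illustration of the idea behind syslog.shuffle, not the patch's actual configuration; the appender name and properties here are hypothetical, while {{yarn.app.container.log.dir}} is the standard container log directory property.

```properties
# Route the reduce-side shuffle client's loggers to a dedicated file so
# that user-code messages remain readable in the main syslog.
log4j.logger.org.apache.hadoop.mapreduce.task.reduce=INFO, shuffle
# Do not also echo these events into the root logger's syslog appender.
log4j.additivity.org.apache.hadoop.mapreduce.task.reduce=false
log4j.appender.shuffle=org.apache.log4j.FileAppender
log4j.appender.shuffle.File=${yarn.app.container.log.dir}/syslog.shuffle
log4j.appender.shuffle.layout=org.apache.log4j.PatternLayout
log4j.appender.shuffle.layout.ConversionPattern=%d{ISO8601} %p [%t] %c: %m%n
```

With additivity disabled, the ~95% of log volume from {{task.reduce.*}} lands only in syslog.shuffle, which devops can still consult when debugging shuffle performance.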
[jira] [Commented] (MAPREDUCE-6171) The visibilities of the distributed cache files and archives should be determined by both their permissions and if they are located in HDFS encryption zone
[ https://issues.apache.org/jira/browse/MAPREDUCE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225672#comment-14225672 ] Arun Suresh commented on MAPREDUCE-6171: [~dian.fu], Any reason why the _yarn_ user is blacklisted from _DECRYPT_EEK_ calls? My understanding was that only the HDFS admin, i.e. the _hdfs_ user, needs to be blacklisted. The visibilities of the distributed cache files and archives should be determined by both their permissions and if they are located in HDFS encryption zone --- Key: MAPREDUCE-6171 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6171 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Reporter: Dian Fu The visibilities of the distributed cache files and archives are currently determined by the permissions of these files or archives. The following is the logic of the method isPublic() in class ClientDistributedCacheManager:
{code}
static boolean isPublic(Configuration conf, URI uri,
    Map<URI, FileStatus> statCache) throws IOException {
  FileSystem fs = FileSystem.get(uri, conf);
  Path current = new Path(uri.getPath());
  // the leaf level file should be readable by others
  if (!checkPermissionOfOther(fs, current, FsAction.READ, statCache)) {
    return false;
  }
  return ancestorsHaveExecutePermissions(fs, current.getParent(), statCache);
}
{code}
At the NodeManager side, the yarn user is used to download public files, and the user who submits the job is used to download private files. In normal cases, there is no problem with this. However, if the files are located in an encryption zone (HDFS-6134) and the yarn user is configured in KMS to be disallowed from fetching the DataEncryptionKey (DEK) of this encryption zone, the download of these files will fail.
You can reproduce this issue with the following steps (assume you submit the job as user testUser): # create a clean cluster which has the HDFS cryptographic FileSystem feature # create directory /data/ in HDFS and make it an encryption zone with key name testKey # configure KMS so that only user testUser can decrypt the DEK of key testKey:
{code}
<property>
  <name>key.acl.testKey.DECRYPT_EEK</name>
  <value>testUser</value>
</property>
{code}
# execute the teragen job as user testUser:
{code}
su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar teragen 1 /data/terasort-input"
{code}
# execute the terasort job as user testUser:
{code}
su -s /bin/bash testUser -c "hadoop jar hadoop-mapreduce-examples*.jar terasort /data/terasort-input /data/terasort-output"
{code}
You will see logs like this at the job submitter's console:
{code}
INFO mapreduce.Job: Job job_1416860917658_0002 failed with state FAILED due to: Application application_1416860917658_0002 failed 2 times due to AM Container for appattempt_1416860917658_0002_02 exited with exitCode: -1000 due to: org.apache.hadoop.security.authorize.AuthorizationException: User [yarn] is not authorized to perform [DECRYPT_EEK] on key with ACL name [testKey]!!
{code}
The initial idea to solve this issue is to modify the logic in ClientDistributedCacheManager.isPublic to also consider whether the file is in an encryption zone. If it is in an encryption zone, the file should be considered private. Then, at the NodeManager side, the user who submits the job will be used to fetch the file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
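The proposed isPublic change can be modeled in plain Java as below. This is only a sketch of the decision logic: the boolean parameters stand in for the real FileSystem/encryption-zone checks, whose API the eventual patch would use.

```java
public class CacheVisibility {
    /** Model of the proposed check: a file is public only if it is
     *  world-readable, its ancestors are world-executable, AND it lies
     *  outside any HDFS encryption zone. (Booleans replace the real
     *  FileSystem calls for illustration.) */
    static boolean isPublic(boolean inEncryptionZone,
                            boolean otherReadable,
                            boolean ancestorsExecutable) {
        if (inEncryptionZone) {
            // Force private: the NodeManager then localizes the file as the
            // submitting user, who is allowed to DECRYPT_EEK, not as yarn.
            return false;
        }
        return otherReadable && ancestorsExecutable;
    }

    public static void main(String[] args) {
        System.out.println(isPublic(false, true, true)); // true: public file
        System.out.println(isPublic(true, true, true));  // false: in an EZ
    }
}
```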
[jira] [Commented] (MAPREDUCE-5932) Provide an option to use a dedicated reduce-side shuffle log
[ https://issues.apache.org/jira/browse/MAPREDUCE-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225714#comment-14225714 ] Hadoop QA commented on MAPREDUCE-5932: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12683732/MAPREDUCE-5932.v05.patch against trunk revision a655973. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager: org.apache.hadoop.mapreduce.v2.TestMROldApiJobs org.apache.hadoop.mapred.TestJobCleanup The test build failed in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. 
Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5051//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5051//console This message is automatically generated. Provide an option to use a dedicated reduce-side shuffle log Key: MAPREDUCE-5932 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5932 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5932.v01.patch, MAPREDUCE-5932.v02.patch, MAPREDUCE-5932.v03.patch, MAPREDUCE-5932.v04.patch, MAPREDUCE-5932.v05.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6171) The visibilities of the distributed cache files and archives should be determined by both their permissions and if they are located in HDFS encryption zone
[ https://issues.apache.org/jira/browse/MAPREDUCE-6171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225726#comment-14225726 ] Dian Fu commented on MAPREDUCE-6171: Hi [~asuresh], key-based ACLs in KMS are currently implemented as a whitelist. So if I configure the following in kms-acl.xml,
{code}
<property>
  <name>key.acl.testKey.DECRYPT_EEK</name>
  <value>testUser</value>
</property>
{code}
then only the {{testUser}} user can make {{DECRYPT_EEK}} calls on key {{testKey}}. If I want the {{yarn}} user to also be able to make {{DECRYPT_EEK}} calls on the {{testKey}} key, I need to add the {{yarn}} user to the above configuration value manually. This means that whenever I configure a key-based {{DECRYPT_EEK}} ACL for some key, I also need to add the {{yarn}} user for that key, since I don't know in advance whether the {{yarn}} user will later need to do {{DECRYPT_EEK}} for this key, as in the example described in this JIRA. This is inconvenient and tricky. On the other hand, fetching the files under an encryption zone as the user who submits the job is more straightforward. The visibilities of the distributed cache files and archives should be determined by both their permissions and if they are located in HDFS encryption zone --- Key: MAPREDUCE-6171 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6171 Project: Hadoop Map/Reduce Issue Type: Bug Components: security Reporter: Dian Fu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6166) Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk
[ https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225864#comment-14225864 ] Gera Shegalov commented on MAPREDUCE-6166: -- Sounds good, [~eepayne]. I have a few comments then. Some are in light of a follow-up JIRA. bq. update this patch with the final keyword on JobConf jobConf Let us make the instance variable type a more general {{Configuration}}, as we are not doing anything specific to {{JobConf}}. Instead of introducing a new local variable iFin in {{OnDiskMapOutput#shuffle}}, we can overwrite it as in {{InMemoryMapOutput#shuffle}}. We can either capture the shuffle size in an instance variable, as {{InMemoryMapOutput}} does implicitly via {{memory.length}}, or we can set
{code}
long bytesLeft = compressedLength - ((IFileInputStream) input).getSize();
{code}
Then we don't need to touch the {{input.read}} line to do {{readWithChecksum}}. Good call adding {{finally}} with {{close}}. I also have some comments for the test: {{ios.finish()}} should be removed because it's redundant: {{IFileOutputStream#close()}} will call it as well. We don't need the PrintStream wrapping, and we need to be careful not to leak file descriptors in case I/O fails.
{code}
new PrintStream(fout).print(bout.toString());
fout.close();
{code}
should be something like:
{code}
try {
  fout.write(bout.toByteArray());
} finally {
  fout.close();
}
{code}
Similarly, we need to make sure that {{fin.close()}} is in a try-finally block enclosing the header and shuffle read. Let us not do
{code}
catch (Exception e) {
  fail("OnDiskMapOutput.shuffle did not process the map partition file");
{code}
It's redundant because the exception is failing the test already. Same PrintStream and fout.close remarks for the code creating the corrupted file. {{dataSize/2}}: I believe the Sun Java Coding Style requires spaces around arithmetic operations. In the fragment where we expect the checksum to fail, {{fin.close()}} should be in some finally.
{{catch(Exception e)}} is too broad. Let us be more specific and maybe even log it:
{code}
} catch (ChecksumException e) {
  LOG.info("Expected checksum exception thrown.", e);
}
{code}
Thinking a bit more about the file.out, it does not seem to be cleaned up after the test has finished. But we probably don't even need to create files; we can simply use {{new ByteArrayInputStream(bout.toByteArray())}} and {{new ByteArrayInputStream(corrupted)}} as input. Reducers do not catch bad map output transfers during shuffle if data shuffled directly to disk --- Key: MAPREDUCE-6166 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6166 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.6.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: MAPREDUCE-6166.v1.201411221941.txt, MAPREDUCE-6166.v2.201411251627.txt In very large map/reduce jobs (5 maps, 2500 reducers), the intermediate map partition output gets corrupted on disk on the map side. If this corrupted map output is too large to shuffle in memory, the reducer streams it to disk without validating the checksum. In jobs this large, it could take hours before the reducer finally tries to read the corrupted file and fails. Since retries of the failed reduce attempt will also take hours, this delay in discovering the failure is multiplied greatly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
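The fix under discussion validates the checksum while the reducer streams map output to disk rather than hours later when the file is read back. The idea can be modeled in plain Java as below; CRC32 stands in for the IFile checksum, and the real code would use {{IFileInputStream}}'s checksum-aware reads instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksumOnWrite {
    /** Verify the payload's checksum while writing it out; fail fast on a
     *  mismatch instead of leaving a corrupt on-disk map output to be
     *  discovered much later. (CRC32 models the IFile checksum.) */
    static void copyWithChecksum(byte[] payload, long expectedCrc,
                                 OutputStream out) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if (crc.getValue() != expectedCrc) {
            throw new IOException("checksum error while shuffling to disk");
        }
        out.write(payload);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "map output".getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copyWithChecksum(data, crc.getValue(), sink); // succeeds
        try {
            copyWithChecksum(data, crc.getValue() + 1, sink); // "corrupt"
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Failing during the shuffle lets the reducer refetch or blame the map output immediately, rather than after hours of reduce work.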
[jira] [Commented] (MAPREDUCE-5932) Provide an option to use a dedicated reduce-side shuffle log
[ https://issues.apache.org/jira/browse/MAPREDUCE-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225873#comment-14225873 ] Gera Shegalov commented on MAPREDUCE-5932: -- I believe the failures are unrelated. I reran the 2 tests on my laptop without any issues. [~jlowe], can you take a look at this version please? Provide an option to use a dedicated reduce-side shuffle log Key: MAPREDUCE-5932 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5932 Project: Hadoop Map/Reduce Issue Type: Improvement Components: mrv2 Affects Versions: 2.4.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: MAPREDUCE-5932.v01.patch, MAPREDUCE-5932.v02.patch, MAPREDUCE-5932.v03.patch, MAPREDUCE-5932.v04.patch, MAPREDUCE-5932.v05.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)