[jira] [Commented] (MAPREDUCE-6204) TestJobCounters should use new properties instead JobConf.MAPRED_TASK_JAVA_OPTS

2015-05-18 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548155#comment-14548155
 ] 

Tsuyoshi Ozawa commented on MAPREDUCE-6204:
---

[~sam liu], thank you for pinging us. Let me clarify one point: the 
configuration shown below looks like it was set by your own configuration, as 
[~jira.shegalov] mentioned. Is that right?

{code}
-Xmx1000m -Xms1000m -Xmn100m -Xtune:virtualized 
-Xshareclasses:name=mrscc_%g,groupAccess,cacheDir=/var/hadoop/tmp,nonFatal 
-Xscmx20m -Xdump:java:file=/var/hadoop/tmp/javacore.%Y%m%d.%H%M%S.%pid.%seq.txt 
-Xdump:heap:file=/var/hadoop/tmp/heapdump.%Y%m%d.%H%M%S.%pid.%seq.phd
{code}

 TestJobCounters should use new properties instead 
 JobConf.MAPRED_TASK_JAVA_OPTS
 ---

 Key: MAPREDUCE-6204
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6204
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 2.6.0
Reporter: sam liu
Assignee: sam liu
Priority: Minor
  Labels: BB2015-05-RFC
 Attachments: MAPREDUCE-6204-1.patch, MAPREDUCE-6204-2.patch, 
 MAPREDUCE-6204-3.patch, MAPREDUCE-6204-4.patch, MAPREDUCE-6204.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high

2015-05-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547929#comment-14547929
 ] 

Hadoop QA commented on MAPREDUCE-5965:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 10s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 48s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 47s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 21s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 25s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 34s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 42s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | tools/hadoop tests |   6m 14s | Tests passed in 
hadoop-streaming. |
| | |  42m 37s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733519/MAPREDUCE-5965.2.patch 
|
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 363c355 |
| hadoop-streaming test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5741/artifact/patchprocess/testrun_hadoop-streaming.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5741/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5741/console |


This message was automatically generated.

 Hadoop streaming throws error if list of input files is high. Error is: 
 error=7, Argument list too long at if number of input file is high
 

 Key: MAPREDUCE-5965
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Arup Malakar
Assignee: Arup Malakar
 Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch, 
 MAPREDUCE-5965.patch


 Hadoop streaming exposes all the key-value pairs in the job conf as 
 environment variables when it forks a process for the streaming code to run. 
 Unfortunately the variable mapreduce_input_fileinputformat_inputdir contains 
 the list of input files, and Linux has a limit on the combined size of 
 environment variables + arguments.
 Depending on how long the list of files and their full paths is, this could be 
 pretty huge. And given that these variables are not even used, it stops the 
 user from running a hadoop job with a large number of files, even though it 
 could otherwise run.
 Linux throws E2BIG (error code 7) if the size is greater than a certain limit, 
 and Java translates that to error=7, Argument list too long. More: 
 http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping a 
 variable if it is longer than a certain length. That way, if user code 
 requires the environment variable, it would fail. It should also introduce a 
 config variable to skip long variables, set to false by default. That way the 
 user has to specifically set it to true to enable this feature.
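 The proposed skipping could look roughly like the sketch below (plain Java; 
 the method name, threshold, and flag are illustrative only, since the issue 
 does not specify the real config key or limit):
 {code}
import java.util.HashMap;
import java.util.Map;

public class EnvFilterSketch {
    // Assumed threshold; the issue proposes making the skipping configurable
    // and disabled by default.
    static final int MAX_ENV_VALUE_LEN = 20 * 1024;

    // Returns a copy of the environment with oversized values dropped, so the
    // child's env + argv stay under the kernel's E2BIG limit.
    static Map<String, String> skipLongValues(Map<String, String> env,
                                              boolean skipEnabled) {
        if (!skipEnabled) {
            return env; // default behaviour: pass everything through
        }
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getValue().length() <= MAX_ENV_VALUE_LEN) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        env.put("PATH", "/usr/bin");
        env.put("mapreduce_input_fileinputformat_inputdir",
                "x".repeat(100_000)); // stands in for a huge input file list
        Map<String, String> filtered = skipLongValues(env, true);
        System.out.println(filtered.containsKey("PATH"));
        System.out.println(filtered.containsKey(
                "mapreduce_input_fileinputformat_inputdir"));
    }
}
 {code}
 With the flag enabled, the oversized variable is dropped while short ones 
 survive, which matches the "fail only if user code needs it" behaviour 
 described above.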
 Here is the exception:
 {code}
 Error: java.lang.RuntimeException: Error in configuring object at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) 
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at 
 org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at 
 org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at 
 java.security.AccessController.doPrivileged(Native Method) at 
 javax.security.auth.Subject.doAs(Subject.java:415) at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
  at 
 {code}

[jira] [Assigned] (MAPREDUCE-347) Improve the way error messages are displayed from jobclient

2015-05-18 Thread Ruth Wisniewski (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruth Wisniewski reassigned MAPREDUCE-347:
-

Assignee: Ruth Wisniewski

 Improve the way error messages are displayed from jobclient
 ---

 Key: MAPREDUCE-347
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-347
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Peeyush Bishnoi
Assignee: Ruth Wisniewski
  Labels: newbie

 Today if a job is submitted with an already existing output directory then an 
 exception trace is displayed on the client. A simple message like '{{Error 
 running job as output path already exists}}' might suffice.
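 A client-side fail-fast check of this kind could look roughly like the 
 following sketch (plain Java, with java.nio.file standing in for the HDFS 
 FileSystem API; the class and method names are illustrative, not the actual 
 jobclient code):
 {code}
import java.nio.file.Files;
import java.nio.file.Path;

public class OutputPathCheck {
    // Returns a plain error message if the output directory already exists,
    // or null if the job may proceed; a real jobclient would do the same
    // check before printing a full exception trace.
    static String checkOutputPath(Path out) {
        if (Files.exists(out)) {
            return "Error running job as output path already exists";
        }
        return null;
    }

    public static void main(String[] args) {
        // The system temp dir always exists, so this prints the message.
        Path out = Path.of(System.getProperty("java.io.tmpdir"));
        String err = checkOutputPath(out);
        System.out.println(err != null ? err : "ok to run");
    }
}
 {code}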



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6368) Unreachable Java code

2015-05-18 Thread Akira AJISAKA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547690#comment-14547690
 ] 

Akira AJISAKA commented on MAPREDUCE-6368:
--

Thanks [~dhirajnilange] for reporting this issue. I think the condition 
can be true.
{code}
float stepSize = samples.length / (float) numPartitions;
int last = -1;
for(int i = 1; i < numPartitions; ++i) {
  int k = Math.round(stepSize * i);
  while (last >= k && comparator.compare(samples[last], samples[k]) == 0) {
    ++k;
  }
  writer.append(samples[k], nullValue);
  last = k;
}
{code}
{{k = Math.round(stepSize * i)}} can be equal to {{last = Math.round(stepSize * 
(i-1))}} if {{stepSize}} is less than 1.
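
To make the collision concrete, here is a small standalone demo (hypothetical, 
not Hadoop code) using the same loop arithmetic as {{writePartitionFile}}: with 
3 samples and 5 partitions, {{stepSize}} is 0.6 and {{Math.round(stepSize * i)}} 
repeats for consecutive {{i}}, so {{last >= k}} does become true.
{code}
public class StepSizeDemo {
    // Counts how often last >= k in the same loop arithmetic as
    // InputSampler.writePartitionFile, detached from Hadoop.
    static int countCollisions(int samplesLength, int numPartitions) {
        float stepSize = samplesLength / (float) numPartitions;
        int last = -1;
        int collisions = 0;
        for (int i = 1; i < numPartitions; ++i) {
            int k = Math.round(stepSize * i);
            if (last >= k) {
                collisions++;
            }
            last = k;
        }
        return collisions;
    }

    public static void main(String[] args) {
        // stepSize = 3/5 = 0.6: round(0.6)=1 and round(1.2)=1 collide,
        // as do round(1.8)=2 and round(2.4)=2.
        System.out.println(countCollisions(3, 5)); // prints 2
        // stepSize >= 1: k strictly increases, so no collisions.
        System.out.println(countCollisions(10, 5)); // prints 0
    }
}
{code}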

 Unreachable Java code
 -

 Key: MAPREDUCE-6368
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6368
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Dhiraj Nilange
Priority: Minor

 Reference
 Class: org.apache.hadoop.mapreduce.lib.partition.InputSampler
 Method: writePartitionFile
 Line: 337
 The issue exists in the following code loop at line 337:
 {code}
 while (last >= k && comparator.compare(samples[last], samples[k]) == 0) {
   ++k;
 }
 {code}
 The problem is that the first condition in the while loop ({{last >= k}}) will 
 always be false. The value of 'last' will always be less than 'k', so the 
 first condition will never evaluate to true. There is a second condition as 
 well, but since the two are joined by AND, it will never be checked because 
 the first condition is always false. Hence this loop does not contribute to 
 the method's output at all. If it was intended to execute, it needs 
 investigation; but from what I have noticed, it does not affect the output of 
 the method, in which case we could simply remove it. And if it was written 
 with some other intention, it needs to be corrected, as it is currently 
 unreachable code. This issue very much exists in release 2.6.0; I have not 
 seen the 2.7.0 source code, but it may very well exist there too (it's worth 
 checking).
 Thanks & Regards,
 Dhiraj



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search

2015-05-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548380#comment-14548380
 ] 

Hadoop QA commented on MAPREDUCE-6350:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 47s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 30s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 36s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle |   1m  3s | The applied patch generated  2 
new checkstyle issues (total was 15, now 17). |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 1  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 54s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   9m 55s | Tests passed in 
hadoop-mapreduce-client-app. |
| {color:green}+1{color} | mapreduce tests |   0m 46s | Tests passed in 
hadoop-mapreduce-client-common. |
| | |  48m  9s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733575/YARN-1614.v3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / bcc1786 |
| checkstyle |  
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5742/artifact/patchprocess/diffcheckstylehadoop-mapreduce-client-common.txt
 |
| whitespace | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5742/artifact/patchprocess/whitespace.txt
 |
| hadoop-mapreduce-client-app test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5742/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
 |
| hadoop-mapreduce-client-common test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5742/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5742/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5742/console |


This message was automatically generated.

 JobHistory doesn't support fully-functional search
 --

 Key: MAPREDUCE-6350
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch, 
 YARN-1614.v3.patch


 job history server will only output the first 50 characters of the job names 
 in webUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails

2015-05-18 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated MAPREDUCE-5690:

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Closing this as duplicate of MAPREDUCE-4376

 TestLocalMRNotification.testMR occasionally fails
 -

 Key: MAPREDUCE-5690
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Liyin Liang
Assignee: Liyin Liang
 Attachments: MAPREDUCE-5690.1.diff


 TestLocalMRNotification is occasionally failing with the error:
 {code}
 ---
 Test set: org.apache.hadoop.mapred.TestLocalMRNotification
 ---
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec 
  FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification
 testMR(org.apache.hadoop.mapred.TestLocalMRNotification)  Time elapsed: 
 24.881 sec   ERROR!
 java.io.IOException: Job cleanup didn't start in 20 seconds
 at 
 org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685)
 at 
 org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at junit.framework.TestCase.runTest(TestCase.java:168)
 at junit.framework.TestCase.runBare(TestCase.java:134)
 at junit.framework.TestResult$1.protect(TestResult.java:110)
 at junit.framework.TestResult.runProtected(TestResult.java:128)
 at junit.framework.TestResult.run(TestResult.java:113)
 at junit.framework.TestCase.run(TestCase.java:124)
 at junit.framework.TestSuite.runTest(TestSuite.java:243)
 at junit.framework.TestSuite.run(TestSuite.java:238)
 at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
 at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254)
 at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149)
 at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
 at 
 org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
 at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
 at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job

2015-05-18 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated MAPREDUCE-4978:

Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

This patch adds code to set the properties listed below.

* map.input.file
* map.input.start
* map.input.length

Few parts of the current MapReduce code use these:

* They are used by CombineFileRecordReader, but it sets them by itself.
* map.input.file is used by MultipleOutputFormat, which is old-API.

I am closing this as Won't Fix. Please reopen this if you need it, [~liangly].

 Add a updateJobWithSplit() method for new-api job
 -

 Key: MAPREDUCE-4978
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 1.1.2
Reporter: Liyin Liang
Assignee: Liyin Liang
 Attachments: 4978-1.diff


 HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api 
 job. It's better to add another method for new-api job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails

2015-05-18 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated MAPREDUCE-5690:

Labels:   (was: BB2015-05-TBR)

 TestLocalMRNotification.testMR occasionally fails
 -

 Key: MAPREDUCE-5690
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Liyin Liang
Assignee: Liyin Liang
 Attachments: MAPREDUCE-5690.1.diff


 TestLocalMRNotification is occasionally failing with the error:
 {code}
 ---
 Test set: org.apache.hadoop.mapred.TestLocalMRNotification
 ---
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec 
  FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification
 testMR(org.apache.hadoop.mapred.TestLocalMRNotification)  Time elapsed: 
 24.881 sec   ERROR!
 java.io.IOException: Job cleanup didn't start in 20 seconds
 at 
 org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685)
 at 
 org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at junit.framework.TestCase.runTest(TestCase.java:168)
 at junit.framework.TestCase.runBare(TestCase.java:134)
 at junit.framework.TestResult$1.protect(TestResult.java:110)
 at junit.framework.TestResult.runProtected(TestResult.java:128)
 at junit.framework.TestResult.run(TestResult.java:113)
 at junit.framework.TestCase.run(TestCase.java:124)
 at junit.framework.TestSuite.runTest(TestSuite.java:243)
 at junit.framework.TestSuite.run(TestSuite.java:238)
 at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
 at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254)
 at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149)
 at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
 at 
 org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
 at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
 at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4978) Add a updateJobWithSplit() method for new-api job

2015-05-18 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated MAPREDUCE-4978:

Labels:   (was: BB2015-05-TBR)

 Add a updateJobWithSplit() method for new-api job
 -

 Key: MAPREDUCE-4978
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4978
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 1.1.2
Reporter: Liyin Liang
Assignee: Liyin Liang
 Attachments: 4978-1.diff


 HADOOP-1230 adds a method updateJobWithSplit(), which only works for old-api 
 job. It's better to add another method for new-api job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search

2015-05-18 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-6350:
---
Attachment: YARN-1614.v3.patch

 JobHistory doesn't support fully-functional search
 --

 Key: MAPREDUCE-6350
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: YARN-1614.v1.patch, YARN-1614.v2.patch, 
 YARN-1614.v3.patch


 job history server will only output the first 50 characters of the job names 
 in webUI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAPREDUCE-5074) Remove limits on number of counters and counter groups in MapReduce

2015-05-18 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash resolved MAPREDUCE-5074.
-
Resolution: Won't Fix

We can re-open this if we find users compelling us to increase the limits

 Remove limits on number of counters and counter groups in MapReduce
 ---

 Key: MAPREDUCE-5074
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5074
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mr-am, mrv2
Affects Versions: 3.0.0, 2.0.3-alpha, 0.23.6
Reporter: Ravi Prakash

 Can we please consider removing limits on the number of counters and counter 
 groups now that it is all user code? Thanks to the much better architecture 
 of YARN in which there is no single Job Tracker we have to worry about 
 overloading, I feel we should do away with this (now arbitrary) constraint on 
 users' capabilities. Thoughts?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (MAPREDUCE-3010) ant mvn-install doesn't work on hadoop-mapreduce-project

2015-05-18 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash resolved MAPREDUCE-3010.
-
Resolution: Invalid

We moved to Maven a long time ago.

 ant mvn-install doesn't work on hadoop-mapreduce-project
 

 Key: MAPREDUCE-3010
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3010
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Ravi Prakash

 Even though ant jar works, ant mvn-install fails in the compile-fault-inject 
 step



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-4711) Append time elapsed since job-start-time for finished tasks

2015-05-18 Thread Ravi Prakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Prakash updated MAPREDUCE-4711:

Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

 Append time elapsed since job-start-time for finished tasks
 ---

 Key: MAPREDUCE-4711
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4711
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 0.23.3
Reporter: Ravi Prakash
  Labels: BB2015-05-TBR
 Attachments: MAPREDUCE-4711.branch-0.23.patch


 In 0.20.x/1.x, the analyze job link gave this information
 bq. The last Map task task_sometask finished at (relative to the Job launch 
 time): 5/10 20:23:10 (1hrs, 27mins, 54sec)
 The time it took for the last task to finish needs to be calculated mentally 
 in 0.23. I believe we should print it next to the finish time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5626) TaskLogServlet could not get syslog

2015-05-18 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated MAPREDUCE-5626:

Labels: patch  (was: BB2015-05-TBR patch)

 TaskLogServlet could not get syslog
 ---

 Key: MAPREDUCE-5626
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5626
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.2.1
 Environment: Linux version 2.6.18-238.9.1.el5
 Java(TM) SE Runtime Environment (build 1.6.0_43-b01)
 hadoop-1.2.1
Reporter: yangjun
Priority: Minor
  Labels: patch
 Fix For: 1.2.1

   Original Estimate: 2h
  Remaining Estimate: 2h

 When multiple tasks use one JVM and generate logs, e.g.:
 ./attempt_201211220735_0001_m_00_0:
 log.index
 ./attempt_201211220735_0001_m_01_0:
 log.index
 ./attempt_201211220735_0001_m_02_0:
 log.index  stderr  stdout  syslog
 fetching http://:50060/tasklog?attemptid= 
 attempt_201211220735_0001_m_00_0 
 returns stderr and stdout, but not the others, including syslog.
 See the TaskLogServlet.haveTaskLog() method: it does not check the local 
 log.index, but checks the original path instead.
 Resolve: modify the TaskLogServlet haveTaskLog method:
 {code}
 private boolean haveTaskLog(TaskAttemptID taskId, boolean isCleanup,
     TaskLog.LogName type) throws IOException {
   File f = TaskLog.getTaskLogFile(taskId, isCleanup, type);
   if (f.exists() && f.canRead()) {
     return true;
   } else {
     File indexFile = TaskLog.getIndexFile(taskId, isCleanup);
     if (!indexFile.exists()) {
       return false;
     }

     BufferedReader fis;
     try {
       fis = new BufferedReader(new InputStreamReader(
           SecureIOUtils.openForRead(indexFile,
               TaskLog.obtainLogDirOwner(taskId))));
     } catch (FileNotFoundException ex) {
       LOG.warn("Index file for the log of " + taskId
           + " does not exist.");

       // Assume no task reuse is used and files exist on attemptdir
       StringBuffer input = new StringBuffer();
       input.append(LogFileDetail.LOCATION
           + TaskLog.getAttemptDir(taskId, isCleanup) + "\n");
       for (LogName logName : TaskLog.LOGS_TRACKED_BY_INDEX_FILES) {
         input.append(logName + ":0 -1\n");
       }
       fis = new BufferedReader(new StringReader(input.toString()));
     }

     try {
       String str = fis.readLine();
       if (str == null) { // the file doesn't have anything
         throw new IOException("Index file for the log of " + taskId
             + " is empty.");
       }
       String loc = str.substring(str.indexOf(LogFileDetail.LOCATION)
           + LogFileDetail.LOCATION.length());
       File tf = new File(loc, type.toString());
       return tf.exists() && tf.canRead();
     } finally {
       if (fis != null)
         fis.close();
     }
   }
 }
 {code}
 Workaround: adding filter=SYSLOG to the URL prints syslog as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-5626) TaskLogServlet could not get syslog

2015-05-18 Thread Masatake Iwasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Masatake Iwasaki updated MAPREDUCE-5626:

Resolution: Won't Fix
Status: Resolved  (was: Patch Available)

I think this could be closed as won't fix.

[~yangj...@sohu.com], could you attach a patch file to JIRA as described in the 
[wiki|https://wiki.apache.org/hadoop/HowToContribute] if you have an update for 
this or another issue?


 TaskLogServlet could not get syslog
 ---

 Key: MAPREDUCE-5626
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5626
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.2.1
 Environment: Linux version 2.6.18-238.9.1.el5
 Java(TM) SE Runtime Environment (build 1.6.0_43-b01)
 hadoop-1.2.1
Reporter: yangjun
Priority: Minor
  Labels: patch
 Fix For: 1.2.1

   Original Estimate: 2h
  Remaining Estimate: 2h

 When multiple tasks use one JVM and generate logs, e.g.:
 ./attempt_201211220735_0001_m_00_0:
 log.index
 ./attempt_201211220735_0001_m_01_0:
 log.index
 ./attempt_201211220735_0001_m_02_0:
 log.index  stderr  stdout  syslog
 fetching http://:50060/tasklog?attemptid= 
 attempt_201211220735_0001_m_00_0 
 returns stderr and stdout, but not the others, including syslog.
 See the TaskLogServlet.haveTaskLog() method: it does not check the local 
 log.index, but checks the original path instead.
 Resolve: modify the TaskLogServlet haveTaskLog method:
 {code}
 private boolean haveTaskLog(TaskAttemptID taskId, boolean isCleanup,
     TaskLog.LogName type) throws IOException {
   File f = TaskLog.getTaskLogFile(taskId, isCleanup, type);
   if (f.exists() && f.canRead()) {
     return true;
   } else {
     File indexFile = TaskLog.getIndexFile(taskId, isCleanup);
     if (!indexFile.exists()) {
       return false;
     }

     BufferedReader fis;
     try {
       fis = new BufferedReader(new InputStreamReader(
           SecureIOUtils.openForRead(indexFile,
               TaskLog.obtainLogDirOwner(taskId))));
     } catch (FileNotFoundException ex) {
       LOG.warn("Index file for the log of " + taskId
           + " does not exist.");

       // Assume no task reuse is used and files exist on attemptdir
       StringBuffer input = new StringBuffer();
       input.append(LogFileDetail.LOCATION
           + TaskLog.getAttemptDir(taskId, isCleanup) + "\n");
       for (LogName logName : TaskLog.LOGS_TRACKED_BY_INDEX_FILES) {
         input.append(logName + ":0 -1\n");
       }
       fis = new BufferedReader(new StringReader(input.toString()));
     }

     try {
       String str = fis.readLine();
       if (str == null) { // the file doesn't have anything
         throw new IOException("Index file for the log of " + taskId
             + " is empty.");
       }
       String loc = str.substring(str.indexOf(LogFileDetail.LOCATION)
           + LogFileDetail.LOCATION.length());
       File tf = new File(loc, type.toString());
       return tf.exists() && tf.canRead();
     } finally {
       if (fis != null)
         fis.close();
     }
   }
 }
 {code}
 Workaround: adding filter=SYSLOG to the URL prints syslog as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5690) TestLocalMRNotification.testMR occasionally fails

2015-05-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548628#comment-14548628
 ] 

Hadoop QA commented on MAPREDUCE-5690:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | patch |   0m  1s | The patch file was not named 
according to hadoop's naming conventions. Please see 
https://wiki.apache.org/hadoop/HowToContribute for instructions. |
| {color:blue}0{color} | pre-patch |   5m 11s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 29s | There were no new javac warning 
messages. |
| {color:green}+1{color} | release audit |   0m 20s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 33s | There were no new checkstyle 
issues. |
| {color:red}-1{color} | whitespace |   0m  0s | The patch has 2  line(s) that 
end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install |   1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 31s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 40s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests | 108m 13s | Tests passed in 
hadoop-mapreduce-client-jobclient. |
| | | 124m 34s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12619471/MAPREDUCE-5690.1.diff |
| Optional Tests | javac unit findbugs checkstyle |
| git revision | trunk / 060c84e |
| whitespace | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5743/artifact/patchprocess/whitespace.txt
 |
| hadoop-mapreduce-client-jobclient test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5743/artifact/patchprocess/testrun_hadoop-mapreduce-client-jobclient.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5743/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5743/console |


This message was automatically generated.

 TestLocalMRNotification.testMR occasionally fails
 -

 Key: MAPREDUCE-5690
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5690
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Liyin Liang
Assignee: Liyin Liang
 Attachments: MAPREDUCE-5690.1.diff


 TestLocalMRNotification is occasionally failing with the error:
 {code}
 ---
 Test set: org.apache.hadoop.mapred.TestLocalMRNotification
 ---
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 24.992 sec 
  FAILURE! - in org.apache.hadoop.mapred.TestLocalMRNotification
 testMR(org.apache.hadoop.mapred.TestLocalMRNotification)  Time elapsed: 
 24.881 sec   ERROR!
 java.io.IOException: Job cleanup didn't start in 20 seconds
 at 
 org.apache.hadoop.mapred.UtilsForTests.runJobKill(UtilsForTests.java:685)
 at 
 org.apache.hadoop.mapred.NotificationTestCase.testMR(NotificationTestCase.java:178)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at junit.framework.TestCase.runTest(TestCase.java:168)
 at junit.framework.TestCase.runBare(TestCase.java:134)
 at junit.framework.TestResult$1.protect(TestResult.java:110)
 at junit.framework.TestResult.runProtected(TestResult.java:128)
 at junit.framework.TestResult.run(TestResult.java:113)
 at junit.framework.TestCase.run(TestCase.java:124)
 at junit.framework.TestSuite.runTest(TestSuite.java:243)
 at junit.framework.TestSuite.run(TestSuite.java:238)
 at 
 org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
 at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:254)
 at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:149)
 at 
 

[jira] [Commented] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high

2015-05-18 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549114#comment-14549114
 ] 

Ray Chiang commented on MAPREDUCE-5965:
---

Thanks Wilfred. I guess I'll comment on the meta issue first. In general, I'm 
not sure whether it's a good idea to filter based purely on size. Would it be 
better to have a firmer whitelist and/or blacklist capability for Hadoop 
streaming?

 Hadoop streaming throws error if list of input files is high. Error is: 
 error=7, Argument list too long at if number of input file is high
 

 Key: MAPREDUCE-5965
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Arup Malakar
Assignee: Arup Malakar
 Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch, 
 MAPREDUCE-5965.patch


 Hadoop streaming exposes all the key/value pairs in the job conf as 
 environment variables when it forks a process for the streaming code to run. 
 Unfortunately, the variable mapreduce_input_fileinputformat_inputdir contains 
 the list of input files, and Linux limits the combined size of environment 
 variables and arguments.
 Depending on how long the list of files and their full paths is, this can be 
 pretty large. And since these variables are not even used, it stops the user 
 from running a Hadoop job with a large number of files, even though the job 
 could otherwise run.
 Linux returns E2BIG (error code 7) if the size exceeds that limit, and Java 
 translates it to "error=7, Argument list too long". More: 
 http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping 
 variables longer than a certain length; that way, if user code requires the 
 environment variable, it would fail visibly. It should also introduce a 
 config variable to skip long variables, set to false by default, so the user 
 has to explicitly set it to true to enable this feature.
 Here is the exception:
 {code}
 Error: java.lang.RuntimeException: Error in configuring object at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) 
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at 
 org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at 
 org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at 
 java.security.AccessController.doPrivileged(Native Method) at 
 javax.security.auth.Subject.doAs(Subject.java:415) at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: 
 java.lang.reflect.InvocationTargetException at 
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606) at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
 ... 9 more Caused by: java.lang.RuntimeException: Error in configuring object 
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) 
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
 at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 14 
 more Caused by: java.lang.reflect.InvocationTargetException at 
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606) at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
 ... 17 more Caused by: java.lang.RuntimeException: configuration exception at 
 org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222) at 
 org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66) ... 22 
 more Caused by: java.io.IOException: Cannot run program 
 /data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_06/./rbenv_runner.sh:
  error=7, Argument list too long at 
 java.lang.ProcessBuilder.start(ProcessBuilder.java:1041) at 
 org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209) ... 23 
 more Caused by: java.io.IOException: error=7, Argument list too long 

[jira] [Commented] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search

2015-05-18 Thread Siqi Li (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549356#comment-14549356
 ] 

Siqi Li commented on MAPREDUCE-6350:


Hi [~mitdesai], thank you for your feedback. I have uploaded patch v3, which 
fixes those style issues.

 JobHistory doesn't support fully-functional search
 --

 Key: MAPREDUCE-6350
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6350
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Critical
 Attachments: MAPREDUCE-6350.v1.patch, YARN-1614.v1.patch, 
 YARN-1614.v2.patch, YARN-1614.v3.patch


 The job history server only outputs the first 50 characters of job names in 
 the web UI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high

2015-05-18 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549128#comment-14549128
 ] 

Ray Chiang commented on MAPREDUCE-5965:
---

Making these comments assuming the current patch is an acceptable design 
approach, I have the following nitpicks:

1) Can stream.truncate.long.jobconf.values be put in the appropriate 
*-default.xml file for documentation purposes?

2) Can the lenLimit correspond to a Configuration variable?
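
If filtering by value length is an acceptable design, the idea can be sketched as below. This is purely an illustration under that assumption: the class name `EnvFilter`, the method `filterLongValues`, and the 20 KB limit are hypothetical, not taken from the attached patches (which would read the limit from configuration, per the nitpicks above).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: drop job-conf-derived environment variables whose
// values exceed a length limit before forking the streaming process, so the
// combined environment size stays under the kernel's execve() limit (E2BIG).
public class EnvFilter {
    // Illustrative limit; a real implementation would read this from config.
    static final int LEN_LIMIT = 20 * 1024;

    static Map<String, String> filterLongValues(Map<String, String> env) {
        Map<String, String> out = new HashMap<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getValue().length() <= LEN_LIMIT) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        // A huge input-dir list, as produced for jobs with many input files.
        env.put("mapreduce_input_fileinputformat_inputdir", "x".repeat(100_000));
        env.put("mapreduce_task_id", "attempt_0001");
        System.out.println(filterLongValues(env).keySet()); // prints [mapreduce_task_id]
    }
}
```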



[jira] [Updated] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search

2015-05-18 Thread Siqi Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siqi Li updated MAPREDUCE-6350:
---
Attachment: MAPREDUCE-6350.v1.patch




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6350) JobHistory doesn't support fully-functional search

2015-05-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549342#comment-14549342
 ] 

Hadoop QA commented on MAPREDUCE-6350:
--

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m  6s | Pre-patch trunk compilation is 
healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:red}-1{color} | tests included |   0m  0s | The patch doesn't appear 
to include any new or modified tests.  Please justify why no new tests are 
needed for this patch. Also please list what manual steps were performed to 
verify this patch. |
| {color:green}+1{color} | javac |   7m 45s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 53s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   1m  2s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 37s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 33s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   1m 55s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | mapreduce tests |   9m 35s | Tests passed in 
hadoop-mapreduce-client-app. |
| {color:green}+1{color} | mapreduce tests |   0m 46s | Tests passed in 
hadoop-mapreduce-client-common. |
| | |  48m 39s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733627/MAPREDUCE-6350.v1.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 0790275 |
| hadoop-mapreduce-client-app test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5744/artifact/patchprocess/testrun_hadoop-mapreduce-client-app.txt
 |
| hadoop-mapreduce-client-common test log | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5744/artifact/patchprocess/testrun_hadoop-mapreduce-client-common.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5744/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5744/console |


This message was automatically generated.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (MAPREDUCE-1380) Adaptive Scheduler

2015-05-18 Thread ericson yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ericson yang reassigned MAPREDUCE-1380:
---

Assignee: ericson yang  (was: Jordà Polo)

 Adaptive Scheduler
 --

 Key: MAPREDUCE-1380
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.4.1
Reporter: Jordà Polo
Assignee: ericson yang
Priority: Minor
 Attachments: MAPREDUCE-1380-branch-1.2.patch, 
 MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch, MAPREDUCE-1380_1.1.pdf


 The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically 
 adjusts the amount of used resources depending on the performance of jobs and 
 on user-defined high-level business goals.
 Existing Hadoop schedulers are focused on managing large, static clusters in 
 which nodes are added or removed manually. On the other hand, the goal of 
 this scheduler is to improve the integration of Hadoop and the applications 
 that run on top of it with environments that allow a more dynamic 
 provisioning of resources.
 The current implementation is quite straightforward. Users specify a deadline 
 at job submission time, and the scheduler adjusts the resources to meet that 
 deadline (at the moment, the scheduler can be configured to either minimize 
 or maximize the amount of resources). If multiple jobs are run 
 simultaneously, the scheduler prioritizes them by deadline. Note that the 
 current approach to estimate the completion time of jobs is quite simplistic: 
 it is based on the time it takes to finish each task, so it works well with 
 regular jobs, but there is still room for improvement for unpredictable jobs.
 The idea is to further integrate it with cloud-like and virtual environments 
 (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't 
 able to meet its deadline, the scheduler automatically requests more 
 resources.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
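
The deadline-driven estimation described above can be sketched as a back-of-the-envelope calculation. This is illustrative only and assumes uniform task durations, matching the description's "time it takes to finish each task" heuristic; the names `DeadlineEstimator` and `slotsNeeded` are invented here, not taken from the attached patches.

```java
// Illustrative sketch of a deadline-based resource estimate: given the
// remaining tasks, the average observed task duration, and the time left
// until the user-supplied deadline, compute how many parallel slots are
// needed so the remaining work finishes in time.
public class DeadlineEstimator {
    static int slotsNeeded(int remainingTasks, double avgTaskSeconds,
                           double secondsToDeadline) {
        if (secondsToDeadline <= 0) {
            return remainingTasks; // deadline already missed: max parallelism
        }
        // How many sequential "waves" of tasks fit before the deadline.
        double waves = Math.max(1.0, secondsToDeadline / avgTaskSeconds);
        return (int) Math.ceil(remainingTasks / waves);
    }

    public static void main(String[] args) {
        // 100 remaining tasks of ~60s each with 300s left: 5 waves fit,
        // so about 20 concurrent slots are required.
        System.out.println(slotsNeeded(100, 60.0, 300.0)); // prints 20
    }
}
```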


[jira] [Commented] (MAPREDUCE-1380) Adaptive Scheduler

2015-05-18 Thread ericson yang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549609#comment-14549609
 ] 

ericson yang commented on MAPREDUCE-1380:
-

I am a beginner with Hadoop and I want to work on this problem, but I have 
some questions:
1. What is the specific meaning of the adaptive scheduler, and what are the 
differences between the adaptive scheduler and the capacity scheduler?
2. As I understand it, the adaptive scheduler is located in the mapreduce 
package; why is it not in the yarn package?
3. Given the Hadoop 2.4.1 code, how can I modify it to add the adaptive 
scheduler using the patch files above?
Please forgive my poor English. Would you please give me a hand?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



[jira] [Commented] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high

2015-05-18 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549661#comment-14549661
 ] 

Wilfred Spiegelenburg commented on MAPREDUCE-5965:
--

Arup: do you mind if I assign this JIRA to myself? I would like to get it 
fixed in an upcoming release.


[jira] [Commented] (MAPREDUCE-6204) TestJobCounters should use new properties instead JobConf.MAPRED_TASK_JAVA_OPTS

2015-05-18 Thread sam liu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549752#comment-14549752
 ] 

sam liu commented on MAPREDUCE-6204:


Hi Tsuyoshi,

In fact, there is no Hadoop cluster or Hadoop-related configuration in my dev 
env on Power Linux, so I am also not sure why 'MAPRED_MAP_TASK_JAVA_OPTS' and 
'MAPRED_REDUCE_TASK_JAVA_OPTS' have the above default and unexpected values 
there. At the same time, as mentioned, the root cause of the issue is 
described in MAPREDUCE-6205, but its patch has not been reviewed/accepted yet. 
Therefore, the current fix is still very useful for the UT TestJobCounters: it 
explicitly replaces the deprecated property with the proper ones, so it can 
set correct values on the map/reduce opts properties, fix the unexpected 
env/configuration issue, and make the UT more robust.

Thanks!
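
For reference, the non-deprecated per-task keys that replace mapred.child.java.opts (JobConf.MAPRED_TASK_JAVA_OPTS) are mapreduce.map.java.opts and mapreduce.reduce.java.opts. A configuration fragment setting them might look like the sketch below; the -Xmx values are examples only, not taken from the patch.

```xml
<!-- mapred-site.xml-style fragment; values are illustrative -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx200m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx200m</value>
</property>
```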

 TestJobCounters should use new properties instead 
 JobConf.MAPRED_TASK_JAVA_OPTS
 ---

 Key: MAPREDUCE-6204
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6204
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 2.6.0
Reporter: sam liu
Assignee: sam liu
Priority: Minor
  Labels: BB2015-05-RFC
 Attachments: MAPREDUCE-6204-1.patch, MAPREDUCE-6204-2.patch, 
 MAPREDUCE-6204-3.patch, MAPREDUCE-6204-4.patch, MAPREDUCE-6204.patch






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6204) TestJobCounters should use new properties instead JobConf.MAPRED_TASK_JAVA_OPTS

2015-05-18 Thread Tsuyoshi Ozawa (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549762#comment-14549762
 ] 

Tsuyoshi Ozawa commented on MAPREDUCE-6204:
---

OK. I agree with the fix to use the newer properties instead of the deprecated 
one.

[~jira.shegalov] As you mentioned, the test failure may be unrelated and is 
addressed in MAPREDUCE-6205. However, we should still move to the newer, and 
arguably more proper, properties. Do you agree with fixing this?

 TestJobCounters should use new properties instead 
 JobConf.MAPRED_TASK_JAVA_OPTS
 ---

 Key: MAPREDUCE-6204
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6204
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: test
Affects Versions: 2.6.0
Reporter: sam liu
Assignee: sam liu
Priority: Minor
  Labels: BB2015-05-RFC
 Attachments: MAPREDUCE-6204-1.patch, MAPREDUCE-6204-2.patch, 
 MAPREDUCE-6204-3.patch, MAPREDUCE-6204-4.patch, MAPREDUCE-6204.patch








[jira] [Updated] (MAPREDUCE-1380) Adaptive Scheduler

2015-05-18 Thread ericson yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ericson yang updated MAPREDUCE-1380:

Assignee: Jordà Polo  (was: ericson yang)

 Adaptive Scheduler
 --

 Key: MAPREDUCE-1380
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1380
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
Affects Versions: 2.4.1
Reporter: Jordà Polo
Assignee: Jordà Polo
Priority: Minor
 Attachments: MAPREDUCE-1380-branch-1.2.patch, 
 MAPREDUCE-1380_0.1.patch, MAPREDUCE-1380_1.1.patch, MAPREDUCE-1380_1.1.pdf


 The Adaptive Scheduler is a pluggable Hadoop scheduler that automatically 
 adjusts the amount of used resources depending on the performance of jobs and 
 on user-defined high-level business goals.
 Existing Hadoop schedulers are focused on managing large, static clusters in 
 which nodes are added or removed manually. On the other hand, the goal of 
 this scheduler is to improve the integration of Hadoop and the applications 
 that run on top of it with environments that allow a more dynamic 
 provisioning of resources.
 The current implementation is quite straightforward. Users specify a deadline 
 at job submission time, and the scheduler adjusts the resources to meet that 
 deadline (at the moment, the scheduler can be configured to either minimize 
 or maximize the amount of resources). If multiple jobs are run 
 simultaneously, the scheduler prioritizes them by deadline. Note that the 
 current approach to estimate the completion time of jobs is quite simplistic: 
 it is based on the time it takes to finish each task, so it works well with 
 regular jobs, but there is still room for improvement for unpredictable jobs.
 The idea is to further integrate it with cloud-like and virtual environments 
 (such as Amazon EC2, Emotive, etc.) so that if, for instance, a job isn't 
 able to meet its deadline, the scheduler automatically requests more 
 resources.
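 The completion-time estimate and deadline ordering described above can be
 sketched as a toy model (the function names and the wave-based parallelism
 math are illustrative assumptions, not the scheduler's actual code):

```python
import math

def estimate_completion(avg_task_seconds, remaining_tasks, slots):
    # Estimate from per-task time, as the description says: remaining work
    # divided by the parallelism currently granted, in full "waves" of tasks.
    waves = math.ceil(remaining_tasks / slots)
    return avg_task_seconds * waves

def slots_to_meet_deadline(avg_task_seconds, remaining_tasks, seconds_left):
    # Minimum parallelism so the estimate fits inside the remaining time.
    waves_allowed = max(1, int(seconds_left // avg_task_seconds))
    return math.ceil(remaining_tasks / waves_allowed)

def prioritize(jobs):
    # With multiple concurrent jobs, earliest deadline goes first.
    return sorted(jobs, key=lambda job: job["deadline"])
```

 As the description notes, this per-task-time model works well for regular
 jobs but degrades for jobs whose task durations are unpredictable.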





[jira] [Updated] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high

2015-05-18 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated MAPREDUCE-5965:
-
Attachment: MAPREDUCE-5965.2.patch

Ran into the same issue. Rebased and cleaned up the patch; it does the same as 
the Hive patch (truncates the environment value).

 Hadoop streaming throws error if list of input files is high. Error is: 
 error=7, Argument list too long at if number of input file is high
 

 Key: MAPREDUCE-5965
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Arup Malakar
Assignee: Arup Malakar
 Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch, 
 MAPREDUCE-5965.patch


 Hadoop streaming exposes all the key-value pairs in the job conf as 
 environment variables when it forks a process for the streaming code to run. 
 Unfortunately the variable mapreduce_input_fileinputformat_inputdir contains 
 the list of input files, and Linux has a limit on the combined size of 
 environment variables and arguments.
 Depending on how long the list of files and their full paths is, this value 
 can be very large. And given that these variables are not even used, this 
 stops users from running a Hadoop job with a large number of files, even 
 though the job could otherwise run.
 Linux returns E2BIG (error code 7) if the size exceeds a certain limit, and 
 Java translates that to error=7, Argument list too long. More: 
 http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping 
 variables that exceed a certain length. That way, if user code requires the 
 environment variable, the job would fail. A config variable to skip long 
 variables should also be introduced, set to false by default, so that the 
 user has to explicitly set it to true to enable this feature.
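 The proposed guard might look like the following sketch (the 20 KB cutoff, 
 the flag, and the function name are assumptions for illustration; note the 
 attached patch reportedly truncates the value instead of dropping the 
 variable):

```python
ENV_VALUE_LIMIT = 20 * 1024  # assumed cutoff; configurable in the proposal

def filter_env(env, limit=ENV_VALUE_LIMIT, skip_long=True):
    # When the flag is enabled, drop any variable whose value exceeds the
    # limit so the combined env + argv size stays under the kernel's
    # E2BIG threshold at exec() time.
    if not skip_long:
        return dict(env)
    return {k: v for k, v in env.items() if len(v) <= limit}
```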
 Here is the exception:
 {code}
 Error: java.lang.RuntimeException: Error in configuring object at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) 
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at 
 org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at 
 org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at 
 java.security.AccessController.doPrivileged(Native Method) at 
 javax.security.auth.Subject.doAs(Subject.java:415) at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: 
 java.lang.reflect.InvocationTargetException at 
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606) at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
 ... 9 more Caused by: java.lang.RuntimeException: Error in configuring object 
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) 
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
 at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 14 
 more Caused by: java.lang.reflect.InvocationTargetException at 
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606) at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
 ... 17 more Caused by: java.lang.RuntimeException: configuration exception at 
 org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222) at 
 org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66) ... 22 
 more Caused by: java.io.IOException: Cannot run program 
 /data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_06/./rbenv_runner.sh:
  error=7, Argument list too long at 
 java.lang.ProcessBuilder.start(ProcessBuilder.java:1041) at 
 org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209) ... 23 
 more Caused by: java.io.IOException: error=7, Argument list too long at 
 java.lang.UNIXProcess.forkAndExec(Native Method) at 
 java.lang.UNIXProcess.init(UNIXProcess.java:135) at 
 

[jira] [Updated] (MAPREDUCE-5965) Hadoop streaming throws error if list of input files is high. Error is: error=7, Argument list too long at if number of input file is high

2015-05-18 Thread Wilfred Spiegelenburg (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wilfred Spiegelenburg updated MAPREDUCE-5965:
-
Status: Patch Available  (was: Open)

 Hadoop streaming throws error if list of input files is high. Error is: 
 error=7, Argument list too long at if number of input file is high
 

 Key: MAPREDUCE-5965
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5965
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Arup Malakar
Assignee: Arup Malakar
 Attachments: MAPREDUCE-5965.1.patch, MAPREDUCE-5965.2.patch, 
 MAPREDUCE-5965.patch


 Hadoop streaming exposes all the key-value pairs in the job conf as 
 environment variables when it forks a process for the streaming code to run. 
 Unfortunately the variable mapreduce_input_fileinputformat_inputdir contains 
 the list of input files, and Linux has a limit on the combined size of 
 environment variables and arguments.
 Depending on how long the list of files and their full paths is, this value 
 can be very large. And given that these variables are not even used, this 
 stops users from running a Hadoop job with a large number of files, even 
 though the job could otherwise run.
 Linux returns E2BIG (error code 7) if the size exceeds a certain limit, and 
 Java translates that to error=7, Argument list too long. More: 
 http://man7.org/linux/man-pages/man2/execve.2.html I suggest skipping 
 variables that exceed a certain length. That way, if user code requires the 
 environment variable, the job would fail. A config variable to skip long 
 variables should also be introduced, set to false by default, so that the 
 user has to explicitly set it to true to enable this feature.
 Here is the exception:
 {code}
 Error: java.lang.RuntimeException: Error in configuring object at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) 
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at 
 org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at 
 org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at 
 java.security.AccessController.doPrivileged(Native Method) at 
 javax.security.auth.Subject.doAs(Subject.java:415) at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: 
 java.lang.reflect.InvocationTargetException at 
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606) at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
 ... 9 more Caused by: java.lang.RuntimeException: Error in configuring object 
 at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) 
 at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) 
 at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 14 
 more Caused by: java.lang.reflect.InvocationTargetException at 
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606) at 
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) 
 ... 17 more Caused by: java.lang.RuntimeException: configuration exception at 
 org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222) at 
 org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66) ... 22 
 more Caused by: java.io.IOException: Cannot run program 
 /data/hadoop/hadoop-yarn/cache/yarn/nm-local-dir/usercache/oo-analytics/appcache/application_1403599726264_13177/container_1403599726264_13177_01_06/./rbenv_runner.sh:
  error=7, Argument list too long at 
 java.lang.ProcessBuilder.start(ProcessBuilder.java:1041) at 
 org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209) ... 23 
 more Caused by: java.io.IOException: error=7, Argument list too long at 
 java.lang.UNIXProcess.forkAndExec(Native Method) at 
 java.lang.UNIXProcess.init(UNIXProcess.java:135) at 
 java.lang.ProcessImpl.start(ProcessImpl.java:130) at 
 java.lang.ProcessBuilder.start(ProcessBuilder.java:1022) ... 24 more