[jira] [Updated] (MAPREDUCE-4052) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.

2014-03-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-4052:
---

Status: Patch Available  (was: Open)

 Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop 
 cluster.
 ---

 Key: MAPREDUCE-4052
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4052
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Affects Versions: 2.2.0, 0.23.1
 Environment: client on Windows, the cluster on SUSE
Reporter: xieguiming
Assignee: Jian He
 Attachments: MAPREDUCE-4052-0.patch, MAPREDUCE-4052.1.patch, 
 MAPREDUCE-4052.2.patch, MAPREDUCE-4052.patch


 When I use Eclipse on Windows to submit a job, the ApplicationMaster throws 
 this exception:
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/hadoop/mapreduce/v2/app/MRAppMaster
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
 Could not find the main class: 
 org.apache.hadoop.mapreduce.v2.app.MRAppMaster.  Program will exit.
 The reason is:
 the addToEnvironment function in class Apps uses
 private static final String SYSTEM_PATH_SEPARATOR =
   System.getProperty("path.separator");
 which is evaluated on the Windows client, so the MRApplicationMaster 
 classpath is built with the ; separator.
 I suggest that the NodeManager do the replacement.
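
 The separator problem described above can be reproduced without Hadoop. The 
 sketch below is only an illustration: the class and method names are 
 hypothetical, the classpath entries are made up, and the replace step is one 
 possible reading of the suggestion that the NodeManager do the replacement, 
 not the code from any of the attached patches.

```java
import java.util.Arrays;
import java.util.List;

final class ClasspathSeparatorDemo {
    // Joins classpath entries with the given separator, the way a client does
    // when it builds the CLASSPATH value before shipping it to the cluster.
    static String joinClasspath(List<String> entries, String separator) {
        return String.join(separator, entries);
    }

    // Hypothetical normalization step: rewrite the client's separator to the
    // separator of the host that actually launches the JVM.
    static String normalize(String classpath, String clientSep, String nodeSep) {
        return classpath.replace(clientSep, nodeSep);
    }

    public static void main(String[] args) {
        // Illustrative entries only; a real job has many more.
        List<String> entries = Arrays.asList("$PWD", "$HADOOP_CONF_DIR", "job.jar");
        String fromWindowsClient = joinClasspath(entries, ";");
        // A Linux JVM treats the whole string as a single (nonexistent) entry.
        System.out.println(fromWindowsClient);
        System.out.println(normalize(fromWindowsClient, ";", ":"));
    }
}
```

 A blind replace like this is fragile (a legitimate ; inside an entry would 
 also be rewritten), which is one reason to prefer building the classpath with 
 a separator that is resolved on the node.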



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (MAPREDUCE-4052) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.

2014-03-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He reassigned MAPREDUCE-4052:
--

Assignee: Jian He  (was: xieguiming)



[jira] [Updated] (MAPREDUCE-4052) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.

2014-03-10 Thread Jian He (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-4052:
---

Status: Open  (was: Patch Available)



[jira] [Updated] (MAPREDUCE-5785) Derive task attempt JVM max heap size automatically from mapreduce.*.memory.mb

2014-03-10 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-5785:
-

Description: 
Currently users have to set 2 memory-related configs per Job / per task type.  
One first chooses some container size mapreduce.*.memory.mb and then a 
corresponding maximum Java heap size Xmx < mapreduce.*.memory.mb. This makes 
sure that the JVM's C-heap (native memory + Java heap) does not exceed this 
mapreduce.*.memory.mb. If one forgets to tune Xmx, MR-AM might be 
- allocating big containers whereas the JVM will only use the default -Xmx200m.
- allocating small containers that will OOM because Xmx is too high.

With this JIRA, we propose to set Xmx automatically based on an empirical ratio 
that can be adjusted. Xmx is not changed automatically if provided by the user.
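
As a rough illustration of the proposal, the sketch below derives -Xmx from the container size unless the user already supplied one. The class name, method name, and the ratio value 0.75 used in main are assumptions made for the example; they are not taken from the attached patch.

```java
final class HeapFromContainerSize {
    // Returns javaOpts unchanged when the user already set -Xmx; otherwise
    // appends an -Xmx derived from the container size and the given ratio.
    static String withDerivedXmx(String javaOpts, int containerMb, double heapRatio) {
        if (javaOpts.contains("-Xmx")) {
            return javaOpts; // an explicit user setting always wins
        }
        int heapMb = (int) (containerMb * heapRatio);
        return (javaOpts + " -Xmx" + heapMb + "m").trim();
    }

    public static void main(String[] args) {
        // 0.75 is a hypothetical ratio chosen only for this demonstration.
        System.out.println(withDerivedXmx("", 1024, 0.75));
        System.out.println(withDerivedXmx("-Xmx512m -verbose:gc", 4096, 0.75));
    }
}
```

The first call gets a derived heap size, while the second keeps the user's -Xmx512m even though the container is much larger.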


  was:
Currently users have to set 2 memory-related configs for per Job / per task 
type.  One fist choses some container size mapreduce.*.memory.mb and then a 
corresponding Xmx < mapreduce.*.memory.mb to make sure that the JVM with the 
user code heap, and its native memory do not  exceed this limit. If one forgets 
to tune Xmx, MR-AM might be allocating big containers whereas the JVM will only 
use the default -Xmx200m.

With this JIRA, we propose to set Xmx automatically base on an empirical ratio 
that can be adjusted. Xmx is not changed automaically if provided by the user.



 Derive task attempt JVM max heap size automatically from mapreduce.*.memory.mb
 --

 Key: MAPREDUCE-5785
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5785
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mr-am, task
Reporter: Gera Shegalov
Assignee: Gera Shegalov
 Attachments: MAPREDUCE-5785.v01.patch




[jira] [Updated] (MAPREDUCE-5785) Derive task attempt JVM max heap size automatically from mapreduce.*.memory.mb

2014-03-10 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov updated MAPREDUCE-5785:
-

Description: 
Currently users have to set 2 memory-related configs per Job / per task type.  
One first chooses some container size mapreduce.\*.memory.mb and then a 
corresponding maximum Java heap size Xmx < mapreduce.\*.memory.mb. This makes 
sure that the JVM's C-heap (native memory + Java heap) does not exceed this 
mapreduce.*.memory.mb. If one forgets to tune Xmx, MR-AM might be 
- allocating big containers whereas the JVM will only use the default -Xmx200m.
- allocating small containers that will OOM because Xmx is too high.

With this JIRA, we propose to set Xmx automatically based on an empirical ratio 
that can be adjusted. Xmx is not changed automatically if provided by the user.


  was:
Currently users have to set 2 memory-related configs per Job / per task type.  
One first chooses some container size mapreduce.*.memory.mb and then a 
corresponding maximum Java heap size Xmx < mapreduce.*.memory.mb. This makes 
sure that the JVM's C-heap (native memory + Java heap) does not exceed this 
mapreduce.*.memory.mb. If one forgets to tune Xmx, MR-AM might be 
- allocating big containers whereas the JVM will only use the default -Xmx200m.
- allocating small containers that will OOM because Xmx is too high.

With this JIRA, we propose to set Xmx automatically based on an empirical ratio 
that can be adjusted. Xmx is not changed automatically if provided by the user.





[jira] [Commented] (MAPREDUCE-5785) Derive task attempt JVM max heap size automatically from mapreduce.*.memory.mb

2014-03-10 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925720#comment-13925720
 ] 

Gera Shegalov commented on MAPREDUCE-5785:
--

[~kkambatl], the constant headroom has been discussed. My thinking in favor of 
a percentage-based overhead is that
- it's easy to reason about a directly proportional overhead
- the reason a larger container size is specified is to process more data. A 
side effect is that some code is executed more frequently and more bytecode 
is compiled to native code. More native memory is used through the NIO stack 
and native compressor libraries, and a GC might need more tracking structures.

I like your idea to tune io.sort.mb accordingly. I'd pick the default 50% of 
Xmx to match the current defaults: io.sort.mb=100 and -Xmx200m. I'll add this 
to the patch.
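
A minimal sketch of that idea, assuming the heap size is given in megabytes as -Xmx<N>m. The class and method names are hypothetical and only the 50% figure and the 200m/100 defaults come from the comment above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SortBufferFromHeap {
    // Only the megabyte form (-Xmx200m) is recognized in this sketch.
    private static final Pattern XMX_MB = Pattern.compile("-Xmx(\\d+)[mM]");

    // Extracts the heap size in MB from JVM options; -1 when no -Xmx<N>m is present.
    static int xmxMb(String javaOpts) {
        Matcher m = XMX_MB.matcher(javaOpts);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    // Sizes the sort buffer as a fraction of the heap.
    static int sortBufferMb(int xmxMb, double fraction) {
        return (int) (xmxMb * fraction);
    }

    public static void main(String[] args) {
        int heap = xmxMb("-Xmx200m");
        System.out.println(sortBufferMb(heap, 0.5));
    }
}
```

With a fraction of 0.5, the current default heap of 200 MB maps back to the current default io.sort.mb of 100.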










[jira] [Commented] (MAPREDUCE-5785) Derive task attempt JVM max heap size automatically from mapreduce.*.memory.mb

2014-03-10 Thread Gera Shegalov (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925743#comment-13925743
 ] 

Gera Shegalov commented on MAPREDUCE-5785:
--

[~sandyr], please elaborate regarding increased chances of OOM. Currently, if 
the users have not tuned Xmx, they'll get 200m.  With the patch, they'll get 
770m. If the user specified a mapreduce.*.memory.mb smaller than the default 
1024 (also minimum-allocation-mb), I don't allow the Xmx to be lower than the 
previous default Xmx200m in the patch.

Regarding the reversal of which parameter controls the other, I can see it 
either way. Your point works for me. But it is also convenient to explicitly 
state the cap for the container. The latter seems to be more 
backwards-compatible.



[jira] [Commented] (MAPREDUCE-5785) Derive task attempt JVM max heap size automatically from mapreduce.*.memory.mb

2014-03-10 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925767#comment-13925767
 ] 

Jason Lowe commented on MAPREDUCE-5785:
---

Haven't had a chance to look at the patch in great detail, but here are 
some initial comments:

- should 'memory.mb.xmx.ratio' be 'memory.mb.heap.ratio'?  Even the code names 
it that internally. ;-)
- rather than commenting out the mapred-default property it should leave it in 
without a value set.  See the mapred.child.env entry as an example.
- should be easy to add a unit test that verifies the ratio is working as 
intended, e.g. changing it produces a corresponding JVM argument change in the 
output of MapReduceChildJVM.getVMCommand, and setting an explicit heap size in 
the config prevents the ratio from taking effect.



[jira] [Assigned] (MAPREDUCE-3184) Improve handling of fetch failures when a tasktracker is not responding on HTTP

2014-03-10 Thread Jordan Zimmerman (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan Zimmerman reassigned MAPREDUCE-3184:
---

Assignee: Jordan Zimmerman  (was: Todd Lipcon)

 Improve handling of fetch failures when a tasktracker is not responding on 
 HTTP
 ---

 Key: MAPREDUCE-3184
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3184
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobtracker
Affects Versions: 0.20.205.0
Reporter: Todd Lipcon
Assignee: Jordan Zimmerman
 Fix For: 1.0.1

 Attachments: mr-3184.txt


 On a 100 node cluster, we had an issue where one of the TaskTrackers was hit 
 by MAPREDUCE-2386 and stopped responding to fetches. The behavior observed 
 was the following:
 - every reducer would try to fetch the same map task, and fail after ~13 
 minutes.
 - At that point, all reducers would report this failed fetch to the JT for 
 the same task, and the task would be re-run.
 - Meanwhile, the reducers would move on to the next map task that ran on the 
 TT, and hang for another 13 minutes.
 The job essentially made no progress for hours, as each map task that ran on 
 the bad node was serially marked failed.
 To combat this issue, we should introduce a second type of failed fetch 
 notification, used when the TT does not respond at all (e.g. 
 SocketTimeoutException). These fetch failure notifications should count 
 against the TT at large, rather than a single task. If more than half of the 
 reducers report such an issue for a given TT, then all of the tasks from that 
 TT should be re-run.
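
 The "more than half of the reducers" rule can be sketched as a small counter 
 keyed by host. This is an assumption-laden illustration, not JobTracker code: 
 the class name, the method name, and the use of plain strings for hosts and 
 reducer ids are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class HostFetchFailureTracker {
    private final int totalReducers;
    // For each host, the distinct reducers that could not reach it at all.
    private final Map<String, Set<String>> reportersByHost = new HashMap<>();

    HostFetchFailureTracker(int totalReducers) {
        this.totalReducers = totalReducers;
    }

    // Records a "host did not respond" report. Returns true once more than
    // half of the reducers have reported the same host, i.e. when all map
    // outputs from that host should be considered lost and re-run.
    boolean reportUnresponsive(String host, String reducerId) {
        Set<String> reporters =
            reportersByHost.computeIfAbsent(host, h -> new HashSet<>());
        reporters.add(reducerId); // repeated reports from one reducer count once
        return reporters.size() * 2 > totalReducers;
    }

    public static void main(String[] args) {
        HostFetchFailureTracker tracker = new HostFetchFailureTracker(4);
        System.out.println(tracker.reportUnresponsive("tt-17", "r1"));
        System.out.println(tracker.reportUnresponsive("tt-17", "r2"));
        System.out.println(tracker.reportUnresponsive("tt-17", "r3"));
    }
}
```

 Counting distinct reducers rather than raw reports keeps one retrying reducer 
 from condemning a host on its own.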





[jira] [Commented] (MAPREDUCE-5786) Support Keep-Alive connections in ShuffleHandler

2014-03-10 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925781#comment-13925781
 ] 

Jason Lowe commented on MAPREDUCE-5786:
---

Couldn't this lead to file descriptor exhaustion on the NM side?  Thinking of 
cases where we're running many large jobs on the cluster with thousands of 
reducers each and:

- reducer decides it wants to shuffle to memory but there isn't enough memory 
yet so it waits for the memory merge to complete (which could take a while)
- reducer is waiting for a subsequent map to complete (which could take many 
minutes or hours)

Seems like we could have a situation where reducers start piling up on the 
shuffle handler and camping out.

 Support Keep-Alive connections in ShuffleHandler
 

 Key: MAPREDUCE-5786
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5786
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: shuffle

 Currently ShuffleHandler supports fetching map-outputs in batches from same 
 host.  But there are scenarios wherein fetchers pull data aggressively (i.e. 
 start pulling the data as and when they are available).  In this case, the 
 number of mapIds that are pulled from same host remains at 1. This causes 
 lots of connections to be established.
 Number of connections can be reduced a lot if ShuffleHandler supports 
 Keep-Alive.





[jira] [Commented] (MAPREDUCE-5751) MR app master fails to start in some cases if mapreduce.job.classloader is true

2014-03-10 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13925809#comment-13925809
 ] 

Jason Lowe commented on MAPREDUCE-5751:
---

+1, lgtm.  Will wait a few days before committing to give [~tomwhite] a chance 
to comment.

 MR app master fails to start in some cases if mapreduce.job.classloader is 
 true
 ---

 Key: MAPREDUCE-5751
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5751
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.2.0
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Attachments: mapreduce-5751.patch, mapreduce-5751.patch


 If mapreduce.job.classloader is set to true, and the MR client includes a 
 jetty jar in its libjars or job jar, the MR app master fails to start. A 
 typical stack trace we get is as follows:
 {noformat}
 java.lang.ClassCastException: org.mortbay.jetty.webapp.WebInfConfiguration cannot be cast to org.mortbay.jetty.webapp.Configuration
   at org.mortbay.jetty.webapp.WebAppContext.loadConfigurations(WebAppContext.java:890)
   at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:462)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
   at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
   at org.mortbay.jetty.Server.doStart(Server.java:224)
   at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
   at org.apache.hadoop.http.HttpServer.start(HttpServer.java:676)
   at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:208)
   at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.start(MRClientService.java:151)
   at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.start(MRAppMaster.java:1040)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1307)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1303)
   at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1259)
 {noformat}
 This happens because as part of the MR app master start the jetty classes are 
 loaded normally through the app classloader, but WebAppContext tries to load 
 the specific Configuration class via the thread context classloader (which 
 had been set to the user job classloader).
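
 One general way to avoid this kind of mismatch is to switch the thread 
 context classloader only for the duration of a call and restore it 
 afterwards. The helper below is a generic sketch with hypothetical names; it 
 is not claimed to be what the attached patch does.

```java
import java.util.function.Supplier;

final class ContextClassLoaderScope {
    // Runs the action with the given loader installed as the thread context
    // classloader, then restores the previous loader even if the action throws.
    static <T> T callWith(ClassLoader loader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return action.get();
        } finally {
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) {
        ClassLoader appLoader = ContextClassLoaderScope.class.getClassLoader();
        // Inside the scope, code that consults the context classloader sees
        // the same loader that loaded the framework classes.
        ClassLoader seen = callWith(appLoader,
            () -> Thread.currentThread().getContextClassLoader());
        System.out.println(seen == appLoader);
    }
}
```

 Starting the web server inside such a scope would let the Configuration class 
 lookup resolve against the loader that loaded the other jetty classes.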





[jira] [Commented] (MAPREDUCE-4052) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.

2014-03-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926084#comment-13926084
 ] 

Chris Nauroth commented on MAPREDUCE-4052:
--

bq. btw, Chris Nauroth, is the use case that upgraded-client with non-upgraded 
NM important ?

I brought this up, because I've been in situations where someone wanted to pick 
up a client-side bug fix ahead of the cluster's upgrade schedule.  It looks to 
me like this is a gray area in our policies though.

http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/Compatibility.html#Wire_compatibility

From the content in that page, we've made a specific commitment that old 
clients continue to work with new servers.  As Jian said, that part is fine 
with this patch.  What is less clear is whether or not we've made a commitment 
for new clients to work with old servers.  Of course, it's best to strive for 
it, and forward compatibility is one of our motivations in the protobuf 
messages, but I can't tell from that policy statement if we've made a 
commitment to it.  This is probably worth some wider discussion before 
changing the patch.

If we do need to achieve that kind of compatibility, then it's going to be a 
more challenging patch.  I think we'd end up needing to add an optional version 
number or at least a flag on the {{Container}} returned in the 
{{AllocateResponse}}.  This would tell the client whether or not the container 
can accept the new syntax, and then the client could use the old code path as a 
fallback path for compatibility with old servers that don't set this version 
number or flag.  That would work for containers submitted by an AM.  I can't 
think of a similar solution that would work for the initial AM container 
though, because it seems to me like the RPC sequence there doesn't have as 
clear of a way for indicating capabilities inside the container that's going to 
run the AM before its submission.

Like I said, please do discuss this more widely before pursuing it.  I'd hate 
to send you down an unnecessary rathole if the current patch is fine.  :-)  
Thanks, Jian.



[jira] [Commented] (MAPREDUCE-4052) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.

2014-03-10 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926093#comment-13926093
 ] 

Karthik Kambatla commented on MAPREDUCE-4052:
-

(Haven't looked at the patch at all). 

My interpretation of the Wire-Compatibility policies is that new clients in the 
same major release should work with old servers and vice-versa. If that is not 
clear, we should probably update the policies to clarify it. [~cnauroth], 
[~jianhe] - does either of you want to start a thread on the dev lists to 
discuss this? 



[jira] [Commented] (MAPREDUCE-4052) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.

2014-03-10 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926115#comment-13926115
 ] 

Chris Nauroth commented on MAPREDUCE-4052:
--

Thanks, [~kkambatl].  I can start that thread later today to clarify policy and 
update the text accordingly.

For this specific patch, I suppose another option is to gate use of the 
new command syntax behind a client-side config flag.  Then, clients can turn on 
the flag only after they know their clusters have been upgraded.  That's a 
compatible solution that avoids protocol impacts entirely, but it pushes some 
complexity on to the end user to turn on the flag.
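
To make the flag idea concrete, here is a minimal sketch. The property name and the placeholder token are invented for this example and do not correspond to any existing Hadoop configuration key or syntax.

```java
import java.io.File;
import java.util.Properties;

final class SeparatorChooser {
    // Hypothetical names, used only for illustration.
    static final String CROSS_PLATFORM_FLAG = "example.client.cross-platform";
    static final String PLACEHOLDER = "<SEP>";

    // With the flag off (the default), keep the old behavior and use the
    // client's own separator. With the flag on, emit a placeholder that an
    // upgraded node is expected to expand with its local separator.
    static String separatorFor(Properties clientConf) {
        boolean crossPlatform = Boolean.parseBoolean(
            clientConf.getProperty(CROSS_PLATFORM_FLAG, "false"));
        return crossPlatform ? PLACEHOLDER : File.pathSeparator;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(separatorFor(conf));
        conf.setProperty(CROSS_PLATFORM_FLAG, "true");
        System.out.println(separatorFor(conf));
    }
}
```

Defaulting the flag to off is what keeps a new client compatible with an old cluster; the cost is that users must opt in once their cluster is upgraded.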



[jira] [Created] (MAPREDUCE-5789) Average Reduce time is incorrect on Job Overview page

2014-03-10 Thread Rushabh S Shah (JIRA)
Rushabh S Shah created MAPREDUCE-5789:
-

 Summary: Average Reduce time is incorrect on Job Overview page
 Key: MAPREDUCE-5789
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5789
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, webapps
Affects Versions: 2.3.0, 0.23.10
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah


The Average Reduce time displayed on the job overview page is incorrect.
Previously, Reduce time was calculated as the difference between finishTime and 
shuffleFinishTime.
It should be the difference between finishTime and sortFinishTime.
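
The difference between the two calculations can be illustrated with a small sketch. The Attempt class and method names are hypothetical; only the three timestamp names come from the description above, and the numbers are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class ReduceTimeSketch {
    // Phase boundaries of one reduce attempt, in milliseconds.
    static final class Attempt {
        final long shuffleFinishTime;
        final long sortFinishTime;
        final long finishTime;

        Attempt(long shuffleFinishTime, long sortFinishTime, long finishTime) {
            this.shuffleFinishTime = shuffleFinishTime;
            this.sortFinishTime = sortFinishTime;
            this.finishTime = finishTime;
        }
    }

    // Reported behaviour: the interval also includes the sort (merge) phase.
    static long avgFromShuffleFinish(List<Attempt> attempts) {
        if (attempts.isEmpty()) return 0;
        long total = 0;
        for (Attempt a : attempts) total += a.finishTime - a.shuffleFinishTime;
        return total / attempts.size();
    }

    // Proposed behaviour: only the time after the sort phase has finished.
    static long avgFromSortFinish(List<Attempt> attempts) {
        if (attempts.isEmpty()) return 0;
        long total = 0;
        for (Attempt a : attempts) total += a.finishTime - a.sortFinishTime;
        return total / attempts.size();
    }

    public static void main(String[] args) {
        List<Attempt> attempts = Arrays.asList(
                new Attempt(100, 150, 250),
                new Attempt(100, 120, 180));
        System.out.println(avgFromShuffleFinish(attempts)); // 115
        System.out.println(avgFromSortFinish(attempts));    // 80
    }
}
```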





[jira] [Updated] (MAPREDUCE-5778) JobSummary does not escape newlines in the job name

2014-03-10 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-5778:
--

Status: Open  (was: Patch Available)

 JobSummary does not escape newlines in the job name
 ---

 Key: MAPREDUCE-5778
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5778
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 2.3.0, 0.23.10
Reporter: Jason Lowe
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5778.2.patch, MAPREDUCE-5778.3.patch, 
 MAPREDUCE-5778.patch


 JobSummary is not escaping newlines in the job name.  This can result in a 
 job summary log entry that spans multiple lines when users are expecting 
 one-job-per-line output.
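
 One way to keep a summary entry on a single line is to escape line breaks before logging. The sketch below is only an illustration with hypothetical names; it is not the approach taken in the attached patches, which may rely on an existing Hadoop utility instead.

```java
class JobNameEscapeSketch {
    // Escapes backslashes first so the result stays unambiguous, then replaces
    // carriage returns and newlines with their two-character escape sequences.
    static String escapeForSummary(String jobName) {
        return jobName
                .replace("\\", "\\\\")
                .replace("\r", "\\r")
                .replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String jobName = "word\ncount";
        // The escaped name contains a literal backslash followed by 'n',
        // so the log entry stays on one line.
        System.out.println("jobName=" + escapeForSummary(jobName));
    }
}
```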





[jira] [Updated] (MAPREDUCE-5778) JobSummary does not escape newlines in the job name

2014-03-10 Thread Tsuyoshi OZAWA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsuyoshi OZAWA updated MAPREDUCE-5778:
--

Status: Patch Available  (was: Open)

 JobSummary does not escape newlines in the job name
 ---

 Key: MAPREDUCE-5778
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5778
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 2.3.0, 0.23.10
Reporter: Jason Lowe
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5778.2.patch, MAPREDUCE-5778.3.patch, 
 MAPREDUCE-5778.patch




[jira] [Commented] (MAPREDUCE-5778) JobSummary does not escape newlines in the job name

2014-03-10 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926188#comment-13926188
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-5778:
---

LGTM. Waiting for Jenkins report.

 JobSummary does not escape newlines in the job name
 ---

 Key: MAPREDUCE-5778
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5778
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 0.23.10, 2.3.0
Reporter: Jason Lowe
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5778.2.patch, MAPREDUCE-5778.3.patch, 
 MAPREDUCE-5778.patch




[jira] [Updated] (MAPREDUCE-5789) Average Reduce time is incorrect on Job Overview page

2014-03-10 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated MAPREDUCE-5789:
--

Status: Patch Available  (was: Open)

Fixed the issue and confirmed with test case

 Average Reduce time is incorrect on Job Overview page
 -

 Key: MAPREDUCE-5789
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5789
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, webapps
Affects Versions: 2.3.0, 0.23.10
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: patch-MapReduce-5789.patch




[jira] [Updated] (MAPREDUCE-5789) Average Reduce time is incorrect on Job Overview page

2014-03-10 Thread Rushabh S Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rushabh S Shah updated MAPREDUCE-5789:
--

Attachment: patch-MapReduce-5789.patch

 Average Reduce time is incorrect on Job Overview page
 -

 Key: MAPREDUCE-5789
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5789
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver, webapps
Affects Versions: 0.23.10, 2.3.0
Reporter: Rushabh S Shah
Assignee: Rushabh S Shah
 Attachments: patch-MapReduce-5789.patch




[jira] [Updated] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-10 Thread Karthik Kambatla (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Kambatla updated MAPREDUCE-5028:


Attachment: mr-5028-3.patch

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-3.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at 
 org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at 
 org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at 
 org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at 
 org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}





[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-10 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926272#comment-13926272
 ] 

Karthik Kambatla commented on MAPREDUCE-5028:
-

Updated patch adds a unit test and incorporates most of Vinod's suggestions. 

bq. There seems to be one more related bug w.r.t. usage of 
DataInputBuffer.reset(). 
One of my earlier patches (the one that got reverted last year) was needlessly 
fixing this related bug.  

bq. Can you also cross verify Task.ValuesIterator.readNextKey() and 
readNextValue()?
Looks like you are right, but we need to be very careful and run several 
workloads before including them (got bitten by this before). If you are 
okay with it, I would like to address them in MAPREDUCE-5032, so we can get 
this piece in first, as it has been tested by multiple people. 

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-3.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch




[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926363#comment-13926363
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-5028:


The latest patch looks good to me. +1. The test fails without the patch 
and passes with it.

Jenkins is having issues. Trying again manually. Will commit this once Jenkins 
says okay.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-3.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch




[jira] [Commented] (MAPREDUCE-5786) Support Keep-Alive connections in ShuffleHandler

2014-03-10 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926423#comment-13926423
 ] 

Rajesh Balamohan commented on MAPREDUCE-5786:
-

Thanks for the comments, Jason.  We need 
mapreduce.shuffle.enable.keep.alive to enable keep-alive in the 
ShuffleHandler and mapreduce.shuffle.enable.keep.alive.timeout to set 
the timeout for the persistent connection.  E.g., a "Keep-Alive: 
timeout=60" header specifies that the connection will be kept alive for 60 
seconds, after which it will be closed.  This will allow us to tune 
persistent connection duration on large clusters with different job patterns.
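
As a rough illustration of how these two settings could map onto response headers, here is a minimal sketch. The property names are the ones proposed in this comment; the class, the method, and the default values are assumptions, and the sketch does not reflect the actual ShuffleHandler code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

class KeepAliveHeaderSketch {
    static final String ENABLE_KEY = "mapreduce.shuffle.enable.keep.alive";
    static final String TIMEOUT_KEY = "mapreduce.shuffle.enable.keep.alive.timeout";

    // Builds the connection-related response headers from the configuration.
    // Defaults ("false" and "5" seconds) are assumptions made for this sketch.
    static Map<String, String> responseHeaders(Properties conf) {
        boolean enabled = Boolean.parseBoolean(conf.getProperty(ENABLE_KEY, "false"));
        int timeoutSeconds = Integer.parseInt(conf.getProperty(TIMEOUT_KEY, "5"));

        Map<String, String> headers = new LinkedHashMap<>();
        if (enabled) {
            headers.put("Connection", "keep-alive");
            headers.put("Keep-Alive", "timeout=" + timeoutSeconds);
        } else {
            headers.put("Connection", "close");
        }
        return headers;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(ENABLE_KEY, "true");
        conf.setProperty(TIMEOUT_KEY, "60");
        // Prints {Connection=keep-alive, Keep-Alive=timeout=60}
        System.out.println(responseHeaders(conf));
    }
}
```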

 Support Keep-Alive connections in ShuffleHandler
 

 Key: MAPREDUCE-5786
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5786
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Rajesh Balamohan
Assignee: Rajesh Balamohan
  Labels: shuffle

 Currently ShuffleHandler supports fetching map outputs in batches from the same 
 host.  But there are scenarios in which fetchers pull data aggressively (i.e., 
 start pulling the data as and when it becomes available).  In this case, the 
 number of mapIds pulled from the same host remains at 1, which causes 
 many connections to be established.
 The number of connections can be reduced considerably if ShuffleHandler supports 
 Keep-Alive.





[jira] [Created] (MAPREDUCE-5790) Default map hprof profile options do not work

2014-03-10 Thread Andrew Wang (JIRA)
Andrew Wang created MAPREDUCE-5790:
--

 Summary: Default map hprof profile options do not work
 Key: MAPREDUCE-5790
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5790
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.3.0
 Environment: java version 1.6.0_31
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
Reporter: Andrew Wang


I have an MR job doing the following:

{code}
Job job = Job.getInstance(conf);

// Enable profiling for map task 0 and reduce task 0
job.setProfileEnabled(true);
job.setProfileTaskRange(true, "0");   // isMap = true: map tasks
job.setProfileTaskRange(false, "0");  // isMap = false: reduce tasks
{code}

When I run this job, some of my map tasks fail with this error message:

{noformat}
org.apache.hadoop.util.Shell$ExitCodeException: 
/data/5/yarn/nm/usercache/hdfs/appcache/application_1394482121761_0012/container_1394482121761_0012_01_41/launch_container.sh:
 line 32: $JAVA_HOME/bin/java -Djava.net.preferIPv4Stack=true 
-Dhadoop.metrics.log.level=WARN   -Xmx825955249 -Djava.io.tmpdir=$PWD/tmp 
-Dlog4j.configuration=container-log4j.properties 
-Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41
 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA 
${mapreduce.task.profile.params} org.apache.hadoop.mapred.YarnChild 
10.20.212.12 43135 attempt_1394482121761_0012_r_00_0 41 
1/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stdout
 
2/var/log/hadoop-yarn/container/application_1394482121761_0012/container_1394482121761_0012_01_41/stderr
 : bad substitution
{noformat}

It looks like ${mapreduce.task.profile.params} is not being substituted 
correctly.
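
The shell reports "bad substitution" because a name containing dots is not a valid shell variable, so a placeholder like this has to be expanded before the command reaches launch_container.sh. The sketch below shows one way such an expansion could work. All names are hypothetical, the hprof option string is only an example value, and this is not how Hadoop implements it.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlaceholderExpansionSketch {
    // Matches ${name} where name may contain letters, digits, '_', '.' and '-'.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z0-9_.\\-]+)\\}");

    // Replaces every ${key} that has a value in conf.
    // Unknown placeholders are left untouched.
    static String expand(String command, Map<String, String> conf) {
        Matcher m = PLACEHOLDER.matcher(command);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = conf.get(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> conf = Collections.singletonMap(
                "mapreduce.task.profile.params", "-agentlib:hprof=cpu=samples");
        System.out.println(expand(
                "java ${mapreduce.task.profile.params} org.apache.hadoop.mapred.YarnChild",
                conf));
    }
}
```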





[jira] [Commented] (MAPREDUCE-5778) JobSummary does not escape newlines in the job name

2014-03-10 Thread Tsuyoshi OZAWA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13926457#comment-13926457
 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-5778:
---

[~ajisakaa], I noticed one minor point while waiting for Jenkins:

{code}
 private static JobSummary summary = new JobSummary();
{code}

We should avoid creating a static object if we can. Can you fix it? Thanks.

 JobSummary does not escape newlines in the job name
 ---

 Key: MAPREDUCE-5778
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5778
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 0.23.10, 2.3.0
Reporter: Jason Lowe
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5778.2.patch, MAPREDUCE-5778.3.patch, 
 MAPREDUCE-5778.patch




[jira] [Commented] (MAPREDUCE-4052) Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop cluster.

2014-03-10 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13929739#comment-13929739
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-4052:


bq. For this specific patch, I suppose another option is to gate use of 
the new command syntax behind a client-side config flag.
This makes sense. In a way, the protocol with the NodeManager is already 
optional: if there are named tags, the NM will substitute them; otherwise 
there is nothing to do. Let's see if it can be optional in MapReduce too.

 Windows eclipse cannot submit job from Windows client to Linux/Unix Hadoop 
 cluster.
 ---

 Key: MAPREDUCE-4052
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4052
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Affects Versions: 0.23.1, 2.2.0
 Environment: client on Windows, cluster on SUSE
Reporter: xieguiming
Assignee: Jian He
 Attachments: MAPREDUCE-4052-0.patch, MAPREDUCE-4052.1.patch, 
 MAPREDUCE-4052.2.patch, MAPREDUCE-4052.patch




[jira] [Updated] (MAPREDUCE-5778) JobSummary does not escape newlines in the job name

2014-03-10 Thread Akira AJISAKA (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira AJISAKA updated MAPREDUCE-5778:
-

Attachment: MAPREDUCE-5778.4.patch

Thanks, I fixed it.

 JobSummary does not escape newlines in the job name
 ---

 Key: MAPREDUCE-5778
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5778
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 0.23.10, 2.3.0
Reporter: Jason Lowe
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5778.2.patch, MAPREDUCE-5778.3.patch, 
 MAPREDUCE-5778.4.patch, MAPREDUCE-5778.patch




[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13929831#comment-13929831
 ] 

Hadoop QA commented on MAPREDUCE-5028:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633783/mr-5028-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4409//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4409//console

This message is automatically generated.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-3.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch




[jira] [Updated] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-10 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5028:
---

   Resolution: Fixed
Fix Version/s: 2.4.0
   Status: Resolved  (was: Patch Available)

Committed this to trunk, branch-2 and branch-2.4. Thanks Karthik!

[~jlowe]/[~tgraves] et al, please pull this into 0.23 if needed.

 Maps fail when io.sort.mb is set to high value
 --

 Key: MAPREDUCE-5028
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 1.1.1, 2.0.3-alpha, 0.23.5
Reporter: Karthik Kambatla
Assignee: Karthik Kambatla
Priority: Critical
 Fix For: 2.4.0, 1.2.0

 Attachments: MR-5028_testapp.patch, mr-5028-1.patch, mr-5028-2.patch, 
 mr-5028-3.patch, mr-5028-branch1.patch, mr-5028-branch1.patch, 
 mr-5028-branch1.patch, mr-5028-trunk.patch, mr-5028-trunk.patch, 
 mr-5028-trunk.patch, repro-mr-5028.patch


 Verified the problem exists on branch-1 with the following configuration:
 Pseudo-dist mode: 2 maps/ 1 reduce, mapred.child.java.opts=-Xmx2048m, 
 io.sort.mb=1280, dfs.block.size=2147483648
 Run teragen to generate 4 GB data
 Maps fail when you run wordcount on this configuration with the following 
 error: 
 {noformat}
 java.io.IOException: Spill failed
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
   at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
   at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
   at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
   at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
   at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
   at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
   at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
 {noformat}
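
 For readers checking the sizes in the configuration above, here is a small
 arithmetic sketch. It is not taken from any of the attached patches, and this
 message does not state the root cause. The class and method names are
 hypothetical. It only shows that io.sort.mb=1280 fits in a signed 32-bit int
 when converted to bytes, that dfs.block.size=2147483648 does not, and that the
 sum of two int offsets that are each smaller than the buffer can exceed the
 int range. Whether such a sum occurs in MapTask is an assumption, not
 something established here.

```java
// Hypothetical helper for illustration only; not part of Hadoop.
final class SortBufferArithmetic {

    // Size in bytes of a sort buffer given in megabytes, computed in long
    // so the result cannot wrap around.
    static long bufferBytes(int sortMb) {
        return (long) sortMb << 20;
    }

    // True when start + length does not fit in a signed 32-bit int.
    static boolean sumOverflowsInt(int start, int length) {
        return (long) start + (long) length > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long sortBytes = bufferBytes(1280);   // io.sort.mb from the report
        long blockSize = 2147483648L;         // dfs.block.size from the report

        System.out.println("sort buffer bytes: " + sortBytes);
        System.out.println("fits in int: " + (sortBytes <= Integer.MAX_VALUE));
        System.out.println("block size fits in int: "
            + (blockSize <= Integer.MAX_VALUE));

        // Two made-up offsets, each below the buffer size of 1342177280.
        System.out.println("offset sum overflows int: "
            + sumOverflowsInt(1200000000, 1000000000));
    }
}
```

 Doing the addition in long, as sumOverflowsInt does, is one common way to
 detect this kind of wraparound before it turns into a bad array index.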





[jira] [Commented] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

2014-03-10 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13929857#comment-13929857
 ] 

Hudson commented on MAPREDUCE-5028:
---

SUCCESS: Integrated in Hadoop-trunk-Commit #5302 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5302/])
MAPREDUCE-5028. Fixed a bug in MapTask that was causing mappers to fail when a 
large value of io.sort.mb is set. Contributed by Karthik Kambatla. (vinodkv: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1576170)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/DataInputBuffer.java
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/MapTask.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/ReduceContextImpl.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/InMemoryReader.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/LargeSorter.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapreduce/TestLargeSort.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/test/MapredTestDriver.java







[jira] [Commented] (MAPREDUCE-5778) JobSummary does not escape newlines in the job name

2014-03-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13929870#comment-13929870
 ] 

Hadoop QA commented on MAPREDUCE-5778:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12633825/MAPREDUCE-5778.4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4410//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4410//console

This message is automatically generated.

 JobSummary does not escape newlines in the job name
 ---

 Key: MAPREDUCE-5778
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5778
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Affects Versions: 0.23.10, 2.3.0
Reporter: Jason Lowe
Assignee: Akira AJISAKA
 Attachments: MAPREDUCE-5778.2.patch, MAPREDUCE-5778.3.patch, 
 MAPREDUCE-5778.4.patch, MAPREDUCE-5778.patch


 JobSummary is not escaping newlines in the job name.  This can result in a 
 job summary log entry that spans multiple lines when users are expecting 
 one-job-per-line output.
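
 As an illustration of the behaviour described above, here is a minimal sketch
 of escaping line breaks in a job name before it is written to a one-line log
 entry. It is not the code from the attached patches. The class name, method
 name and sample job name are hypothetical. The sketch also escapes the
 backslash itself, which is an added assumption so that the escaped text stays
 unambiguous.

```java
// Hypothetical helper for illustration only; not the JobSummary implementation.
final class JobNameEscaper {

    // Replaces line breaks (and backslashes) with visible escape sequences
    // so the returned string never contains a raw '\n' or '\r'.
    static String escapeNewlines(String jobName) {
        StringBuilder sb = new StringBuilder(jobName.length());
        for (int i = 0; i < jobName.length(); i++) {
            char c = jobName.charAt(i);
            switch (c) {
                case '\\':
                    sb.append("\\\\");
                    break;
                case '\n':
                    sb.append("\\n");
                    break;
                case '\r':
                    sb.append("\\r");
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A made-up job name containing a line break.
        String jobName = "nightly\nreport";
        // The summary entry stays on a single line.
        System.out.println("jobName=" + escapeNewlines(jobName));
    }
}
```

 With this kind of escaping, a log parser that reads one entry per line still
 sees the whole job name, and the original name can be recovered by reversing
 the substitutions.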


