[jira] [Updated] (MAPREDUCE-7166) map-only job should ignore node-lost events when the task has already succeeded

2018-11-29 Thread Vinod Kumar Vavilapalli (JIRA)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-7166:
---
Target Version/s: 2.9.2, 2.7.7, 2.8.5, 3.1.1, 3.0.0, 2.7.5
   Fix Version/s: (was: 2.9.2)
  (was: 2.7.7)
  (was: 2.8.5)
  (was: 3.1.1)
  (was: 3.0.0)
  (was: 2.7.5)

Removing fix-version for this open issue. Please always use Target-version for 
your intentions and let committers set the fix-version. Tx.

> map-only job should ignore node-lost events when the task has already succeeded
> -
>
> Key: MAPREDUCE-7166
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7166
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 2.7.5, 3.0.0, 3.1.1, 2.8.5, 2.7.7, 2.9.2
>Reporter: Zhaohui Xin
>Assignee: Li Lei
>Priority: Major
>  Labels: newbie
> Attachments: MAPREDUCE-7166.001.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-7101) Revisit JHS file-scan behavior

2018-06-05 Thread Vinod Kumar Vavilapalli (JIRA)


[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16502212#comment-16502212
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-7101:


There are two options:
 # Have FS-specific scan algorithms.
 # Have both algorithms for all FSes - i.e. instead of completely removing the 
modification-time check, keep it (it helps HDFS), but augment it with a more 
relaxed fall-back monitor that simply scans all users' directories every so 
often (this helps the cloud FSes).

bq. Not sure how bad it could impact performance.
In long-running, large multi-tenant cluster scenarios, this is bad. This is 
true for all HDFS deployments, and maybe for some deployments in the cloud. For 
cloud deployments that are per user/tenant, it's okay.
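A rough sketch of option 2, the hybrid check. This is illustrative only, with hypothetical class and method names; it is not the actual JHS code.

```java
// Sketch of option 2: keep the cheap modification-time check (fast path for
// HDFS), but fall back to a periodic full scan so that filesystems with
// unreliable directory mtimes (e.g. S3) still get picked up.
// All names below are hypothetical, not the real JHS implementation.
class HybridScanPolicy {
  private final long fullScanIntervalMs;
  private long lastFullScanMs;
  private long knownModTime = -1;

  HybridScanPolicy(long fullScanIntervalMs) {
    this.fullScanIntervalMs = fullScanIntervalMs;
  }

  /** Decide whether a directory needs re-scanning. */
  public boolean shouldScan(long dirModTime, long nowMs) {
    if (dirModTime != knownModTime) {   // fast path: mtime changed (HDFS)
      knownModTime = dirModTime;
      lastFullScanMs = nowMs;
      return true;
    }
    if (nowMs - lastFullScanMs >= fullScanIntervalMs) {
      lastFullScanMs = nowMs;           // relaxed fallback: scan everything
      return true;                      // every so often (helps cloud FSes)
    }
    return false;
  }
}
```

The fallback interval would be a new, tunable configuration knob; HDFS deployments could set it very high to keep today's behavior.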

> Revisit JHS file-scan behavior
> --
>
> Key: MAPREDUCE-7101
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7101
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Wangda Tan
>Priority: Critical
>
> Currently, the JHS re-scans a directory if the modification time of the 
> *directory* changed: 
> {code} 
> public synchronized void scanIfNeeded(FileStatus fs) {
>   long newModTime = fs.getModificationTime();
>   if (modTime != newModTime) {
>     // <... some logic omitted ...>
>     // reset scanTime before scanning happens
>     scanTime = System.currentTimeMillis();
>     Path p = fs.getPath();
>     try {
>       scanIntermediateDirectory(p);
> {code}
> This logic relies on the assumption that the directory's modification time 
> will be updated when a file is placed under the directory.
> However, the semantics of a directory's modification time are not consistent 
> across FS implementations. For example, MAPREDUCE-6680 fixed some issues 
> with truncated modification times, and HADOOP-12837 mentions that on S3 a 
> directory's modification time is always 0.
> I think we need to revisit this logic to make it work more robustly 
> on different file systems.






[jira] [Updated] (MAPREDUCE-7097) MapReduce JHS should honor yarn.resourcemanager.display.per-user-apps

2018-05-17 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-7097:
---
Target Version/s: 2.9.1, 2.10.0, 3.0.3, 3.1.1  (was: 2.9.1, 3.0.3, 3.1.1)

> MapReduce JHS should honor yarn.resourcemanager.display.per-user-apps
> -
>
> Key: MAPREDUCE-7097
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7097
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Vinod Kumar Vavilapalli
>Priority: Major
>
> When this config is on, the MR JHS should filter the app list based on the 
> authenticated user.
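A minimal sketch of the requested filtering behavior. The names below are illustrative stand-ins, not the actual JHS code.

```java
// Illustrative sketch only (hypothetical names, not the actual JHS code):
// when per-user filtering is enabled, restrict the job list to jobs owned by
// the authenticated caller; when disabled, show everything.
import java.util.ArrayList;
import java.util.List;

class PerUserJobFilter {
  /** Each entry is {jobId, owner}. Returns the job ids visible to the caller. */
  public static List<String> visibleJobs(List<String[]> jobsWithOwner,
                                         String callerUser,
                                         boolean filterPerUser) {
    List<String> visible = new ArrayList<>();
    for (String[] job : jobsWithOwner) {
      if (!filterPerUser || job[1].equals(callerUser)) {
        visible.add(job[0]);
      }
    }
    return visible;
  }
}
```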






[jira] [Created] (MAPREDUCE-7097) MapReduce JHS should honor yarn.resourcemanager.display.per-user-apps

2018-05-17 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created MAPREDUCE-7097:
--

 Summary: MapReduce JHS should honor 
yarn.resourcemanager.display.per-user-apps
 Key: MAPREDUCE-7097
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7097
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli


When this config is on, the MR JHS should filter the app list based on the 
authenticated user.






[jira] [Commented] (MAPREDUCE-7036) ASF License warning in hadoop-mapreduce-client

2018-04-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434695#comment-16434695
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-7036:


Just realized reopening this doesn't make sense since 3.1.0 is already 
released. We should keep this ticket fixed and open a new JIRA.

> ASF License warning in hadoop-mapreduce-client
> --
>
> Key: MAPREDUCE-7036
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7036
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Minor
> Fix For: 3.1.0
>
> Attachments: MAPREDUCE-7036.1.patch
>
>
> It occurred in MAPREDUCE-7021 and MAPREDUCE-7034:
> {noformat}
> Lines that start with ? in the ASF License report indicate files that do 
> not have an Apache license header: !? 
> /testptch/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/jobTokenPassword
> {noformat}






[jira] [Reopened] (MAPREDUCE-6823) FileOutputFormat to support configurable PathOutputCommitter factory

2018-03-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened MAPREDUCE-6823:


I'm making a guess that this is a dup of HADOOP-13786 like the others I just 
closed as dups.

Reopening and closing this as a dup. [~ste...@apache.org], please revert back 
if this is incorrect.

> FileOutputFormat to support configurable PathOutputCommitter factory
> 
>
> Key: MAPREDUCE-6823
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6823
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0-alpha2
> Environment: Targeting S3 as the output of work
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-13786-HADOOP-13345-001.patch, 
> MAPREDUCE-6823-002.patch, MAPREDUCE-6823-002.patch, MAPREDUCE-6823-004.patch
>
>
> In HADOOP-13786 I'm adding a custom subclass of FileOutputFormat, one which 
> can talk directly to the S3A filesystem for more efficient operations, better 
> failure modes, and, most critically, as part of HADOOP-13345, atomic commit 
> of output. The normal committer relies on directory rename() being atomic for 
> this; for S3 we don't have that luxury.
> To support a custom committer, we need to be able to tell FileOutputFormat 
> (and implicitly, all subclasses which don't have their own custom committer) 
> to use our new {{S3AOutputCommitter}}.
> I propose: 
> # {{FileOutputFormat}} takes a factory to create committers.
> # The factory takes a URI and a {{TaskAttemptContext}} and returns a committer.
> # The default implementation always returns a {{FileOutputCommitter}}.
> # A configuration option allows a new factory to be named.
> # An {{S3AOutputCommitterFactory}} returns a {{FileOutputCommitter}} or the 
> new {{S3AOutputCommitter}} depending upon the URI of the destination.
> Note that MRv1 already supports configurable committers; this applies only to 
> the v2 API.
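The proposed factory indirection can be sketched roughly as below. All names here are simplified stand-ins for illustration (committers are modeled as plain strings rather than real OutputCommitter instances, and the TaskAttemptContext parameter is omitted); this is not the actual MapReduce API.

```java
// Rough sketch of the proposed committer-factory indirection.
// Committers are modeled as strings purely for illustration.
import java.net.URI;

interface CommitterFactory {
  // The real proposal also passes a TaskAttemptContext; omitted here.
  String createCommitter(URI destination);
}

// Default behavior: always hand back the classic file committer.
class DefaultCommitterFactory implements CommitterFactory {
  public String createCommitter(URI dest) {
    return "FileOutputCommitter";
  }
}

// S3-aware factory: choose a committer based on the destination's scheme.
class S3CommitterFactory implements CommitterFactory {
  public String createCommitter(URI dest) {
    return "s3a".equals(dest.getScheme())
        ? "S3AOutputCommitter"     // direct-to-S3 commit, no rename needed
        : "FileOutputCommitter";   // everything else keeps the old path
  }
}
```

A configuration key would then name the factory class to instantiate, which is what keeps FileOutputFormat itself unchanged for existing users.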






[jira] [Resolved] (MAPREDUCE-6823) FileOutputFormat to support configurable PathOutputCommitter factory

2018-03-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-6823.

   Resolution: Duplicate
Fix Version/s: (was: 3.1.0)

> FileOutputFormat to support configurable PathOutputCommitter factory
> 
>
> Key: MAPREDUCE-6823
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6823
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0-alpha2
> Environment: Targeting S3 as the output of work
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Attachments: HADOOP-13786-HADOOP-13345-001.patch, 
> MAPREDUCE-6823-002.patch, MAPREDUCE-6823-002.patch, MAPREDUCE-6823-004.patch
>
>
> In HADOOP-13786 I'm adding a custom subclass of FileOutputFormat, one which 
> can talk directly to the S3A filesystem for more efficient operations, better 
> failure modes, and, most critically, as part of HADOOP-13345, atomic commit 
> of output. The normal committer relies on directory rename() being atomic for 
> this; for S3 we don't have that luxury.
> To support a custom committer, we need to be able to tell FileOutputFormat 
> (and implicitly, all subclasses which don't have their own custom committer) 
> to use our new {{S3AOutputCommitter}}.
> I propose: 
> # {{FileOutputFormat}} takes a factory to create committers.
> # The factory takes a URI and a {{TaskAttemptContext}} and returns a committer.
> # The default implementation always returns a {{FileOutputCommitter}}.
> # A configuration option allows a new factory to be named.
> # An {{S3AOutputCommitterFactory}} returns a {{FileOutputCommitter}} or the 
> new {{S3AOutputCommitter}} depending upon the URI of the destination.
> Note that MRv1 already supports configurable committers; this applies only to 
> the v2 API.






[jira] [Updated] (MAPREDUCE-6961) Pull up FileOutputCommitter.getOutputPath to PathOutputCommitter

2018-03-21 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6961:
---
Fix Version/s: (was: 3.1.0)

Removing 3.1.0 fix-version from all JIRAs which are Invalid / Won't Fix / 
Duplicate / Cannot Reproduce.

> Pull up FileOutputCommitter.getOutputPath to PathOutputCommitter
> 
>
> Key: MAPREDUCE-6961
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6961
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> SPARK-21549 has shown that downstream code is relying on this internal 
> property.
> If we pulled {{FileOutputCommitter.getOutputPath}} up to the 
> {{PathOutputCommitter}} of MAPREDUCE-6956, there'd be a public/stable 
> way to get this. Admittedly, it does imply that the committer will always 
> have *some* output path, but FileOutputFormat depends on that anyway.
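The pull-up itself is a small refactor; a toy sketch with simplified types (these are not the real committer classes):

```java
// Toy sketch of the pull-up (simplified types, not the real classes): move
// the getOutputPath accessor from the concrete file committer up to the
// abstract PathOutputCommitter, so callers code against the stable parent
// type instead of the internal FileOutputCommitter property.
abstract class PathOutputCommitterSketch {
  // Pulled up: now part of the public, stable surface.
  public abstract String getOutputPath();
}

class FileOutputCommitterSketch extends PathOutputCommitterSketch {
  private final String outputPath;

  FileOutputCommitterSketch(String outputPath) {
    this.outputPath = outputPath;
  }

  @Override
  public String getOutputPath() {
    return outputPath;
  }
}
```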






[jira] [Commented] (MAPREDUCE-6754) Container Ids for a YARN application should be monotonically increasing in the scope of the application

2016-08-24 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15436130#comment-15436130
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6754:


bq. Approach A: Include attempt id as part of the JvmId. This is a viable 
solution, however, there is a change in the format of the JVMid. Changing 
something that has existed so long for an optional feature is not persuasive.
bq. I don't understand the concern about changing the JvmID. It's not really 
public and only used within the scope of a single job
Agreed.

[~srikanth.sampath], JvmID was originally added to implement JVM reuse in 
Hadoop 1 MapReduce. When we moved to YARN + MR, we lost JVM reuse, and I doubt 
we are going to implement that now. So I'd argue that we can completely 
remove JvmID. But if that's too much, as [~jlowe] says, we can simply change 
the format of JvmID - it is not a public API.

> Container Ids for a YARN application should be monotonically increasing in 
> the scope of the application
> 
>
> Key: MAPREDUCE-6754
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6754
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>Reporter: Srikanth Sampath
>Assignee: Srikanth Sampath
>
> Currently, container Ids are reused across application attempts. The 
> container id is stored in AppSchedulingInfo and is reinitialized with 
> every application attempt, so the containerId's scope is limited to the 
> application attempt.
> In the MR framework, it is important to note that the containerId is used as 
> part of the JvmId. The JvmId has 3 components <... containerId>. The JvmId 
> is used in data structures in TaskAttemptListener and is passed between the 
> AppMaster and the individual tasks. For an application attempt, no two tasks 
> have the same JvmId.
> This works well currently, since in-flight tasks get killed if the AppMaster 
> goes down. However, if we want to enable a work-preserving AM, containers 
> (and hence containerIds) live across application attempts. If we recycle 
> containerIds across attempts, then two independent tasks (one in-flight from 
> a previous attempt and another new one in a succeeding attempt) can have the 
> same JvmId and cause havoc.
> This can be solved in two ways:
> *Approach A*: Include the attempt id as part of the JvmId. This is a viable 
> solution; however, it changes the format of the JvmId, and changing 
> something that has existed so long for an optional feature is not persuasive.
> *Approach B*: Make the container id monotonically increasing for the life of 
> an application, so container ids are not reused across application attempts 
> and containers can outlive an application attempt. This is the preferred 
> approach as it is clean, logical and backwards compatible. Nothing changes 
> for existing applications or the internal workings.
> *How this is achieved:*
> Currently, we maintain the latest containerId only per application attempt 
> and reinitialize it for new attempts. With this approach, we retrieve the 
> latest containerId from the just-failed attempt and initialize the new 
> attempt with it (instead of 0). I can provide the patch if it helps; it 
> currently exists in MAPREDUCE-6726.
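Approach B can be sketched as below, with hypothetical names (this is not the actual AppSchedulingInfo code): the new attempt's counter is seeded from the previous attempt's last id instead of 0.

```java
// Sketch of approach B: instead of resetting the per-attempt container
// counter to 0 on every new application attempt, seed the new attempt with
// the last id handed out by the previous attempt, so container ids stay
// unique (and monotonically increasing) across the application's lifetime.
// Names are illustrative, not the actual scheduler code.
class MonotonicContainerIds {
  private long latestContainerId;

  // New attempt: carry over the predecessor's counter instead of 0.
  MonotonicContainerIds(long latestIdFromPreviousAttempt) {
    this.latestContainerId = latestIdFromPreviousAttempt;
  }

  public long nextContainerId() {
    return ++latestContainerId;
  }

  public long latest() {
    return latestContainerId;
  }
}
```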



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Updated] (MAPREDUCE-6315) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory

2016-08-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6315:
---
Status: Open  (was: Patch Available)

> Implement retrieval of logs for crashed MR-AM via jhist in the staging 
> directory
> 
>
> Key: MAPREDUCE-6315
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6315
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client, mr-am
>Affects Versions: 2.7.0
>Reporter: Gera Shegalov
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6315.001.patch, MAPREDUCE-6315.002.patch, 
> MAPREDUCE-6315.003.patch
>
>
> When all AM attempts crash, there is no record of them in JHS. Thus no easy 
> way to get the logs. This JIRA automates the procedure by utilizing the jhist 
> file in the staging directory. 






[jira] [Updated] (MAPREDUCE-6315) Implement retrieval of logs for crashed MR-AM via jhist in the staging directory

2016-08-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6315:
---
Target Version/s: 2.9.0  (was: 2.8.0)

I applied the patch and looked at it. {{mapred job -logs}} is an arcane command 
that few users seem to use. It definitely isn't the first place 
our users go - that would be the web UI. Given that, I also agree that putting 
this in the JobHistory UI is a minimum requirement.

Even without that, this patch needs more work as commented above.

Note that YARN's ResourceManager does have these log-links against each 
app-attempt.

Given all of the above, I am moving this into 2.9 and unblocking 2.8.0. Please 
revert back if you disagree. Tx.

> Implement retrieval of logs for crashed MR-AM via jhist in the staging 
> directory
> 
>
> Key: MAPREDUCE-6315
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6315
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client, mr-am
>Affects Versions: 2.7.0
>Reporter: Gera Shegalov
>Priority: Critical
>  Labels: BB2015-05-TBR
> Attachments: MAPREDUCE-6315.001.patch, MAPREDUCE-6315.002.patch, 
> MAPREDUCE-6315.003.patch
>
>
> When all AM attempts crash, there is no record of them in JHS. Thus no easy 
> way to get the logs. This JIRA automates the procedure by utilizing the jhist 
> file in the staging directory. 






[jira] [Commented] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-08-19 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15429147#comment-15429147
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6310:


Tx for your help [~leftnoteasy]!

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt, 
> MAPREDUCE-6310-06132016.txt, MAPREDUCE-6310-06132018.txt
>
>
> Previously we used jdiff for Hadoop common and HDFS. Now we're extending the 
> support of jdiff to YARN. Probably we'd like to do similar things with 
> MapReduce? 






[jira] [Updated] (MAPREDUCE-6654) Possible NPE in JobHistoryEventHandler#handleEvent

2016-08-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6654:
---
Target Version/s: 2.8.1  (was: 2.8.0)

Moving to 2.8.1 while the discussion continues.

> Possible NPE in JobHistoryEventHandler#handleEvent
> --
>
> Key: MAPREDUCE-6654
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6654
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Junping Du
>Priority: Critical
> Attachments: MAPREDUCE-6654-v2.1.patch, MAPREDUCE-6654-v2.patch, 
> MAPREDUCE-6654.patch
>
>
> I have seen NPE thrown from {{JobHistoryEventHandler#handleEvent}}:
> {noformat}
> 2016-03-14 16:42:15,231 INFO [Thread-69] 
> org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler 
> failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:570)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:382)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1651)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1147)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:573)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:620)
> {noformat}
> In the version where this exception is thrown, the 
> [line|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L586]
>  is:
> {code:java}mi.writeEvent(historyEvent);{code}
> IMHO, this may be caused by an exception in a previous step. Specifically, in 
> a kerberized environment, while creating the event writer (which calls out to 
> decrypt the EEK), the connection to KMS failed. Exception below:
> {noformat} 
> 2016-03-14 16:41:57,559 ERROR [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error 
> JobHistoryEventHandler in handleEvent: EventType: AM_STARTED
> java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>   at 
> java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:520)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:505)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.decryptEncryptedKey(KMSClientProvider.java:779)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$3.call(LoadBalancingKMSClientProvider.java:185)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider$3.call(LoadBalancingKMSClientProvider.java:181)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.doOp(LoadBalancingKMSClientProvider.java:94)
>   at 
> org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:181)
>   at 
> org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
>   at 
> org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1420)
>   at 
> 

[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-08-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6310:
---
Status: Patch Available  (was: Open)

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt, 
> MAPREDUCE-6310-06132016.txt, MAPREDUCE-6310-06132018.txt
>
>
> Previously we used jdiff for Hadoop common and HDFS. Now we're extending the 
> support of jdiff to YARN. Probably we'd like to do similar things with 
> MapReduce? 






[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-08-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6310:
---
Attachment: MAPREDUCE-6310-06132018.txt

Updated patch addressing the whitespace issues and ASF warnings.

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt, 
> MAPREDUCE-6310-06132016.txt, MAPREDUCE-6310-06132018.txt
>
>
> Previously we used jdiff for Hadoop common and HDFS. Now we're extending the 
> support of jdiff to YARN. Probably we'd like to do similar things with 
> MapReduce? 






[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-08-18 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6310:
---
Status: Open  (was: Patch Available)

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt, 
> MAPREDUCE-6310-06132016.txt
>
>
> Previously we used jdiff for Hadoop common and HDFS. Now we're extending the 
> support of jdiff to YARN. Probably we'd like to do similar things with 
> MapReduce? 






[jira] [Updated] (MAPREDUCE-6711) JobImpl fails to handle preemption events on state COMMITTING

2016-08-17 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6711:
---
Target Version/s: 2.7.4  (was: 2.7.3)

2.7.3 is under release process, changing target-version to 2.7.4.

> JobImpl fails to handle preemption events on state COMMITTING
> -
>
> Key: MAPREDUCE-6711
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6711
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Prabhu Joseph
> Attachments: MAPREDUCE-6711.1.patch, MAPREDUCE-6711.patch
>
>
> When an MR app is preempted in the COMMITTING state, we saw the following 
> exceptions in its log:
> {code}
> ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event 
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> JOB_TASK_ATTEMPT_COMPLETED at COMMITTING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> and 
> {code}
> ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event 
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> JOB_MAP_TASK_RESCHEDULED at COMMITTING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> It seems like we need to handle those preemption-related events when the job 
> is being committed?
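The likely fix direction, sketched with a toy state table (this is not the real {{StateMachineFactory}} API): register (possibly no-op) transitions for these events in the COMMITTING state so the dispatcher stops treating them as invalid.

```java
// Toy model of the idea only, not Hadoop's StateMachineFactory: a state
// machine rejects events it has no registered transition for, so COMMITTING
// needs entries for the preemption-related events seen in the logs above.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class ToyStateMachine {
  private final Map<String, Set<String>> allowed = new HashMap<>();

  public ToyStateMachine addTransition(String state, String event) {
    allowed.computeIfAbsent(state, s -> new HashSet<>()).add(event);
    return this;
  }

  /** True if the event is handled in this state; false models the
   *  InvalidStateTransitonException seen in the logs. */
  public boolean handle(String state, String event) {
    return allowed.getOrDefault(state, new HashSet<>()).contains(event);
  }
}
```

Whether the new transitions should be plain no-ops or should record the lost attempts for post-commit bookkeeping is exactly the open question in this ticket.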






[jira] [Updated] (MAPREDUCE-6362) History Plugin should be updated

2016-08-17 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6362:
---
Target Version/s: 2.7.4  (was: 2.7.3)

2.7.3 is under release process, changing target-version to 2.7.4.

> History Plugin should be updated
> 
>
> Key: MAPREDUCE-6362
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6362
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: MAPREDUCE-6362.patch
>
>
> As applications complete, the RM tracks their IDs in a completed list. This 
> list is routinely truncated to limit the total number of applications 
> remembered by the RM.
> When a user clicks the History link for a job, the browser is redirected to 
> the application's tracking link obtained from the stored application 
> instance. But when the application has been purged from the RM, an error is 
> displayed.
> In very busy clusters the rate at which applications complete can cause 
> applications to be purged from the RM's internal list within hours, which 
> breaks the proxy URLs users have saved for their jobs.
> We would like the RM to provide valid tracking links that persist, so that 
> users are not frustrated by broken links.
> The current plugin already handles redirection for MapReduce jobs, but we 
> need to add the same functionality for Tez jobs






[jira] [Updated] (MAPREDUCE-6541) Exclude scheduled reducer memory when calculating available mapper slots from headroom to avoid deadlock

2016-08-17 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6541:
---
Target Version/s: 2.7.4  (was: 2.7.3)

2.7.3 is under release process, changing target-version to 2.7.4.

> Exclude scheduled reducer memory when calculating available mapper slots from 
> headroom to avoid deadlock 
> -
>
> Key: MAPREDUCE-6541
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6541
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Wangda Tan
>Assignee: Varun Saxena
> Attachments: MAPREDUCE-6541.01.patch
>
>
> We saw an MR deadlock recently:
> - When NMs were restarted by the framework without recovery enabled, 
> containers running on those nodes were identified as "ABORTED", and the MR AM 
> tried to reschedule the "ABORTED" mapper containers.
> - Since such lost mappers were "ABORTED" containers, the MR AM gave normal 
> mapper priority (priority=20) to those mapper requests. If there is any 
> pending reducer (priority=10) at the same time, the mapper requests have to 
> wait until the reducer requests are satisfied.
> - In our test, one mapper needed 700+ MB, a reducer needed 1000+ MB, and RM 
> available resource = mapper-request = (700+ MB). Only one job was running in 
> the system, so the scheduler could not allocate more reducer containers, AND 
> the MR AM thought there was enough headroom for mappers, so reducer 
> containers were not preempted.
> MAPREDUCE-6302 solves most of the problems, but on the other hand, I think 
> we may need to exclude the scheduled reducers' resources from the headroom 
> when calculating #available-mapper-slots, which would avoid excessive reducer 
> preemption.
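The accounting change proposed above can be sketched as follows. This is an illustrative calculation only; the method and parameter names are hypothetical (not the actual RMContainerAllocator code), and the memory figures come from the test scenario described in the issue.

```java
// Hypothetical sketch of the proposed headroom accounting; names are
// illustrative, not the actual RMContainerAllocator API.
public class HeadroomSketch {

    /**
     * Compute how many mapper containers can still be scheduled once the
     * memory already promised to scheduled reducers is excluded from the
     * headroom reported by the RM.
     */
    static int availableMapperSlots(long headroomMb,
                                    long scheduledReducerMb,
                                    long mapperMb) {
        long usable = Math.max(0, headroomMb - scheduledReducerMb);
        return (int) (usable / mapperMb);
    }

    public static void main(String[] args) {
        // With 4096 MB headroom, 1000 MB reserved for a scheduled reducer,
        // and 700 MB mappers, only the remaining 3096 MB is divisible.
        System.out.println(availableMapperSlots(4096, 1000, 700)); // prints 4
    }
}
```

With this accounting, a headroom smaller than the scheduled-reducer reservation yields zero mapper slots, which is the signal the AM needs to consider preempting reducers instead of waiting.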






[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-08-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6310:
---
Attachment: MAPREDUCE-6310-06132016.txt

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt, 
> MAPREDUCE-6310-06132016.txt
>
>
> Previously we used jdiff for Hadoop common and HDFS. Now we're extending the 
> support of jdiff to YARN. Probably we'd like to do similar things with 
> MapReduce? 






[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-08-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6310:
---
Status: Patch Available  (was: Open)

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt, 
> MAPREDUCE-6310-06132016.txt
>
>
> Previously we used jdiff for Hadoop common and HDFS. Now we're extending the 
> support of jdiff to YARN. Probably we'd like to do similar things with 
> MapReduce? 






[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-08-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6310:
---
Status: Open  (was: Patch Available)

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt
>
>
> Previously we used jdiff for Hadoop common and HDFS. Now we're extending the 
> support of jdiff to YARN. Probably we'd like to do similar things with 
> MapReduce? 






[jira] [Commented] (MAPREDUCE-6654) Possible NPE in JobHistoryEventHandler#handleEvent

2016-08-01 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402538#comment-15402538
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6654:


bq. To be clear, the events are not lost - it get tracked down with proper log 
messages which is exactly the same as writing other events (with event writer 
setup successfully before) when NN cannot be connected. The bottom line here is 
all event failures should be tracked with error log and get isolated properly 
so won't affect other following up events (and won't cause AM failed).
[~djp] / [~vvasudev], I am not sure we are getting this right. We depend on 
reliable persistence of these events, both in the UI and during job recovery 
after AM restarts.

IIUC, before this patch, the job fails because it couldn't persist the 
information to the history. I think we are better off keeping the events in the 
queue, in the same order, and retrying until we can reconnect to the 
FileSystem. Which reminds me, why isn't the DFSClient looping until it 
connects back to the FileSystem?
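The retry-in-order behavior suggested here can be sketched roughly as below. The `RetryingHistoryWriter` and `Sink` names are hypothetical, not the real JobHistoryEventHandler API; the point is only that events stay queued, in order, until the write finally succeeds.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative sketch (not the actual JobHistoryEventHandler): failed
// writes leave the queue untouched, so events are neither lost nor
// reordered while the FileSystem is unreachable.
public class RetryingHistoryWriter {

    interface Sink {
        void write(String event) throws Exception;
    }

    private final Queue<String> pending = new ArrayDeque<>();
    private final Sink sink;

    RetryingHistoryWriter(Sink sink) {
        this.sink = sink;
    }

    void handle(String event) {
        pending.add(event);
        // Drain head-of-line; on failure, stop and keep everything queued.
        while (!pending.isEmpty()) {
            try {
                sink.write(pending.peek());
                pending.poll(); // remove only after a successful write
            } catch (Exception e) {
                return;         // retry the same event on the next call
            }
        }
    }

    int backlog() {
        return pending.size();
    }
}
```

A real implementation would add backoff between retries and a bound on how long the AM blocks, but the ordering guarantee is the essential part.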

> Possible NPE in JobHistoryEventHandler#handleEvent
> --
>
> Key: MAPREDUCE-6654
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6654
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Xiao Chen
>Assignee: Junping Du
>Priority: Critical
> Attachments: MAPREDUCE-6654-v2.1.patch, MAPREDUCE-6654-v2.patch, 
> MAPREDUCE-6654.patch
>
>
> I have seen NPE thrown from {{JobHistoryEventHandler#handleEvent}}:
> {noformat}
> 2016-03-14 16:42:15,231 INFO [Thread-69] 
> org.apache.hadoop.service.AbstractService: Service JobHistoryEventHandler 
> failed in state STOPPED; cause: java.lang.NullPointerException
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:570)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.serviceStop(JobHistoryEventHandler.java:382)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>   at 
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>   at 
> org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
>   at 
> org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStop(MRAppMaster.java:1651)
>   at 
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.stop(MRAppMaster.java:1147)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster.shutDownJob(MRAppMaster.java:573)
>   at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobFinishEventHandler$1.run(MRAppMaster.java:620)
> {noformat}
> In the version this exception is thrown, the 
> [line|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/jobhistory/JobHistoryEventHandler.java#L586]
>  is:
> {code:java}mi.writeEvent(historyEvent);{code}
> IMHO, this may be caused by an exception in a previous step. Specifically, in 
> the kerberized environment, when creating event writer which calls to decrypt 
> EEK, the connection to KMS failed. Exception below:
> {noformat} 
> 2016-03-14 16:41:57,559 ERROR [eventHandlingThread] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error 
> JobHistoryEventHandler in handleEvent: EventType: AM_STARTED
> java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>   at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>   at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
>   at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
>   at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1323)
>   at 
> java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:520)
>   at 
> org.apache.hadoop.crypto.key.kms.KMSClientProvider.call(KMSClientProvider.java:505)
>   at 
> 

[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6310:
---
Status: Open  (was: Patch Available)

The existing patch continues to work. I'll just rerun this through Jenkins.

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt
>
>
> Previously we used jdiff for Hadoop common and HDFS. Now we're extending the 
> support of jdiff to YARN. Probably we'd like to do similar things with 
> MapReduce? 






[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-07-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6310:
---
Status: Patch Available  (was: Open)

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt
>
>
> Previously we used jdiff for Hadoop common and HDFS. Now we're extending the 
> support of jdiff to YARN. Probably we'd like to do similar things with 
> MapReduce? 






[jira] [Updated] (MAPREDUCE-6733) MapReduce JerseyTest tests failing with "java.net.BindException: Address already in use"

2016-07-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6733:
---
Attachment: MAPREDUCE-6733-trunk-07-15-2016.txt

Okay, that was for branch-2. Here's the trunk patch.

> MapReduce JerseyTest tests failing with "java.net.BindException: Address 
> already in use"
> 
>
> Key: MAPREDUCE-6733
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6733
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: MAPREDUCE-6733-07-15-2016.txt, 
> MAPREDUCE-6733-trunk-07-15-2016.txt
>
>
> Similar to YARN-2912 / YARN-3433, MR JerseyTests fail when port 9998 is in 
> external use. We should fix the MR tests too similar to YARN-2912.






[jira] [Updated] (MAPREDUCE-6733) MapReduce JerseyTest tests failing with "java.net.BindException: Address already in use"

2016-07-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6733:
---
Status: Patch Available  (was: Open)

> MapReduce JerseyTest tests failing with "java.net.BindException: Address 
> already in use"
> 
>
> Key: MAPREDUCE-6733
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6733
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: MAPREDUCE-6733-07-15-2016.txt, 
> MAPREDUCE-6733-trunk-07-15-2016.txt
>
>
> Similar to YARN-2912 / YARN-3433, MR JerseyTests fail when port 9998 is in 
> external use. We should fix the MR tests too similar to YARN-2912.






[jira] [Updated] (MAPREDUCE-6733) MapReduce JerseyTest tests failing with "java.net.BindException: Address already in use"

2016-07-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6733:
---
Status: Open  (was: Patch Available)

> MapReduce JerseyTest tests failing with "java.net.BindException: Address 
> already in use"
> 
>
> Key: MAPREDUCE-6733
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6733
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: MAPREDUCE-6733-07-15-2016.txt
>
>
> Similar to YARN-2912 / YARN-3433, MR JerseyTests fail when port 9998 is in 
> external use. We should fix the MR tests too similar to YARN-2912.






[jira] [Created] (MAPREDUCE-6733) MapReduce JerseyTest tests failing with "java.net.BindException: Address already in use"

2016-07-15 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created MAPREDUCE-6733:
--

 Summary: MapReduce JerseyTest tests failing with 
"java.net.BindException: Address already in use"
 Key: MAPREDUCE-6733
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6733
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
Priority: Critical


Similar to YARN-2912 / YARN-3433, MR JerseyTests fail when port 9998 is in 
external use. We should fix the MR tests too similar to YARN-2912.






[jira] [Updated] (MAPREDUCE-6733) MapReduce JerseyTest tests failing with "java.net.BindException: Address already in use"

2016-07-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6733:
---
Attachment: MAPREDUCE-6733-07-15-2016.txt

A straightforward patch replacing the usage of JerseyTest with YARN Common's 
JerseyTestBase.
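The underlying trick (as in YARN-2912) is to avoid a hard-coded test port. A minimal illustration of that idea, not the actual JerseyTestBase code: bind to port 0 and let the OS pick a free port.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Illustration of the ephemeral-port trick behind this class of fix (not
// the actual JerseyTestBase code): ask the OS for any free port by binding
// to port 0, instead of hard-coding 9998 and failing when it is in use.
public class FreePortFinder {

    static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort(); // an OS-assigned free port
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Test container can bind to port " + findFreePort());
    }
}
```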

> MapReduce JerseyTest tests failing with "java.net.BindException: Address 
> already in use"
> 
>
> Key: MAPREDUCE-6733
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6733
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: MAPREDUCE-6733-07-15-2016.txt
>
>
> Similar to YARN-2912 / YARN-3433, MR JerseyTests fail when port 9998 is in 
> external use. We should fix the MR tests too similar to YARN-2912.






[jira] [Updated] (MAPREDUCE-6733) MapReduce JerseyTest tests failing with "java.net.BindException: Address already in use"

2016-07-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6733:
---
Status: Patch Available  (was: Open)

> MapReduce JerseyTest tests failing with "java.net.BindException: Address 
> already in use"
> 
>
> Key: MAPREDUCE-6733
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6733
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Vinod Kumar Vavilapalli
>Assignee: Vinod Kumar Vavilapalli
>Priority: Critical
> Attachments: MAPREDUCE-6733-07-15-2016.txt
>
>
> Similar to YARN-2912 / YARN-3433, MR JerseyTests fail when port 9998 is in 
> external use. We should fix the MR tests too similar to YARN-2912.






[jira] [Commented] (MAPREDUCE-6711) JobImpl fails to handle preemption events on state COMMITTING

2016-06-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332148#comment-15332148
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6711:


[~Prabhu Joseph], I added you as a contributor; you can now assign tickets to 
yourself.

> JobImpl fails to handle preemption events on state COMMITTING
> -
>
> Key: MAPREDUCE-6711
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6711
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>
> When a MR app being preempted on COMMITTING state, we saw the following 
> exceptions in its log:
> {code}
> ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event 
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> JOB_TASK_ATTEMPT_COMPLETED at COMMITTING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> and 
> {code}
> ERROR [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl: Can't handle this event 
> at current state
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> JOB_MAP_TASK_RESCHEDULED at COMMITTING
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
> at 
> org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
> at 
> org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1285)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
> at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
> at java.lang.Thread.run(Thread.java:744)
> {code}
> Seems like we need to handle those preemption related events when the job is 
> being committed? 
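One possible shape of the fix sketched with an illustrative table-driven state machine (the real JobImpl uses StateMachineFactory; all names below are hypothetical): register explicit stay-in-state transitions for the late preemption events at COMMITTING, instead of letting them fall through to InvalidStateTransitonException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the actual JobImpl state machine: preemption
// events arriving at COMMITTING are absorbed as no-ops rather than
// treated as invalid transitions that crash the dispatcher.
public class CommittingTransitions {

    enum State { COMMITTING, SUCCEEDED }
    enum Event { JOB_TASK_ATTEMPT_COMPLETED, JOB_MAP_TASK_RESCHEDULED, JOB_COMMIT_COMPLETED }

    private final Map<String, State> table = new HashMap<>();
    private State state = State.COMMITTING;

    CommittingTransitions() {
        // Ignore (stay in COMMITTING) rather than fail on late preemption events.
        table.put(key(State.COMMITTING, Event.JOB_TASK_ATTEMPT_COMPLETED), State.COMMITTING);
        table.put(key(State.COMMITTING, Event.JOB_MAP_TASK_RESCHEDULED), State.COMMITTING);
        table.put(key(State.COMMITTING, Event.JOB_COMMIT_COMPLETED), State.SUCCEEDED);
    }

    private static String key(State s, Event e) {
        return s + "/" + e;
    }

    State handle(Event e) {
        State next = table.get(key(state, e));
        if (next == null) {
            throw new IllegalStateException("Invalid event: " + e + " at " + state);
        }
        state = next;
        return state;
    }
}
```

Whether the right semantics are "ignore" or "abort the commit and reschedule" is exactly the open question in this issue; the sketch only shows how to stop the dispatcher from failing.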






[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-06-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6310:
---
Labels:   (was: BB2015-05-TBR)
Status: Patch Available  (was: Open)

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt
>
>
> Previously we used jdiff for Hadoop common and HDFS. Now we're extending the 
> support of jdiff to YARN. Probably we'd like to do similar things with 
> MapReduce? 






[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-06-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6310:
---
Attachment: MAPREDUCE-6310-06132016.txt

Uploading an updated patch for this. Also generating the 2.7.2 files and 
bumping up the latest stable version to 2.7.2 (vs 2.6.0).

[~leftnoteasy], can you look at this too?

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
> Attachments: MAPRED-6310-040615.patch, MAPREDUCE-6310-06132016.txt
>
>
> Previously we used jdiff for Hadoop common and HDFS. Now we're extending the 
> support of jdiff to YARN. Probably we'd like to do similar things with 
> MapReduce? 






[jira] [Updated] (MAPREDUCE-6310) Add jdiff support to MapReduce

2016-06-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6310:
---
Status: Open  (was: Patch Available)

> Add jdiff support to MapReduce
> --
>
> Key: MAPREDUCE-6310
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6310
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>Priority: Blocker
>  Labels: BB2015-05-TBR
> Attachments: MAPRED-6310-040615.patch
>
>
> Previously we used jdiff for Hadoop common and HDFS. Now we're extending the 
> support of jdiff to YARN. Probably we'd like to do similar things with 
> MapReduce? 






[jira] [Updated] (MAPREDUCE-6514) Job hangs as ask is not updated after ramping down of all reducers

2016-05-05 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6514:
---
  Resolution: Fixed
Hadoop Flags: Reviewed
   Fix Version/s: 2.6.5
  2.7.3
Target Version/s: 2.6.4, 2.8.0, 2.7.3  (was: 2.8.0, 2.7.3, 2.6.4)
  Status: Resolved  (was: Patch Available)

Okay, makes sense.

The 2.7 and 2.6 commits ran into conflicts and needed trivial fixes - I did 
them during commit time.

Committed this to trunk, branch-2, branch-2.8, branch-2.7 and branch-2.6.  
Thanks [~varun_saxena] and [~leftnoteasy]!

> Job hangs as ask is not updated after ramping down of all reducers
> --
>
> Key: MAPREDUCE-6514
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Fix For: 2.7.3, 2.6.5
>
> Attachments: MAPREDUCE-6514.01.patch, MAPREDUCE-6514.02.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled 
> reduces map and move these reducers back to pending. This change is not 
> reflected in the ask, so the RM keeps assigning reducer containers while the 
> AM cannot use them because no reducer is scheduled (see the logs below the 
> code).
> If the ask is updated immediately, the RM can schedule mappers right away, 
> which is the intention anyway when we ramp down reducers: the scheduler need 
> not allocate containers for ramped-down reducers.
> If not handled, this can lead to map starvation, as pointed out in 
> MAPREDUCE-6513
> {code}
>  LOG.info("Ramping down all scheduled reduces:"
> + scheduledRequests.reduces.size());
> for (ContainerRequest req : scheduledRequests.reduces.values()) {
>   pendingReduces.add(req);
> }
> scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000216, 
> NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u04p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000217, 
> NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u06p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> {noformat}
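The missing step described above — shrinking the ask when scheduled reduces are ramped down — can be sketched as follows. The class and field names are illustrative, not the actual RMContainerAllocator internals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the fix direction: when scheduled reduces are
// moved back to pending, the outstanding ask sent to the RM must be
// decremented too, or the RM keeps handing out reducer containers that
// the AM can no longer assign.
public class RampDownSketch {

    final Map<String, Integer> ask = new HashMap<>();  // request type -> #containers
    final List<String> scheduledReduces = new ArrayList<>();
    final List<String> pendingReduces = new ArrayList<>();

    void rampDownAllReduces() {
        for (String req : scheduledReduces) {
            pendingReduces.add(req);
            // The step missing from the original code: shrink the ask so
            // the RM stops allocating containers for ramped-down reducers.
            ask.merge("REDUCE", -1, Integer::sum);
        }
        scheduledReduces.clear();
    }
}
```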






[jira] [Commented] (MAPREDUCE-6514) Job hangs as ask is not updated after ramping down of all reducers

2016-05-05 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273338#comment-15273338
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6514:


The patch looks good to me now. Tx [~leftnoteasy] for finishing the patch on 
behalf of [~varun_saxena]!

BTW, why is this a blocker on the 2.6 / 2.7 maint lines?

> Job hangs as ask is not updated after ramping down of all reducers
> --
>
> Key: MAPREDUCE-6514
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Blocker
> Attachments: MAPREDUCE-6514.01.patch, MAPREDUCE-6514.02.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled 
> reduces map and move these reducers back to pending. This change is not 
> reflected in the ask, so the RM keeps assigning reducer containers while the 
> AM cannot use them because no reducer is scheduled (see the logs below the 
> code).
> If the ask is updated immediately, the RM can schedule mappers right away, 
> which is the intention anyway when we ramp down reducers: the scheduler need 
> not allocate containers for ramped-down reducers.
> If not handled, this can lead to map starvation, as pointed out in 
> MAPREDUCE-6513
> {code}
>  LOG.info("Ramping down all scheduled reduces:"
> + scheduledRequests.reduces.size());
> for (ContainerRequest req : scheduledRequests.reduces.values()) {
>   pendingReduces.add(req);
> }
> scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000216, 
> NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u04p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000217, 
> NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u06p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> {noformat}
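The fix the description asks for can be sketched in simplified form: when the scheduled reduces are cleared, the ask mirrored to the RM must shrink by the same amount. The class and field names below are illustrative stand-ins, not the real RMContainerAllocator API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch only: when ramping down scheduled reduces, the pending ask sent to
// the RM must shrink too, otherwise the RM keeps allocating reduce containers
// that the AM can no longer use.
public class RampDownSketch {
    static class ContainerRequest {
        final String id;
        ContainerRequest(String id) { this.id = id; }
    }

    final List<ContainerRequest> scheduledReduces = new ArrayList<>();
    final Queue<ContainerRequest> pendingReduces = new ArrayDeque<>();
    int askedReduceContainers; // count mirrored to the RM on the next heartbeat

    void rampDownAllReduces() {
        for (ContainerRequest req : scheduledReduces) {
            pendingReduces.add(req);
            askedReduceContainers--; // withdraw the ask as each request moves to pending
        }
        scheduledReduces.clear();
    }

    public static void main(String[] args) {
        RampDownSketch s = new RampDownSketch();
        s.scheduledReduces.add(new ContainerRequest("r1"));
        s.scheduledReduces.add(new ContainerRequest("r2"));
        s.askedReduceContainers = 2;
        s.rampDownAllReduces();
        System.out.println(s.askedReduceContainers + " " + s.pendingReduces.size());
    }
}
```

In the real allocator the analogous step would be withdrawing each request from the table that feeds the next allocate heartbeat, rather than decrementing a bare counter.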






[jira] [Commented] (MAPREDUCE-6608) Work Preserving AM Restart for MapReduce

2016-04-15 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243948#comment-15243948
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6608:


[~srikanth.sampath] / [~djp],

Got around to reading the design doc attached. Besides the AM discovery problem 
itself, there are a few important details that aren't covered in the doc:

h4. Output Commit of previous tasks
The new AM needs to make sure that the output of previously running containers 
can be safely committed. IIRC, with today's FileOutputCommitter, the new AM 
will only promote task outputs that are present in 
$jobOutput/_temporary/$currentAttemptID/.

Similar changes may be needed for other OutputCommitters out there.
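The promotion rule mentioned above can be illustrated with a minimal path check, assuming the conventional $jobOutput/_temporary/$appAttemptId layout; isPromotable is a hypothetical helper for illustration, not a Hadoop API.

```java
// Sketch of the recovery-time check: only task output under the current
// application attempt's _temporary directory is promotable. The path layout
// follows the conventional FileOutputCommitter scheme; names are assumptions.
public class CommitPathCheck {
    static boolean isPromotable(String jobOutput, int appAttemptId, String taskOutputPath) {
        String expectedPrefix = jobOutput + "/_temporary/" + appAttemptId + "/";
        return taskOutputPath.startsWith(expectedPrefix);
    }

    public static void main(String[] args) {
        String out = "/user/alice/job-output";
        // output written under the current (second) AM attempt: promotable
        System.out.println(isPromotable(out, 2, out + "/_temporary/2/task_001/part-00000"));
        // output left behind by the first AM attempt: must not be promoted blindly
        System.out.println(isPromotable(out, 2, out + "/_temporary/1/task_001/part-00000"));
    }
}
```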

h4. Task Output Commit races

It doesn't look like we record task-commit in JobHistory, so it is possible 
that the previous AM gave a commit go-ahead to a taskAttempt which is either 
(a) in the process of committing output or (b) committed the output but failed 
to report back to either of the AMs. In this case, two taskAttempts can be 
committing at the same time!

Along the same lines, without recording the success of a commit after a task 
finishes committing, we will run into issues.
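One way to close that race, assuming the AM can persist a record before answering a commit request (the class and method names here are hypothetical, not Hadoop code): write COMMIT_PENDING first, then grant, and deny any later go-ahead for the same task.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of commit fencing: persist a COMMIT_PENDING record *before* granting
// commit permission, so a restarted AM (reading the same journal) never hands
// a second go-ahead to another attempt of the same task.
public class CommitFencing {
    enum CommitState { NONE, COMMIT_PENDING, COMMITTED }

    // stands in for durable JobHistory records; a real AM would replay these on restart
    final Map<String, CommitState> journal = new HashMap<>();

    boolean grantCommit(String taskId) {
        CommitState s = journal.getOrDefault(taskId, CommitState.NONE);
        if (s != CommitState.NONE) {
            return false; // some attempt already received (or finished) a go-ahead
        }
        journal.put(taskId, CommitState.COMMIT_PENDING); // record first, then grant
        return true;
    }

    public static void main(String[] args) {
        CommitFencing am = new CommitFencing();
        System.out.println(am.grantCommit("task_0001")); // first attempt: granted
        System.out.println(am.grantCommit("task_0001")); // later attempt: denied
    }
}
```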

h4. Conflicting TaskAttemptIDs

Today, we launch containers first and then record them in JobHistory. Because 
of this, if the previous AM started a TaskAttempt but crashed before recording 
it in JobHistory, and this old TaskAttempt somehow cannot reconnect to the new 
AM due to network issues, the new AM generates the same TaskAttemptID for a 
newer attempt, and the two will collide on HDFS and/or the local NM output 
directories if they happen to run on the same machine.

The above problem will be worse when speculative tasks are involved.
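The collision goes away if the attempt number is made durable before the container launches, so a recovering AM never reissues an ID an old attempt might still hold. A minimal sketch with illustrative names:

```java
// Sketch of persist-before-launch attempt-ID allocation: bump the durable
// counter first, launch second. On recovery the new AM resumes numbering
// past the highest recorded value, so IDs never collide with lost attempts.
public class AttemptIdAllocator {
    int lastRecorded; // highest attempt number ever written to the (durable) log

    String nextAttemptId(String taskId) {
        lastRecorded++; // persist-before-launch: make the number durable first
        return taskId + "_" + lastRecorded;
    }

    public static void main(String[] args) {
        AttemptIdAllocator a = new AttemptIdAllocator();
        a.lastRecorded = 3; // recovered from history: attempts 1..3 may exist somewhere
        System.out.println(a.nextAttemptId("attempt_1437451211867_0001_m_000007"));
    }
}
```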

h4. Security
AM should use the same job-token as the previous incarnation otherwise the old 
running tasks will get authentication failures. I quickly checked and it seems 
like the AM itself generates the token, which means the second AM will generate 
a different one and all running tasks will fail to sync back!

h4. Others
bq. In the WP case, upon a loss of connection to the AM the tasks will try and 
reestablish the connection with the new AM.
This will not suffice. Even today, when a network partition occurs and two AMs 
end up running at the same time, both can grant a commit go-ahead to two 
TaskAttempts of the same task, and they will collide on the output commit.

h4. General comments
This stuff is hard. Even if we forget about the AM discovery problem, I am sure 
others will find a bunch of other design considerations you may be missing now.

I'd suggest spending more time on the design, at least on some of the areas I 
pointed out above, and then creating a branch, creating sub-tasks, doing some 
prototypes, etc.

> Work Preserving AM Restart for MapReduce
> 
>
> Key: MAPREDUCE-6608
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6608
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Srikanth Sampath
>Assignee: Srikanth Sampath
> Attachments: Patch1.patch, WorkPreservingMRAppMaster-1.pdf, 
> WorkPreservingMRAppMaster-2.pdf, WorkPreservingMRAppMaster.pdf
>
>
> Providing a framework for work preserving AM is achieved in 
> [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489].  We would like 
> to take advantage of this for MapReduce(MR) applications.  There are some 
> challenges which have been described in the attached document and few options 
> discussed.  We solicit feedback from the community.





[jira] [Commented] (MAPREDUCE-6670) TestJobListCache#testEviction sometimes fails on Windows with timeout

2016-04-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15237499#comment-15237499
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6670:


[~djp], can this be put on older releases too - 2.8.x, 2.7.x etc?

> TestJobListCache#testEviction sometimes fails on Windows with timeout
> -
>
> Key: MAPREDUCE-6670
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6670
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.7.0, 2.8.0, 2.7.1, 2.7.2, 2.7.3
> Environment: OS: Windows Server 2012
> JDK: 1.7.0_79
>Reporter: Gergely Novák
>Assignee: Gergely Novák
>Priority: Minor
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6670.001.patch, MAPREDUCE-6670.002.patch
>
>
> TestJobListCache#testEviction often needs more than 1000 ms to finish in a 
> Windows environment. Increasing the timeout solves the issue.





[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-04-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15235650#comment-15235650
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6513:


I'm doing a final pass of the review; in the meanwhile, [~leftnoteasy], can you 
take a look too?

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6513.01.patch, MAPREDUCE-6513.02.patch, 
> MAPREDUCE-6513.03.patch
>
>
> When a job with many tasks was in progress, one node became unstable due to 
> an OS issue. After the node became unstable, the maps on this node changed 
> to the KILLED state. 
> Currently, maps which were running on the unstable node are rescheduled, and 
> all are in the scheduled state waiting for the RM to assign containers. Ask 
> requests for the maps were seen until the node became good again (all those 
> failed); there are no ask requests after that. But the AM keeps on preempting 
> the reducers (it keeps recycling them).
> Finally, the reducers are waiting for the mappers to complete, and the 
> mappers never got containers.
> My question is:
> Why did the AM not send map requests once the node recovered?





[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-04-06 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15229405#comment-15229405
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6513:


[~varun_saxena], let me know if you can update this soon enough for 2.7.3 in a 
couple of days. Otherwise, we can simply move this to 2.8 in a few weeks.

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6513.01.patch
>
>
> When a job with many tasks was in progress, one node became unstable due to 
> an OS issue. After the node became unstable, the maps on this node changed 
> to the KILLED state. 
> Currently, maps which were running on the unstable node are rescheduled, and 
> all are in the scheduled state waiting for the RM to assign containers. Ask 
> requests for the maps were seen until the node became good again (all those 
> failed); there are no ask requests after that. But the AM keeps on preempting 
> the reducers (it keeps recycling them).
> Finally, the reducers are waiting for the mappers to complete, and the 
> mappers never got containers.
> My question is:
> Why did the AM not send map requests once the node recovered?





[jira] [Updated] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2016-04-05 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6513:
---
Status: Open  (was: Patch Available)

Tx for the update, [~varun_saxena]!

Apologies for missing your updated patch for this long!

(Reviewing an MR patch after a looong time!)

First up, the patch doesn't apply anymore, can you please update?

I tried to review it despite the conflicts, some comments:
 - The logic looks good overall! You are right that a user-initiated kill 
should not lead to a higher priority.
 - We want to be sure that the existing semantics in RMContainerAllocator 
around failed maps are really about task attempts that need to be rescheduled, 
not just failed maps. I briefly looked, but it will be good for you to also 
reverify!
 - TestTaskAttempt.java
-- Most (all?) of the code can be reused between testContainerKillOnNew and 
testContainerKillOnUnassigned.
-- Also, in existing tests, we should leave rescheduleAttempt false 
except in the new testKillMapTaskAfterSuccess. You have enough coverage 
elsewhere that we should simply drop these changes except for the new tests.
 - TestMRApp.java.testUpdatedNodes: Instead of checking for reschedule events, 
is it possible to explicitly check for the higher priority of the corresponding 
request?

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob.zhao
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6513.01.patch
>
>
> When a job with many tasks was in progress, one node became unstable due to 
> an OS issue. After the node became unstable, the maps on this node changed 
> to the KILLED state. 
> Currently, maps which were running on the unstable node are rescheduled, and 
> all are in the scheduled state waiting for the RM to assign containers. Ask 
> requests for the maps were seen until the node became good again (all those 
> failed); there are no ask requests after that. But the AM keeps on preempting 
> the reducers (it keeps recycling them).
> Finally, the reducers are waiting for the mappers to complete, and the 
> mappers never got containers.
> My question is:
> Why did the AM not send map requests once the node recovered?





[jira] [Updated] (MAPREDUCE-6341) Fix typo in mapreduce tutorial

2016-02-16 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6341:
---
Assignee: John Michael Luy  (was: Tsuyoshi Ozawa)

[~ozawa], it seems like [~jmluy] reopened this for additional changes. Can you 
please follow up? Thanks.

> Fix typo in mapreduce tutorial
> --
>
> Key: MAPREDUCE-6341
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6341
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: John Michael Luy
>Assignee: John Michael Luy
>Priority: Trivial
> Fix For: 2.8.0
>
> Attachments: HADOOP-11879.patch, MAPREDUCE-6341.patch
>
>
> There are some typos in the converted tutorial in markdown.





[jira] [Commented] (MAPREDUCE-6608) Work Preserving AM Restart for MapReduce

2016-02-11 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143761#comment-15143761
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6608:


bq. I agree that storing state in ZooKeeper may have scalability issues. I am 
just wondering whether we will end up with too many small files in HDFS if we 
plan to store AM information there.
A solution for this is already given at YARN-1489 by [~bikassaha]. See this 
comment: 
https://issues.apache.org/jira/browse/YARN-1489?focusedCommentId=13862359=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13862359.

The solution is essentially a registry combined with YARN acting as a 
distributed reader: the registry owns the write path and storage, while the 
RM/NMs provide scalable reads.

> Work Preserving AM Restart for MapReduce
> 
>
> Key: MAPREDUCE-6608
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6608
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Srikanth Sampath
>Assignee: Srikanth Sampath
> Attachments: Patch1.patch, WorkPreservingMRAppMaster-1.pdf, 
> WorkPreservingMRAppMaster-2.pdf, WorkPreservingMRAppMaster.pdf
>
>
> Providing a framework for work preserving AM is achieved in 
> [YARN-1489|https://issues.apache.org/jira/browse/YARN-1489].  We would like 
> to take advantage of this for MapReduce(MR) applications.  There are some 
> challenges which have been described in the attached document and few options 
> discussed.  We solicit feedback from the community.





[jira] [Resolved] (MAPREDUCE-6566) Add retry support to mapreduce CLI tool

2016-02-04 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-6566.

  Resolution: Fixed
Hadoop Flags: Reviewed

[~suda], this JIRA went on for a while, I am going to close this one as fixed. 
Please file a follow up ticket for the test flakiness. Tx.

> Add retry support to mapreduce CLI tool
> ---
>
> Key: MAPREDUCE-6566
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6566
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6566.001.patch, MAPREDUCE-6566.002.patch
>
>
> MAPREDUCE-6251 added support for retries to JobClient. However the MR CLI 
> class doesn't use the JobClient. It would be useful to add support for 
> retries to the CLI class as well.
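A generic retry wrapper of the kind the CLI could adopt might look like the sketch below; it is illustrative only, not the configuration-driven retry that MAPREDUCE-6251 actually added to JobClient.

```java
import java.util.concurrent.Callable;

// Sketch of a simple retry wrapper: retry a failing operation up to
// maxAttempts times with a fixed backoff, rethrowing the last failure.
public class Retry {
    static <T> T withRetries(Callable<T> op, int maxAttempts, long backoffMillis) throws Exception {
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                Thread.sleep(backoffMillis); // fixed backoff between attempts
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // an operation that fails transiently twice before succeeding
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("transient");
            return "ok after " + calls[0] + " attempts";
        }, 5, 10);
        System.out.println(result);
    }
}
```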





[jira] [Updated] (MAPREDUCE-6451) DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic

2016-01-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6451:
---
Fix Version/s: (was: 3.0.0)

> DistCp has incorrect chunkFilePath for multiple jobs when strategy is dynamic
> -
>
> Key: MAPREDUCE-6451
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6451
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: distcp
>Affects Versions: 2.6.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Fix For: 2.7.2
>
> Attachments: MAPREDUCE-6451-v1.patch, MAPREDUCE-6451-v2.patch, 
> MAPREDUCE-6451-v3.patch, MAPREDUCE-6451-v4.patch, MAPREDUCE-6451-v5.patch
>
>
> DistCp, when used with the dynamic strategy, does not update chunkFilePath 
> and other static variables at any time other than for the first job. This is 
> seen when DistCp::run() is used. 
> A single copy succeeds, but subsequent jobs finish successfully without any 
> real copying. 
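The static-state hazard described above can be sketched in a few lines: a static field initialized once survives across successive run() calls in the same JVM. This is illustrative only, not DistCp code.

```java
// Illustrative only: a static field set on first use keeps its value across
// later run() calls in the same JVM, mimicking DistCp's stale chunkFilePath.
public class StaticStateSketch {
    static String chunkFilePath; // stands in for DistCp's static state

    static String run(String jobId) {
        if (chunkFilePath == null) {      // only initialized for the first job
            chunkFilePath = "/tmp/chunks/" + jobId;
        }
        return chunkFilePath;
    }

    public static void main(String[] args) {
        System.out.println(run("job_1"));
        System.out.println(run("job_2")); // stale: still points at job_1's path
    }
}
```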





[jira] [Updated] (MAPREDUCE-6497) Fix wrong value of JOB_FINISHED event in JobHistoryEventHandler

2016-01-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6497:
---
Fix Version/s: (was: 2.8.0)

> Fix wrong value of JOB_FINISHED event in JobHistoryEventHandler
> ---
>
> Key: MAPREDUCE-6497
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6497
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.1
>Reporter: Shinichi Yamashita
>Assignee: Shinichi Yamashita
> Fix For: 2.7.2, 2.6.2
>
> Attachments: MAPREDUCE-6497.001.patch
>
>
> It seems that the "MAP_COUNTER_GROUPS" values use the total_counter value.
> We should fix them to use the map_counter value.





[jira] [Updated] (MAPREDUCE-5649) Reduce cannot use more than 2G memory for the final merge

2016-01-26 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5649:
---
Fix Version/s: (was: 2.8.0)

> Reduce cannot use more than 2G memory  for the final merge
> --
>
> Key: MAPREDUCE-5649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: stanley shi
>Assignee: Gera Shegalov
>  Labels: 2.6.1-candidate, 2.7.2-candidate
> Fix For: 2.6.1, 2.7.2
>
> Attachments: MAPREDUCE-5649.001.patch, MAPREDUCE-5649.002.patch, 
> MAPREDUCE-5649.003.patch
>
>
> In the org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.java file, in 
> the finalMerge method: 
>  int maxInMemReduce = (int)Math.min(
> Runtime.getRuntime().maxMemory() * maxRedPer, Integer.MAX_VALUE);
>  
> This means that no matter how much memory the user has, the reducer will not 
> retain more than 2 GB of data in memory before the reduce phase starts.
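The cap comes from the narrowing (int) cast: whatever the heap size, the product is clamped to Integer.MAX_VALUE (about 2 GB). A small sketch contrasting the buggy cast with long arithmetic (the heap size and fraction below are illustrative):

```java
// Sketch of the overflow: casting the product to int caps the in-memory merge
// budget at Integer.MAX_VALUE (~2 GB) no matter how large the heap is;
// keeping the arithmetic in long removes the cap.
public class MergeBudget {
    static long maxInMemReduce(long maxHeapBytes, double maxRedPer) {
        // fixed variant of: (int) Math.min(maxHeapBytes * maxRedPer, Integer.MAX_VALUE)
        return (long) Math.min(maxHeapBytes * maxRedPer, Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        long heap = 16L * 1024 * 1024 * 1024; // an illustrative 16 GiB heap
        long buggy = (int) Math.min(heap * 0.9, Integer.MAX_VALUE);
        long fixed = maxInMemReduce(heap, 0.9);
        System.out.println(buggy); // capped at 2147483647 (~2 GiB)
        System.out.println(fixed); // ~14.4 GiB, as intended
    }
}
```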





[jira] [Updated] (MAPREDUCE-6540) TestMRTimelineEventHandling fails

2016-01-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6540:
---
Fix Version/s: (was: 2.7.3)
   (was: 2.8.0)
   2.7.2

> TestMRTimelineEventHandling fails
> -
>
> Key: MAPREDUCE-6540
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6540
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Reporter: Sangjin Lee
>Assignee: Sangjin Lee
> Fix For: 2.7.2, 2.6.3
>
> Attachments: MAPREDUCE-6540.001.patch
>
>
> TestMRTimelineEventHandling fails after YARN-2859 is merged because it 
> changed the port the AHS binds to in a mini cluster.
> {noformat}
> Running org.apache.hadoop.mapred.TestMRTimelineEventHandling
> Tests run: 3, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 184.38 sec 
> <<< FAILURE! - in org.apache.hadoop.mapred.TestMRTimelineEventHandling
> testMRTimelineEventHandling(org.apache.hadoop.mapred.TestMRTimelineEventHandling)
>   Time elapsed: 70.528 sec  <<< ERROR!
> java.io.IOException: Job didn't finish in 30 seconds
>   at 
> org.apache.hadoop.mapred.UtilsForTests.runJobSucceed(UtilsForTests.java:622)
>   at 
> org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMRTimelineEventHandling(TestMRTimelineEventHandling.java:99)
> testMapreduceJobTimelineServiceEnabled(org.apache.hadoop.mapred.TestMRTimelineEventHandling)
>   Time elapsed: 84.312 sec  <<< ERROR!
> java.io.IOException: Job didn't finish in 30 seconds
>   at 
> org.apache.hadoop.mapred.UtilsForTests.runJobSucceed(UtilsForTests.java:622)
>   at 
> org.apache.hadoop.mapred.TestMRTimelineEventHandling.testMapreduceJobTimelineServiceEnabled(TestMRTimelineEventHandling.java:162)
> {noformat}





[jira] [Updated] (MAPREDUCE-6377) JHS sorting on state column not working in webUi

2016-01-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6377:
---
Fix Version/s: (was: 2.7.3)
   2.7.2

Pulled this into 2.7.2 to keep the release up-to-date with 2.6.3. Changing 
fix-versions to reflect the same.

> JHS sorting on state column not working in webUi
> 
>
> Key: MAPREDUCE-6377
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6377
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 2.7.0
> Environment: 2 NM, JHS
>Reporter: Bibin A Chundatt
>Assignee: zhihai xu
>Priority: Minor
> Fix For: 2.7.2, 2.6.3
>
> Attachments: MAPREDUCE-6377.000.patch, Sorting Issue.png, 
> state_sorted1.pdf, state_sorted2.pdf
>
>
> Steps to reproduce
> 
> 1. Install and set up an HA cluster with JHS
> 2. Create state in JHS where a few jobs are killed and a few succeed
> 3. Check sorting on the State column in the JHS web UI
> Actual
> =
> Sorting on the state column is not working in JHS
> Expected
> ==
> Sorting on the state column should work





[jira] [Updated] (MAPREDUCE-5883) "Total megabyte-seconds" in job counters is slightly misleading

2016-01-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5883:
---
Fix Version/s: (was: 2.7.3)
   2.7.2

Pulled this into 2.7.2 to keep the release up-to-date with 2.6.3. Changing 
fix-versions to reflect the same.

> "Total megabyte-seconds" in job counters is slightly misleading
> ---
>
> Key: MAPREDUCE-5883
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5883
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.0.0, 2.4.0
>Reporter: Nathan Roberts
>Assignee: Nathan Roberts
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 2.7.2, 2.6.3
>
> Attachments: MAPREDUCE-5883.patch
>
>
> The following counters are in milliseconds so "megabyte-seconds" might be 
> better stated as "megabyte-milliseconds"
> MB_MILLIS_MAPS.name=   Total megabyte-seconds taken by all map 
> tasks
> MB_MILLIS_REDUCES.name=Total megabyte-seconds taken by all reduce 
> tasks
> VCORES_MILLIS_MAPS.name=   Total vcore-seconds taken by all map tasks
> VCORES_MILLIS_REDUCES.name=Total vcore-seconds taken by all reduce 
> tasks





[jira] [Updated] (MAPREDUCE-6549) multibyte delimiters with LineRecordReader cause duplicate records

2016-01-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6549:
---
Fix Version/s: (was: 2.7.3)
   (was: 2.8.0)
   2.7.2

Pulled this into 2.7.2 to keep the release up-to-date with 2.6.3. Changing 
fix-versions to reflect the same.

> multibyte delimiters with LineRecordReader cause duplicate records
> --
>
> Key: MAPREDUCE-6549
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6549
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv1, mrv2
>Affects Versions: 2.7.2
>Reporter: Dustin Cote
>Assignee: Wilfred Spiegelenburg
> Fix For: 2.7.2, 2.6.3
>
> Attachments: MAPREDUCE-6549-1.patch, MAPREDUCE-6549-2.patch, 
> MAPREDUCE-6549.3.patch
>
>
> LineRecordReader currently produces duplicate records under certain 
> scenarios, for example:
> 1) input string: "abc+++def++ghi++" 
> delimiter string: "+++" 
> test passes with all sizes of the split 
> 2) input string: "abc++def+++ghi++" 
> delimiter string: "+++" 
> test fails with a split size of 4 
> 3) input string: "abc+++def++ghi++" 
> delimiter string: "++" 
> test fails with a split size of 5 
> 4) input string "abc+++defg++hij++" 
> delimiter string: "++" 
> test fails with a split size of 4 
> 5) input string "abc++def+++ghi++" 
> delimiter string: "++" 
> test fails with a split size of 9 
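For reference, splitting case 1's input on its delimiter in one pass (ignoring input splits entirely) yields the records that should each appear exactly once; this is plain string splitting, not the LineRecordReader logic.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Reference behavior for case 1 above: split the whole input on the multibyte
// delimiter in one pass, with no split boundaries involved.
public class DelimiterSplit {
    public static void main(String[] args) {
        String input = "abc+++def++ghi++";
        String delimiter = "+++";
        // Pattern.quote so '+' is treated literally, not as a regex operator
        String[] records = input.split(Pattern.quote(delimiter));
        System.out.println(Arrays.toString(records));
    }
}
```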





[jira] [Commented] (MAPREDUCE-6557) Some tests in mapreduce-client-app are writing outside of target

2015-12-22 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068780#comment-15068780
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6557:


Was doing an audit of 2.8 commits and found that the commit message missed 
including the JIRA number. Here's the commit for posterity; it is already in 
trunk, branch-2, and branch-2.8.
{code}
commit 15d577bfbb3f18fc95251d22378b53aa4210115f
Author: Junping Du 
Date:   Wed Nov 25 09:15:26 2015 -0800

Tests in mapreduce-client-app are writing outside of target. Contributed by 
Akira AJISAKA.
{code}

> Some tests in mapreduce-client-app are writing outside of target
> 
>
> Key: MAPREDUCE-6557
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6557
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.0.0
>Reporter: Allen Wittenauer
>Assignee: Akira AJISAKA
>Priority: Blocker
> Fix For: 2.8.0
>
> Attachments: MAPREDUCE-6555.03.patch, MAPREDUCE-6557.00.patch, 
> MAPREDUCE-6557.004.patch, MAPREDUCE-6557.04.patch
>
>
> There is a staging directory appearing. It should not.





[jira] [Updated] (MAPREDUCE-6566) Add retry support to mapreduce CLI tool

2015-12-16 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6566:
---
Fix Version/s: (was: 2.7.2)

> Add retry support to mapreduce CLI tool
> ---
>
> Key: MAPREDUCE-6566
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6566
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: MAPREDUCE-6566.001.patch, MAPREDUCE-6566.002.patch
>
>
> MAPREDUCE-6251 added support for retries to JobClient. However the MR CLI 
> class doesn't use the JobClient. It would be useful to add support for 
> retries to the CLI class as well.





[jira] [Reopened] (MAPREDUCE-6566) Add retry support to mapreduce CLI tool

2015-12-16 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened MAPREDUCE-6566:


[~xgong], there are many major issues with how this patch got committed.
 - This is an improvement. It should never have been in 2.7.2. Why did you put 
it in 2.7.2?
 - Even if I grant you that, it actually doesn't exist in the 2.7.2 branch.
 - Neither does it exist in the 2.8.0 branch.
 - What exactly happened with that trunk commit? You made too many CHANGES.txt 
modifications there.

I'm reverting both the commits and reopening this JIRA to get this fixed 
correctly.

> Add retry support to mapreduce CLI tool
> ---
>
> Key: MAPREDUCE-6566
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6566
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.7.1
>Reporter: Varun Vasudev
>Assignee: Varun Vasudev
> Attachments: MAPREDUCE-6566.001.patch, MAPREDUCE-6566.002.patch
>
>
> MAPREDUCE-6251 added support for retries to JobClient. However the MR CLI 
> class doesn't use the JobClient. It would be useful to add support for 
> retries to the CLI class as well.





[jira] [Reopened] (MAPREDUCE-3065) ApplicationMaster killed by NodeManager due to excessive virtual memory consumption

2015-11-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli reopened MAPREDUCE-3065:


> ApplicationMaster killed by NodeManager due to excessive virtual memory 
> consumption
> ---
>
> Key: MAPREDUCE-3065
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3065
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.0-alpha
>Reporter: Chris Riccomini
>
> > Hey Vinod,
> > 
> > OK, so I have a little more clarity into this.
> > 
> > When I bump my resource request for my AM to 4096, it runs. The important 
> > line in the NM logs is:
> > 
> > 2011-09-21 13:43:44,366 INFO  monitor.ContainersMonitorImpl 
> > (ContainersMonitorImpl.java:run(402)) - Memory usage of ProcessTree 25656 
> > for container-id container_1316637655278_0001_01_01 : Virtual 
> > 2260938752 bytes, limit : 4294967296 bytes; Physical 120860672 bytes, limit 
> > -1 bytes
> > 
> > The thing to note is the virtual memory, which is off the charts, even 
> > though my physical memory is almost nothing (12 megs). I'm still poking 
> > around the code, but I am noticing that there are two checks in the NM, one 
> > for virtual mem, and one for physical mem. The virtual memory check appears 
> > to be toggle-able, but is presumably defaulted to on.
> > 
> > At this point I'm trying to figure out exactly what the VMEM check is for, 
> > why YARN thinks my app is taking 2 gigs, and how to fix this.
> > 
> > Cheers,
> > Chris
> > 
> > From: Chris Riccomini [criccom...@linkedin.com]
> > Sent: Wednesday, September 21, 2011 1:42 PM
> > To: mapreduce-...@hadoop.apache.org
> > Subject: Re: ApplicationMaster Memory Usage
> > 
> > For the record, I bumped to 4096 for memory resource request, and it works.
> > :(
> > 
> > 
> > On 9/21/11 1:32 PM, "Chris Riccomini"  wrote:
> > 
> >> Hey Vinod,
> >> 
> >> So, I ran my application master directly from the CLI. I commented out the
> >> YARN-specific code. It runs fine without leaking memory.
> >> 
> >> I then ran it from YARN, with all YARN-specific code commented it. It again
> >> ran fine.
> >> 
> >> I then uncommented JUST my registerWithResourceManager call. It then fails
> >> with OOM after a few seconds. I call registerWithResourceManager, and then 
> >> go
> >> into a while(true) { println("yeh") sleep(1000) }. Doing this prints:
> >> 
> >> yeh
> >> yeh
> >> yeh
> >> yeh
> >> yeh
> >> 
> >> At which point, it dies, and, in the NodeManager,I see:
> >> 
> >> 2011-09-21 13:24:51,036 WARN  monitor.ContainersMonitorImpl
> >> (ContainersMonitorImpl.java:isProcessTreeOverLimit(289)) - Process tree for
> >> container: container_1316626117280_0005_01_01 has processes older than 
> >> 1
> >> iteration running over the configured limit. Limit=2147483648, current 
> >> usage =
> >> 2192773120
> >> 2011-09-21 13:24:51,037 WARN  monitor.ContainersMonitorImpl
> >> (ContainersMonitorImpl.java:run(453)) - Container
> >> [pid=23852,containerID=container_1316626117280_0005_01_01] is running
> >> beyond memory-limits. Current usage : 2192773120bytes. Limit :
> >> 2147483648bytes. Killing container.
> >> Dump of the process-tree for container_1316626117280_0005_01_01 :
> >> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) 
> >> SYSTEM_TIME(MILLIS)
> >> VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> >> |- 23852 20570 23852 23852 (bash) 0 0 108638208 303 /bin/bash -c java 
> >> -Xmx512M
> >> -cp './package/*' kafka.yarn.ApplicationMaster
> >> /home/criccomi/git/kafka-yarn/dist/kafka-streamer.tgz 5 1 1316626117280
> >> com.linkedin.TODO 1
> >> 1>/tmp/logs/application_1316626117280_0005/container_1316626117280_0005_01_000
> >> 001/stdout
> >> 2>/tmp/logs/application_1316626117280_0005/container_1316626117280_0005_01_000
> >> 001/stderr
> >> |- 23855 23852 23852 23852 (java) 81 4 2084134912 14772 java -Xmx512M -cp
> >> ./package/* kafka.yarn.ApplicationMaster
> >> /home/criccomi/git/kafka-yarn/dist/kafka-streamer.tgz 5 1 1316626117280
> >> com.linkedin.TODO 1
> >> 2011-09-21 13:24:51,037 INFO  monitor.ContainersMonitorImpl
> >> (ContainersMonitorImpl.java:run(463)) - Removed ProcessTree with root 23852
> >> 
> >> Either something is leaking in YARN, or my registerWithResourceManager code
> >> (see below) is doing something funky.
> >> 
> >> I'm trying to avoid going through all the pain of attaching a remote 
> >> debugger.
> >> Presumably things aren't leaking in YARN, which means it's likely that I'm
> >> doing something wrong in my registration code.
> >> 
> >> Incidentally, my NodeManager is running with 1000 megs. My application 
> >> master
> >> memory is set to 2048, and my -Xmx setting is 

[jira] [Resolved] (MAPREDUCE-3065) ApplicationMaster killed by NodeManager due to excessive virtual memory consumption

2015-11-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-3065.

Resolution: Duplicate

Resolving correctly as a dup of MAPREDUCE-3068.

> ApplicationMaster killed by NodeManager due to excessive virtual memory 
> consumption
> ---
>
> Key: MAPREDUCE-3065
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3065
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.0-alpha
>Reporter: Chris Riccomini
>

[jira] [Resolved] (MAPREDUCE-1901) Jobs should not submit the same jar files over and over again

2015-11-13 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-1901.

Resolution: Duplicate

YARN-1492 implemented a solution reasonably close to what [~jsensarma] 
proposed. Closing this very old JIRA as a dup, please reopen if you disagree.

> Jobs should not submit the same jar files over and over again
> -
>
> Key: MAPREDUCE-1901
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-1901
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Joydeep Sen Sarma
> Attachments: 1901.PATCH, 1901.PATCH
>
>
> Currently each Hadoop job uploads the required resources 
> (jars/files/archives) to a new location in HDFS. Map-reduce nodes involved in 
> executing this job would then download these resources into local disk.
> In an environment where most of the users are using a standard set of jars 
> and files (because they are using a framework like Hive/Pig) - the same jars 
> keep getting uploaded and downloaded repeatedly. The overhead of this 
> protocol (primarily in terms of end-user latency) is significant when:
> - the jobs are small (and, conversely, large in number)
> - Namenode is under load (meaning hdfs latencies are high and made worse, in 
> part, by this protocol)
> Hadoop should provide a way for jobs in a cooperative environment to not 
> submit the same files over and over again. Identifying and caching execution 
> resources by a content signature (md5/sha) would be a good alternative to 
> have available.
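The content-signature idea in the last paragraph can be sketched directly with the JDK's MessageDigest. This is only an illustration of keying a resource cache by content digest, not the shared-cache API that YARN-1492 eventually shipped; the class and method names here are hypothetical:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ResourceSignature {
    // Compute a SHA-256 content signature for a resource (e.g. a job jar).
    // Two uploads with identical bytes map to the same signature, so a
    // cluster-side cache can be keyed by digest instead of by upload path.
    public static String sha256Hex(byte[] contents) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(contents)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm in every conformant JRE.
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256Hex(new byte[0]));
    }
}
```

A submitting client would hash the jar bytes, ask the cluster whether that signature is already cached, and upload only on a miss.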



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6542) HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe

2015-11-10 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6542:
---
Target Version/s: 2.6.3, 2.7.3  (was: 2.7.2)
   Fix Version/s: (was: 2.7.2)

[~piaoyu zhang], FYI, Fix-version is set by committers at commit time, so 
please only use Target-Version to express your intended releases. Thanks.

Also, 2.7.2 is ready for release, moving this to 2.7.3.

Fixing this myself for now and also adding it to 2.6.3 per comment above.

> HistoryViewer use SimpleDateFormat,But SimpleDateFormat is not threadsafe
> -
>
> Key: MAPREDUCE-6542
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6542
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Affects Versions: 2.2.0, 2.7.1
> Environment: CentOS6.5 Hadoop  
>Reporter: zhangyubiao
>Assignee: zhangyubiao
> Attachments: MAPREDUCE-6542-v2.patch, MAPREDUCE-6542.patch
>
>
> I used SimpleDateFormat to parse the JobHistory file before:
> private static final SimpleDateFormat dateFormat =
> new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
>  public static String getJobDetail(JobInfo job) {
> StringBuffer jobDetails = new StringBuffer("");
> SummarizedJob ts = new SummarizedJob(job);
> jobDetails.append(job.getJobId().toString().trim()).append("\t");
> jobDetails.append(job.getUsername()).append("\t");
> jobDetails.append(job.getJobname().replaceAll("\\n", 
> "")).append("\t");
> jobDetails.append(job.getJobQueueName()).append("\t");
> jobDetails.append(job.getPriority()).append("\t");
> jobDetails.append(job.getJobConfPath()).append("\t");
> jobDetails.append(job.getUberized()).append("\t");
> 
> jobDetails.append(dateFormat.format(job.getSubmitTime())).append("\t");
> 
> jobDetails.append(dateFormat.format(job.getLaunchTime())).append("\t");
> 
> jobDetails.append(dateFormat.format(job.getFinishTime())).append("\t");
>return jobDetails.toString();
> }
> But when I queried the SubmitTime and LaunchTime in Hive and compared them 
> with the JobHistory file times, I found that the submitTime and launchTime 
> were wrong.
> Finally, I changed to FastDateFormat to parse the time format, and the times 
> became right.
>  
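The root cause generalizes beyond HistoryViewer: SimpleDateFormat keeps mutable internal state, so sharing one static instance across threads silently corrupts results. A minimal sketch (not Hadoop code; the class name is hypothetical) of a drop-in thread-safe replacement using the JDK's immutable java.time.format.DateTimeFormatter:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class SafeDateFormatDemo {
    // DateTimeFormatter is immutable and thread-safe, so a single shared
    // instance can format timestamps from many threads concurrently.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
                         .withZone(ZoneId.of("UTC"));

    // Format an epoch-millisecond timestamp such as a job's submit time.
    public static String format(long epochMillis) {
        return FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(format(0L)); // epoch start, rendered in UTC
    }
}
```

Apache Commons Lang's FastDateFormat (used in the fix) takes the same approach: an immutable formatter object that is safe to share.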





[jira] [Updated] (MAPREDUCE-6362) History Plugin should be updated

2015-11-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6362:
---
Target Version/s: 2.7.3  (was: 2.7.2)

Moving out all non-critical / non-blocker issues that didn't make it out of 
2.7.2 into 2.7.3.

> History Plugin should be updated
> 
>
> Key: MAPREDUCE-6362
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6362
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.1
>Reporter: Mit Desai
>Assignee: Mit Desai
> Attachments: MAPREDUCE-6362.patch
>
>
> As applications complete, the RM tracks their IDs in a completed list. This 
> list is routinely truncated to limit the total number of application 
> remembered by the RM.
> When a user clicks the History link for a job, the browser is redirected to 
> the application's tracking link obtained from the stored application 
> instance. But when the application has been purged from the RM, an error is 
> displayed.
> In very busy clusters the rate at which applications complete can cause 
> applications to be purged from the RM's internal list within hours, which 
> breaks the proxy URLs users have saved for their jobs.
> We would like the RM to provide valid tracking links that persist, so that users 
> are not frustrated by broken links.
> With the current plugin in place, redirections for the Mapreduce jobs works 
> but we need the add functionality for tez jobs





[jira] [Resolved] (MAPREDUCE-6355) 2.5 client cannot communicate with 2.5 job on 2.6 cluster

2015-11-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli resolved MAPREDUCE-6355.

Resolution: Won't Fix

Similar to YARN-3575, this token format-change (YARN-668) was a _"necessary 
evil"_ to support rolling-upgrades starting Hadoop 2.6. I requested offline 
that this be filed largely for documentation concerns.

There is only one way sites can avoid this incompatibility: all apps are to be 
upgraded to 2.6+ once you migrate your cluster to 2.6+ from < 2.6.

We *can* have an elaborate fix where the 2.5 client tells YARN of its version 
so that RM can generate and propagate the right token-format, but at this 
stage, I am not sure of its value. Closing this for now as won't fix. Please 
reopen if you disagree. Thanks.


> 2.5 client cannot communicate with 2.5 job on 2.6 cluster
> -
>
> Key: MAPREDUCE-6355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>
> Trying to run a job on a Hadoop 2.6 cluster from a Hadoop 2.5 client 
> submitting a job that uses Hadoop 2.5 jars results in a job that succeeds but 
> the client cannot communicate with the AM while the job is running.





[jira] [Updated] (MAPREDUCE-6514) Job hangs as ask is not updated after ramping down of all reducers

2015-10-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6514:
---
Status: Open  (was: Patch Available)

h4. Comment on current patch
You should look at {{rampDownReduces()}} API and use it instead of hand-rolling 
{{decContainerReq}}. I actually think once we do this, you should remove 
{{clearAllPendingReduceRequests()}} altogether.

I am looking at branch-2 and I think the current patch is better served on top 
of MAPREDUCE-6302 (and this only in 2.8+) given the numerous changes made 
there. The patch obviously doesn't apply on branch-2.7, which you set as the 
target version (2.7.2). Canceling the patch.

h4. Meta thought
If MAPREDUCE-6513 goes through per my latest proposal there, there is no need 
for canceling all the reduce asks and thus this patch, no? 

h4. Release
IAC, this has been a long-standing problem (though I'm very surprised nobody 
caught this till now), so I'd propose we move this out into 2.7.3 or better 
2.8+ so I can make progress on the 2.7.2 release. Thoughts?

> Job hangs as ask is not updated after ramping down of all reducers
> --
>
> Key: MAPREDUCE-6514
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6514
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
>Affects Versions: 2.7.1
>Reporter: Varun Saxena
>Assignee: Varun Saxena
>Priority: Critical
> Attachments: MAPREDUCE-6514.01.patch
>
>
> In RMContainerAllocator#preemptReducesIfNeeded, we simply clear the scheduled 
> reduces map and put these reducers into pending. This is not reflected in the 
> ask, so the RM keeps on assigning and the AM is not able to assign, as no 
> reducer is scheduled (check the logs below the code).
> If this is updated immediately, the RM will be able to schedule mappers 
> immediately, which anyway is the intention when we ramp down reducers; the 
> scheduler need not allocate for ramped-down reducers.
> If not handled, this can lead to map starvation, as pointed out in 
> MAPREDUCE-6513
> {code}
>  LOG.info("Ramping down all scheduled reduces:"
> + scheduledRequests.reduces.size());
> for (ContainerRequest req : scheduledRequests.reduces.values()) {
>   pendingReduces.add(req);
> }
> scheduledRequests.reduces.clear();
> {code}
> {noformat}
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000215
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000216, 
> NodeId: hdszzdcxdat6g06u04p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u04p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.236:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container not 
> assigned : container_1437451211867_1485_01_000216
> 2015-10-13 04:55:04,912 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Cannot assign 
> container Container: [ContainerId: container_1437451211867_1485_01_000217, 
> NodeId: hdszzdcxdat6g06u06p:26009, NodeHttpAddress: 
> hdszzdcxdat6g06u06p:26010, Resource: , Priority: 10, 
> Token: Token { kind: ContainerToken, service: 10.2.33.239:26009 }, ] for a 
> reduce as either  container memory less than required 4096 or no pending 
> reduce tasks - reduces.isEmpty=true
> {noformat}
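The bug pattern in the first code block above — moving requests between internal collections without adjusting the ask sent to the RM — can be shown schematically. This is not the Hadoop RMContainerAllocator API; the counter and names below are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class RampDownSketch {
    final List<String> scheduledReduces = new ArrayList<>();
    final Queue<String> pendingReduces = new ArrayDeque<>();
    int outstandingAsk; // what the RM still believes this AM wants

    // Buggy shape: reduces are parked as pending but the ask is untouched,
    // so the RM keeps handing out reduce containers nobody will use.
    void rampDownBuggy() {
        pendingReduces.addAll(scheduledReduces);
        scheduledReduces.clear();
    }

    // Fixed shape: decrement the outstanding ask for every parked reduce,
    // which is what routing through a proper ramp-down path must achieve.
    void rampDownFixed() {
        outstandingAsk -= scheduledReduces.size();
        pendingReduces.addAll(scheduledReduces);
        scheduledReduces.clear();
    }

    static int askAfter(boolean fixed) {
        RampDownSketch s = new RampDownSketch();
        s.scheduledReduces.add("reduce_1");
        s.scheduledReduces.add("reduce_2");
        s.outstandingAsk = 2;
        if (fixed) { s.rampDownFixed(); } else { s.rampDownBuggy(); }
        return s.outstandingAsk;
    }

    public static void main(String[] args) {
        System.out.println("buggy ask after ramp-down: " + askAfter(false)); // 2
        System.out.println("fixed ask after ramp-down: " + askAfter(true));  // 0
    }
}
```

In the buggy shape the RM still sees an ask of 2 and keeps allocating reduce containers, which the AM then rejects in the "Cannot assign container" loop shown in the logs.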





[jira] [Commented] (MAPREDUCE-6513) MR job got hanged forever when one NM unstable for some time

2015-10-30 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983421#comment-14983421
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6513:


Went through the discussion. Here's what we should do, mostly agreeing with 
what [~chen317] says.
 - Node failure should not be counted towards task-attempt count. So, yes, 
let's continue to mark such tasks as killed.
 - Rescheduling of this killed task can (and must) take higher priority 
independent of whether it is marked as killed or failed. In fact, this was how 
we originally designed the failed-map-should-have-higher-priority concept. In 
spirit, fail-fast-map actually meant maps which retroactively failed, like in 
this case.

[~varun_saxena], I can take a stab at this if you don't have cycles. Let me 
know either-ways.

IAC, this has been a long-standing problem (though I'm very surprised nobody 
caught this till now), so I'd propose we move this out into 2.7.3 so I can make 
progress on the 2.7.2 release. Thoughts? /cc [~Jobo]

> MR job got hanged forever when one NM unstable for some time
> 
>
> Key: MAPREDUCE-6513
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6513
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster, resourcemanager
>Affects Versions: 2.7.0
>Reporter: Bob
>Assignee: Varun Saxena
>Priority: Critical
>
> When a job with many tasks was in progress, one node became unstable due to 
> some OS issue. After the node became unstable, the status of the maps on this 
> node changed to the KILLED state.
> Currently, maps which were running on the unstable node are rescheduled; all 
> are in the scheduled state and wait for the RM to assign containers. Ask 
> requests for these maps were seen until the node became good (all of those 
> failed); there are no ask requests after this. But the AM keeps on preempting 
> the reducers (it keeps recycling them).
> Finally, the reducers are waiting for the mappers to complete, and the 
> mappers didn't get containers.
> My question is:
> 
> Why are map requests not sent by the AM after node recovery?





[jira] [Commented] (MAPREDUCE-6528) Memory leak for HistoryFileManager.getJobSummary()

2015-10-29 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14981508#comment-14981508
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6528:


I'd like to get this put in 2.6.3 too, so it'd be great if we can make this 
work across JDKs. Thanks.

> Memory leak for HistoryFileManager.getJobSummary()
> --
>
> Key: MAPREDUCE-6528
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6528
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: MAPREDUCE-6528.patch
>
>
> We hit memory leak issues for the JHS in a large cluster, caused by the code 
> below, which doesn't release the FSDataInputStream in the exception case. 
> MAPREDUCE-6273 should fix most cases where exceptions get thrown. However, we 
> still need to fix the memory leak for the occasional case.
> {code} 
> private String getJobSummary(FileContext fc, Path path) throws IOException {
> Path qPath = fc.makeQualified(path);
> FSDataInputStream in = fc.open(qPath);
> String jobSummaryString = in.readUTF();
> in.close();
> return jobSummaryString;
>   }
> {code}
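The leak pattern in the snippet — open, read, close, where any exception between open and close leaks the stream — is fixed mechanically by try-with-resources. A minimal sketch using plain java.io streams for illustration (Hadoop's FileContext/FSDataInputStream would follow the same shape; the round-trip helper is only for the demo):

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class JobSummaryReader {
    // try-with-resources guarantees close() runs even when readUTF() throws,
    // which is exactly the exception path that leaked the input stream.
    public static String getJobSummary(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(path))) {
            return in.readUTF();
        }
    }

    // Demo helper: write a summary to a temp file, then read it back.
    public static String roundTrip(String summary) {
        try {
            File f = File.createTempFile("jobsummary", ".bin");
            f.deleteOnExit();
            try (DataOutputStream out =
                     new DataOutputStream(new FileOutputStream(f))) {
                out.writeUTF(summary);
            }
            return getJobSummary(f.getPath());
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("jobId=job_1446_0001,status=SUCCEEDED"));
    }
}
```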





[jira] [Closed] (MAPREDUCE-5649) Reduce cannot use more than 2G memory for the final merge

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli closed MAPREDUCE-5649.
--

> Reduce cannot use more than 2G memory  for the final merge
> --
>
> Key: MAPREDUCE-5649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: stanley shi
>Assignee: Gera Shegalov
>  Labels: 2.6.1-candidate, 2.7.2-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.2
>
> Attachments: MAPREDUCE-5649.001.patch, MAPREDUCE-5649.002.patch, 
> MAPREDUCE-5649.003.patch
>
>
> In the org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.java file, in 
> the finalMerge method: 
>  int maxInMemReduce = (int)Math.min(
> Runtime.getRuntime().maxMemory() * maxRedPer, Integer.MAX_VALUE);
>  
> This means that no matter how much memory the user has, the reducer will not 
> retain more than 2G of data in memory before the reduce phase starts.
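The 2G ceiling comes from narrowing to int: Math.min(..., Integer.MAX_VALUE) clamps the budget at 2^31-1 bytes no matter how large the heap is. A minimal sketch of the clamp and the widened-to-long fix (variable and method names are illustrative, not the MergeManagerImpl code):

```java
public class MergeBudget {
    // Buggy shape: the cast to int clamps any heap above 2 GB to 2^31-1 bytes.
    public static int cappedBudget(long maxHeapBytes, double maxRedPer) {
        return (int) Math.min(maxHeapBytes * maxRedPer, Integer.MAX_VALUE);
    }

    // Fixed shape: keep the arithmetic in long so large heaps stay usable.
    public static long fullBudget(long maxHeapBytes, double maxRedPer) {
        return (long) (maxHeapBytes * maxRedPer);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024; // pretend an 8 GB reducer heap
        System.out.println(cappedBudget(heap, 1.0)); // clamped to 2147483647
        System.out.println(fullBudget(heap, 1.0));   // 8589934592
    }
}
```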





[jira] [Updated] (MAPREDUCE-4462) Enhance readability of TestFairScheduler.java

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4462:
---
Fix Version/s: (was: 2.4.0)

Dropping fix-version.

> Enhance readability of TestFairScheduler.java
> -
>
> Key: MAPREDUCE-4462
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4462
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: scheduler, test
>Reporter: Ryan Hennig
>Priority: Minor
>  Labels: comments, test
> Attachments: MAPREDUCE-4462.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> While reading over the unit tests for the Fair Scheduler introduced by 
> MAPREDUCE-3451, I added comments to make the logic of the test easier to grok 
> quickly.





[jira] [Updated] (MAPREDUCE-6099) Adding getSplits(JobContext job, List stats) to mapreduce CombineFileInputFormat

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6099:
---
Fix Version/s: (was: 2.4.1)

Dropping fix-version.

> Adding  getSplits(JobContext job, List stats) to mapreduce 
> CombineFileInputFormat
> -
>
> Key: MAPREDUCE-6099
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6099
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Affects Versions: 2.4.1
>Reporter: Pankit Thapar
>Priority: Critical
> Attachments: MAPREDUCE-6099.patch
>
>
> Currently we have getSplits(JobContext job) in CombineFileInputFormat. 
> This API does not give the client the freedom to create a list of file 
> statuses itself and then create splits on the resultant list.
> The client might be able to perform some filtering on its end on the file 
> sets in the input paths. For the reasons above, it would be a good idea to 
> have getSplits(JobContext, List).
> Please let me know what you think about this.





[jira] [Updated] (MAPREDUCE-5559) Reconsidering the policy of ignoring the blacklist after reaching the threshold

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5559:
---
Fix Version/s: (was: 2.4.0)

Dropping fix-version.

> Reconsidering the policy of ignoring the blacklist after reaching the 
> threshold
> ---
>
> Key: MAPREDUCE-5559
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5559
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 2.1.1-beta
>Reporter: Zhijie Shen
>Assignee: Zhijie Shen
>
> Nowadays, when the MR AM finds that the number of blacklisted nodes reaches a 
> threshold, the blacklist will be totally ignored, and newly assigned 
> containers on the blacklisted nodes will be allocated. This may not be the 
> best practice; we need to reconsider it.





[jira] [Updated] (MAPREDUCE-4468) Encapsulate FairScheduler preemption logic into helper class

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-4468:
---
Fix Version/s: (was: 2.4.0)

Dropping fix-version.

> Encapsulate FairScheduler preemption logic into helper class
> 
>
> Key: MAPREDUCE-4468
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4468
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: scheduler
>Reporter: Ryan Hennig
>Priority: Minor
>  Labels: refactoring, scheduler
> Attachments: MAPREDUCE-4468.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> I've extracted the preemption logic from the Fair Scheduler into a helper 
> class so that FairScheduler is closer to following the Single Responsibility 
> Principle.  This may eventually evolve into a generalized preemption module 
> which could be leveraged by other schedulers.





[jira] [Updated] (MAPREDUCE-5188) error when verify FileType of RS_SOURCE in getCompanionBlocks in BlockPlacementPolicyRaid.java

2015-09-28 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5188:
---
Fix Version/s: (was: 2.0.2-alpha)

Please use "Target Version" for your intention. Dropping fix-version as it is 
only supposed to be set at patch commit time.

> error when verify FileType of RS_SOURCE in getCompanionBlocks  in 
> BlockPlacementPolicyRaid.java
> ---
>
> Key: MAPREDUCE-5188
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5188
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: contrib/raid
>Affects Versions: 2.0.2-alpha
>Reporter: junjin
>Assignee: junjin
>Priority: Critical
>  Labels: BB2015-05-TBR, contrib/raid
> Attachments: MAPREDUCE-5188.patch
>
>
> Error when verifying the FileType of RS_SOURCE in getCompanionBlocks in 
> BlockPlacementPolicyRaid.java.
> xorParityLength on line #379 needs to be changed to rsParityLength, since it 
> is used for verifying the RS_SOURCE type.





[jira] [Updated] (MAPREDUCE-6334) Fetcher#copyMapOutput is leaking usedMemory upon IOException during InMemoryMapOutput shuffle handler

2015-09-23 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6334:
---
Target Version/s: 2.7.1, 2.6.2  (was: 2.7.1)

Targeting 2.6.2 per Eric's comment in the mailing lists.

> Fetcher#copyMapOutput is leaking usedMemory upon IOException during 
> InMemoryMapOutput shuffle handler
> -
>
> Key: MAPREDUCE-6334
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6334
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>Priority: Blocker
> Fix For: 2.7.1
>
> Attachments: MAPREDUCE-6334.001.patch, MAPREDUCE-6334.002.patch
>
>
> We are seeing this happen when
> - an NM's disk goes bad during the creation of map output(s)
> - the reducer's fetcher can read the shuffle header and reserve the memory
> - but gets an IOException when trying to shuffle for InMemoryMapOutput
> - shuffle fetch retry is enabled





[jira] [Updated] (MAPREDUCE-6355) 2.5 client cannot communicate with 2.5 job on 2.6 cluster

2015-09-23 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6355:
---
  Labels:   (was: 2.6.1-candidate)
Target Version/s: 2.7.2, 2.6.2

Dropping the 2.6.1-candidate label; 2.6.1 is out now. Targeting 2.6.2 / 2.7.2.

> 2.5 client cannot communicate with 2.5 job on 2.6 cluster
> -
>
> Key: MAPREDUCE-6355
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6355
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>
> Trying to run a job on a Hadoop 2.6 cluster from a Hadoop 2.5 client 
> submitting a job that uses Hadoop 2.5 jars results in a job that succeeds but 
> the client cannot communicate with the AM while the job is running.





[jira] [Updated] (MAPREDUCE-5649) Reduce cannot use more than 2G memory for the final merge

2015-09-10 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5649:
---
Fix Version/s: 2.7.2

Just pulled this into branch-2.7 (release 2.7.2) as it already exists in 2.6.1.

branch-2 patch applies cleanly. Ran compilation and TestMergeManager before the 
push.

> Reduce cannot use more than 2G memory  for the final merge
> --
>
> Key: MAPREDUCE-5649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: stanley shi
>Assignee: Gera Shegalov
>  Labels: 2.6.1-candidate, 2.7.2-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.2
>
> Attachments: MAPREDUCE-5649.001.patch, MAPREDUCE-5649.002.patch, 
> MAPREDUCE-5649.003.patch
>
>
> In the org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.java file, in 
> the finalMerge method: 
>  int maxInMemReduce = (int)Math.min(
> Runtime.getRuntime().maxMemory() * maxRedPer, Integer.MAX_VALUE);
>  
> This means no matter how much memory user has, reducer will not retain more 
> than 2G data in memory before the reduce phase starts.
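The quoted line caps the in-memory budget because of the `(int)` cast: even a huge heap gets clamped to `Integer.MAX_VALUE` bytes (~2 GiB). A standalone sketch of just that arithmetic (class and method names here are illustrative, not Hadoop's):

```java
public class MergeCapDemo {
    // Mirrors the capping logic quoted above: multiplying the heap size by
    // the retention ratio and clamping at Integer.MAX_VALUE, then casting
    // to int, limits the budget to 2^31 - 1 bytes no matter the heap size.
    static long cappedBudget(long heapBytes, float maxRedPer) {
        return (int) Math.min(heapBytes * maxRedPer, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long eightGiB = 8L * 1024 * 1024 * 1024;
        // Even with an 8 GiB heap and 100% retention, the budget is ~2 GiB.
        System.out.println(cappedBudget(eightGiB, 1.0f)); // 2147483647
    }
}
```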





[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-09-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6361:
---
Fix Version/s: 2.6.1

Pulled this into 2.6.1, after fixing a minor merge conflict in 
TestShuffleScheduler.

Ran compilation and TestShuffleScheduler before the push.

> NPE issue in shuffle caused by concurrent issue between copySucceeded() in 
> one thread and copyFailed() in another thread on the same host
> -
>
> Key: MAPREDUCE-6361
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.7.1
>
> Attachments: MAPREDUCE-6361-v1.patch
>
>
> The failure in log:
> 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
> shuffle in fetcher#25
>  at 
> org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:415)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
>  at 
> org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267)
>  at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308)
>  at 
> org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
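The stack trace above ends in an unboxing NPE inside `copyFailed`. A minimal sketch of the race (plain Java, not the actual Hadoop classes, and without the real locking): a success in one thread removes the host's failure counter that a concurrent failure path still dereferences.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the race described above: copySucceeded() clears the
// host's failure counter while copyFailed() in another fetcher thread
// still expects it, so unguarded unboxing throws NullPointerException.
public class ShuffleRaceSketch {
    private final Map<String, Integer> hostFailures = new HashMap<>();

    void copySucceeded(String host) {
        hostFailures.remove(host);          // success wipes the counter
    }

    int copyFailed(String host) {
        // Unguarded unboxing: NPE if copySucceeded() ran in between.
        return hostFailures.get(host) + 1;
    }

    int copyFailedGuarded(String host) {
        // A fix is to tolerate the missing entry.
        Integer prior = hostFailures.get(host);
        return (prior == null ? 0 : prior) + 1;
    }

    public static void main(String[] args) {
        ShuffleRaceSketch s = new ShuffleRaceSketch();
        s.hostFailures.put("host1", 2);
        s.copySucceeded("host1");                         // interleaved success
        System.out.println(s.copyFailedGuarded("host1")); // 1
        try {
            s.copyFailed("host1");
        } catch (NullPointerException expected) {
            System.out.println("NPE, as in the stack trace above");
        }
    }
}
```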





[jira] [Updated] (MAPREDUCE-6267) Refactor JobSubmitter#copyAndConfigureFiles into it's own class

2015-09-02 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6267:
---
   Labels: 2.6.1-candidate  (was: )
Fix Version/s: 2.6.1

Added this to 2.6.1 as a dependency for MAPREDUCE-6238. 

Pulled this into 2.6.1, the patch had conflicts in Job.java and 
JobSubmitter.java which I fixed.

Ran compilation before the push. Patch applied cleanly.

> Refactor JobSubmitter#copyAndConfigureFiles into it's own class
> ---
>
> Key: MAPREDUCE-6267
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6267
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Chris Trezzo
>Assignee: Chris Trezzo
>Priority: Minor
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: MAPREDUCE-6267-trunk-v1.patch
>
>
> Refactor the uploading logic in JobSubmitter#copyAndConfigureFiles into it's 
> own class. This makes the JobSubmitter class more readable and isolates the 
> logic that is actually uploading the job resources to HDFS.





[jira] [Comment Edited] (MAPREDUCE-6238) MR2 can't run local jobs with -libjars command options which is a regression from MR1

2015-09-02 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14727828#comment-14727828
 ] 

Vinod Kumar Vavilapalli edited comment on MAPREDUCE-6238 at 9/2/15 6:43 PM:


Added MAPREDUCE-6267 as a dependency for this patch.

This patch applied cleanly.

Pulled this into 2.6.1. Ran compilation and TestLocalJobSubmission before the 
push.


was (Author: vinodkv):
Added this to 2.6.1 as a dependency for MAPREDUCE-6267. Patch applied cleanly.

Pulled this into 2.6.1. Ran compilation and TestLocalJobSubmission before the 
push.

> MR2 can't run local jobs with -libjars command options which is a regression 
> from MR1
> -
>
> Key: MAPREDUCE-6238
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6238
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.7.1
>
> Attachments: MAPREDUCE-6238.000.patch
>
>
> MR2 can't run local jobs with -libjars command options which is a regression 
> from MR1. 
> When run MR2 job with -jt local and -libjars, the job fails with 
> java.io.FileNotFoundException: File does not exist: 
> hdfs://XXX.jar.
> But the same command is working in MR1.
> I find the problem is
> 1.
> because when MR2 run local job using  LocalJobRunner
> from JobSubmitter, the JobSubmitter#jtFs is local filesystem,
> So copyRemoteFiles will return from [the middle of the 
> function|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L138]
> because source and destination file system are same.
> {code}
> if (compareFs(remoteFs, jtFs)) {
>   return originalPath;
> }
> {code}
> The following code at 
> [JobSubmitter.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L219]
> tries to add the destination file to DistributedCache, which introduces a bug 
> for local jobs.
> {code}
> Path newPath = copyRemoteFiles(libjarsDir, tmp, conf, replication);
> DistributedCache.addFileToClassPath(
> new Path(newPath.toUri().getPath()), conf);
> {code}
> Because new Path(newPath.toUri().getPath()) will lose the filesystem 
> information from newPath, the file added to DistributedCache will use the 
> default Uri filesystem hdfs based on the following code. This causes the 
>  FileNotFoundException when we access the file later at 
>  
> [determineTimestampsAndCacheVisibilities|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L270]
> {code}
>   public static void addFileToClassPath(Path file, Configuration conf)
> throws IOException {
> addFileToClassPath(file, conf, file.getFileSystem(conf));
>   }
>   public static void addFileToClassPath
>(Path file, Configuration conf, FileSystem fs)
> throws IOException {
> String classpath = conf.get(MRJobConfig.CLASSPATH_FILES);
> conf.set(MRJobConfig.CLASSPATH_FILES, classpath == null ? file.toString()
>  : classpath + "," + file.toString());
> URI uri = fs.makeQualified(file).toUri();
> addCacheFile(uri, conf);
>   }
> {code}
> Compare to the following [MR1 
> code|https://github.com/apache/hadoop/blob/branch-1/src/mapred/org/apache/hadoop/mapred/JobClient.java#L811]:
> {code}
> Path newPath = copyRemoteFiles(fs, libjarsDir, tmp, job, replication);
> DistributedCache.addFileToClassPath(
>   new Path(newPath.toUri().getPath()), job, fs);
> {code}
> You will see why MR1 doesn't have this issue.
> because it passes the local filesystem into  
> DistributedCache#addFileToClassPath instead of using the default Uri 
> filesystem hdfs.
> 2.
> Another incompatible change in MR2 is in 
> [LocalDistributedCacheManager#setup|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java#L113]
> {code}
> // Find which resources are to be put on the local classpath
> Map<String, Path> classpaths = new HashMap<String, Path>();
> Path[] archiveClassPaths = DistributedCache.getArchiveClassPaths(conf);
> if (archiveClassPaths != null) {
>   for (Path p : archiveClassPaths) {
> FileSystem remoteFS = p.getFileSystem(conf);
> p = 
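The core of the problem described above is that `new Path(newPath.toUri().getPath())` keeps only the path component of a fully qualified URI. A standalone sketch using plain `java.net.URI` instead of Hadoop's `Path` (the staging path shown is made up):

```java
import java.net.URI;

// Sketch of the scheme-dropping problem: taking only getPath() from a
// qualified URI discards the file system component ("file:"), so a later
// consumer falls back to the default file system (hdfs in the reporter's
// cluster) and hits FileNotFoundException.
public class SchemeLossSketch {
    public static void main(String[] args) {
        URI qualified = URI.create("file:/tmp/job_staging/libjars/app.jar");
        String pathOnly = qualified.getPath();   // scheme is lost here

        System.out.println(qualified);  // file:/tmp/job_staging/libjars/app.jar
        System.out.println(pathOnly);   // /tmp/job_staging/libjars/app.jar
        // MR1 avoided this by also passing the source FileSystem along,
        // so the cache entry was re-qualified against the right scheme.
    }
}
```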

[jira] [Updated] (MAPREDUCE-6238) MR2 can't run local jobs with -libjars command options which is a regression from MR1

2015-09-02 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6238:
---
Fix Version/s: 2.6.1

> MR2 can't run local jobs with -libjars command options which is a regression 
> from MR1
> -
>
> Key: MAPREDUCE-6238
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6238
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.7.1
>
> Attachments: MAPREDUCE-6238.000.patch
>
>
> MR2 can't run local jobs with -libjars command options which is a regression 
> from MR1. 
> When run MR2 job with -jt local and -libjars, the job fails with 
> java.io.FileNotFoundException: File does not exist: 
> hdfs://XXX.jar.
> But the same command is working in MR1.
> I find the problem is
> 1.
> because when MR2 run local job using  LocalJobRunner
> from JobSubmitter, the JobSubmitter#jtFs is local filesystem,
> So copyRemoteFiles will return from [the middle of the 
> function|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L138]
> because source and destination file system are same.
> {code}
> if (compareFs(remoteFs, jtFs)) {
>   return originalPath;
> }
> {code}
> The following code at 
> [JobSubmitter.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L219]
> tries to add the destination file to DistributedCache, which introduces a bug 
> for local jobs.
> {code}
> Path newPath = copyRemoteFiles(libjarsDir, tmp, conf, replication);
> DistributedCache.addFileToClassPath(
> new Path(newPath.toUri().getPath()), conf);
> {code}
> Because new Path(newPath.toUri().getPath()) will lose the filesystem 
> information from newPath, the file added to DistributedCache will use the 
> default Uri filesystem hdfs based on the following code. This causes the 
>  FileNotFoundException when we access the file later at 
>  
> [determineTimestampsAndCacheVisibilities|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L270]
> {code}
>   public static void addFileToClassPath(Path file, Configuration conf)
> throws IOException {
> addFileToClassPath(file, conf, file.getFileSystem(conf));
>   }
>   public static void addFileToClassPath
>(Path file, Configuration conf, FileSystem fs)
> throws IOException {
> String classpath = conf.get(MRJobConfig.CLASSPATH_FILES);
> conf.set(MRJobConfig.CLASSPATH_FILES, classpath == null ? file.toString()
>  : classpath + "," + file.toString());
> URI uri = fs.makeQualified(file).toUri();
> addCacheFile(uri, conf);
>   }
> {code}
> Compare to the following [MR1 
> code|https://github.com/apache/hadoop/blob/branch-1/src/mapred/org/apache/hadoop/mapred/JobClient.java#L811]:
> {code}
> Path newPath = copyRemoteFiles(fs, libjarsDir, tmp, job, replication);
> DistributedCache.addFileToClassPath(
>   new Path(newPath.toUri().getPath()), job, fs);
> {code}
> You will see why MR1 doesn't have this issue.
> because it passes the local filesystem into  
> DistributedCache#addFileToClassPath instead of using the default Uri 
> filesystem hdfs.
> 2.
> Another incompatible change in MR2 is in 
> [LocalDistributedCacheManager#setup|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java#L113]
> {code}
> // Find which resources are to be put on the local classpath
> Map<String, Path> classpaths = new HashMap<String, Path>();
> Path[] archiveClassPaths = DistributedCache.getArchiveClassPaths(conf);
> if (archiveClassPaths != null) {
>   for (Path p : archiveClassPaths) {
> FileSystem remoteFS = p.getFileSystem(conf);
> p = remoteFS.resolvePath(p.makeQualified(remoteFS.getUri(),
> remoteFS.getWorkingDirectory()));
> classpaths.put(p.toUri().getPath().toString(), p);
>   }
> }
> Path[] fileClassPaths = DistributedCache.getFileClassPaths(conf);
> if (fileClassPaths != null) {
>   for (Path p : fileClassPaths) {
> FileSystem remoteFS = p.getFileSystem(conf);
> p = 

[jira] [Updated] (MAPREDUCE-6238) MR2 can't run local jobs with -libjars command options which is a regression from MR1

2015-09-02 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6238:
---

Added this to 2.6.1 as a dependency for MAPREDUCE-6267. Patch applied cleanly.

Pulled this into 2.6.1. Ran compilation and TestLocalJobSubmission before the 
push.

> MR2 can't run local jobs with -libjars command options which is a regression 
> from MR1
> -
>
> Key: MAPREDUCE-6238
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6238
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: zhihai xu
>Assignee: zhihai xu
>Priority: Critical
>  Labels: 2.6.1-candidate
> Fix For: 2.7.1
>
> Attachments: MAPREDUCE-6238.000.patch
>
>
> MR2 can't run local jobs with -libjars command options which is a regression 
> from MR1. 
> When run MR2 job with -jt local and -libjars, the job fails with 
> java.io.FileNotFoundException: File does not exist: 
> hdfs://XXX.jar.
> But the same command is working in MR1.
> I find the problem is
> 1.
> because when MR2 run local job using  LocalJobRunner
> from JobSubmitter, the JobSubmitter#jtFs is local filesystem,
> So copyRemoteFiles will return from [the middle of the 
> function|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L138]
> because source and destination file system are same.
> {code}
> if (compareFs(remoteFs, jtFs)) {
>   return originalPath;
> }
> {code}
> The following code at 
> [JobSubmitter.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L219]
> tries to add the destination file to DistributedCache, which introduces a bug 
> for local jobs.
> {code}
> Path newPath = copyRemoteFiles(libjarsDir, tmp, conf, replication);
> DistributedCache.addFileToClassPath(
> new Path(newPath.toUri().getPath()), conf);
> {code}
> Because new Path(newPath.toUri().getPath()) will lose the filesystem 
> information from newPath, the file added to DistributedCache will use the 
> default Uri filesystem hdfs based on the following code. This causes the 
>  FileNotFoundException when we access the file later at 
>  
> [determineTimestampsAndCacheVisibilities|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L270]
> {code}
>   public static void addFileToClassPath(Path file, Configuration conf)
> throws IOException {
> addFileToClassPath(file, conf, file.getFileSystem(conf));
>   }
>   public static void addFileToClassPath
>(Path file, Configuration conf, FileSystem fs)
> throws IOException {
> String classpath = conf.get(MRJobConfig.CLASSPATH_FILES);
> conf.set(MRJobConfig.CLASSPATH_FILES, classpath == null ? file.toString()
>  : classpath + "," + file.toString());
> URI uri = fs.makeQualified(file).toUri();
> addCacheFile(uri, conf);
>   }
> {code}
> Compare to the following [MR1 
> code|https://github.com/apache/hadoop/blob/branch-1/src/mapred/org/apache/hadoop/mapred/JobClient.java#L811]:
> {code}
> Path newPath = copyRemoteFiles(fs, libjarsDir, tmp, job, replication);
> DistributedCache.addFileToClassPath(
>   new Path(newPath.toUri().getPath()), job, fs);
> {code}
> You will see why MR1 doesn't have this issue.
> because it passes the local filesystem into  
> DistributedCache#addFileToClassPath instead of using the default Uri 
> filesystem hdfs.
> 2.
> Another incompatible change in MR2 is in 
> [LocalDistributedCacheManager#setup|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java#L113]
> {code}
> // Find which resources are to be put on the local classpath
> Map<String, Path> classpaths = new HashMap<String, Path>();
> Path[] archiveClassPaths = DistributedCache.getArchiveClassPaths(conf);
> if (archiveClassPaths != null) {
>   for (Path p : archiveClassPaths) {
> FileSystem remoteFS = p.getFileSystem(conf);
> p = remoteFS.resolvePath(p.makeQualified(remoteFS.getUri(),
> remoteFS.getWorkingDirectory()));
> classpaths.put(p.toUri().getPath().toString(), p);
>   }
> }
> Path[] fileClassPaths = DistributedCache.getFileClassPaths(conf);
> if (fileClassPaths != null) {
>  

[jira] [Updated] (MAPREDUCE-6324) Uber jobs fail to update AMRM token when it rolls over

2015-09-02 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6324:
---
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestLocalContainerAllocator before 
the push. Patch applied cleanly.

> Uber jobs fail to update AMRM token when it rolls over
> --
>
> Key: MAPREDUCE-6324
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6324
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.7.1
>
> Attachments: MAPREDUCE-6324.001.patch, MAPREDUCE-6324.002.patch
>
>
> When the RM rolls a new AMRM master key the AMs are supposed to receive a new 
> AMRM token on subsequent heartbeats between the time when the new key is 
> rolled and when it is activated.  This is not occurring for uber jobs.  If 
> the connection to the RM needs to be re-established after the new key is 
> activated (e.g.: RM restart or network hiccup) then the uber job AM will be 
> unable to reconnect to the RM.





[jira] [Updated] (MAPREDUCE-5649) Reduce cannot use more than 2G memory for the final merge

2015-09-02 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5649:
---
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestMergeManager before the push. 
Patch applied cleanly.

> Reduce cannot use more than 2G memory  for the final merge
> --
>
> Key: MAPREDUCE-5649
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5649
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Reporter: stanley shi
>Assignee: Gera Shegalov
>  Labels: 2.6.1-candidate, 2.7.2-candidate
> Fix For: 2.6.1, 2.8.0
>
> Attachments: MAPREDUCE-5649.001.patch, MAPREDUCE-5649.002.patch, 
> MAPREDUCE-5649.003.patch
>
>
> In the org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.java file, in 
> the finalMerge method: 
>  int maxInMemReduce = (int)Math.min(
> Runtime.getRuntime().maxMemory() * maxRedPer, Integer.MAX_VALUE);
>  
> This means no matter how much memory user has, reducer will not retain more 
> than 2G data in memory before the reduce phase starts.





[jira] [Updated] (MAPREDUCE-6303) Read timeout when retrying a fetch error can be fatal to a reducer

2015-09-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6303:
---
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestFetcher before the push. Patch 
applied cleanly.



> Read timeout when retrying a fetch error can be fatal to a reducer
> --
>
> Key: MAPREDUCE-6303
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6303
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: MAPREDUCE-6303.001.patch
>
>
> If a reducer encounters an error trying to fetch from a node then encounters 
> a read timeout when trying to re-establish the connection then the reducer 
> can fail.  The read timeout exception can leak to the top of the Fetcher 
> thread which will cause the reduce task to teardown.  This type of error can 
> repeat across reducer attempts causing jobs to fail due to a single bad node.





[jira] [Updated] (MAPREDUCE-6300) Task list sort by task id broken

2015-09-01 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6300:
---
Target Version/s: 2.7.1, 2.8.0  (was: 2.8.0, 2.7.1)
   Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation before the push. Patch applied cleanly.


> Task list sort by task id broken
> 
>
> Key: MAPREDUCE-6300
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6300
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 2.6.0
>Reporter: Siqi Li
>Assignee: Siqi Li
>Priority: Minor
>  Labels: 2.6.1-candidate
> Fix For: 2.6.1, 2.8.0, 2.7.1
>
> Attachments: MAPREDUCE-6300.v1.patch, MAPREDUCE-6300.v2.patch, 
> MAPREDUCE-6300.v3.patch, MAPREDUCE-6300.v4.patch, MAPREDUCE-6300.v5.patch, 
> screenshot-1.png, sorting app ID in fair scheduler page.png, sorting app 
> ID.png, sorting app attempt ID.png, sorting by app ID in AHS.png, sorting 
> task ID.png
>
>
> If you go to the MapReduce ApplicationMaster or HistoryServer UI and open the 
> list of tasks, then try to sort by the task name/id, it does nothing.
> Note that if you go to the task attempts, that seem to sort fine.





[jira] [Updated] (MAPREDUCE-6230) MR AM does not survive RM restart if RM activated a new AMRM secret key

2015-08-31 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6230:
---
Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestRMContainerAllocator before the 
push. Patch applied cleanly.

> MR AM does not survive RM restart if RM activated a new AMRM secret key
> ---
>
> Key: MAPREDUCE-6230
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6230
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mr-am
>Reporter: Jason Lowe
>Assignee: Jason Lowe
>Priority: Blocker
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: MAPREDUCE-6230.001.patch
>
>
> A MapReduce AM will fail to reconnect to an RM that performed restart in the 
> following scenario:
> # MapReduce job launched with AMRM token generated from AMRM secret X
> # RM rolls new AMRM secret Y and activates the new key
> # RM performs a work-preserving restart
> # MapReduce job AM now unable to connect to RM with "Invalid AMRMToken" 
> exception
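The numbered scenario above can be modeled as a toy key-roll state machine (plain Java, not YARN's secret manager; names are illustrative): between "roll" and "activate" both keys verify, and that window is when AMs must be handed tokens signed with the new key. An AM that misses the window still holds a token for the retired key and fails exactly as described.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of AMRM master-key rolling: rollKey() starts a grace window
// in which both old and new keys verify; activateKey() ends the window,
// after which tokens signed with the old key are rejected.
public class AmrmKeyRollSketch {
    private int currentKey;
    private final Set<Integer> accepted = new HashSet<>();

    AmrmKeyRollSketch(int key) { currentKey = key; accepted.add(key); }

    void rollKey(int newKey) { accepted.add(newKey); currentKey = newKey; }

    void activateKey() { accepted.retainAll(Set.of(currentKey)); }

    boolean verify(int tokenKey) { return accepted.contains(tokenKey); }

    public static void main(String[] args) {
        AmrmKeyRollSketch rm = new AmrmKeyRollSketch(1); // secret X
        rm.rollKey(2);                                   // secret Y rolled
        System.out.println(rm.verify(1));                // true: grace window
        rm.activateKey();                                // Y activated
        System.out.println(rm.verify(1));                // false: stale token
        System.out.println(rm.verify(2));                // true
    }
}
```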





[jira] [Updated] (MAPREDUCE-6166) Reducers do not validate checksum of map outputs when fetching directly to disk

2015-08-30 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6166:
---
Target Version/s: 2.7.0, 3.0.0  (was: 3.0.0, 2.7.0)
   Fix Version/s: 2.6.1

Pulled this into 2.6.1. Ran compilation and TestFetcher before the push. Patch 
applied cleanly.

> Reducers do not validate checksum of map outputs when fetching directly to 
> disk
> ---
>
> Key: MAPREDUCE-6166
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6166
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 2.6.0
>Reporter: Eric Payne
>Assignee: Eric Payne
>  Labels: 2.6.1-candidate
> Fix For: 2.7.0, 2.6.1
>
> Attachments: MAPREDUCE-6166.v1.201411221941.txt, 
> MAPREDUCE-6166.v2.201411251627.txt, MAPREDUCE-6166.v3.txt, 
> MAPREDUCE-6166.v4.txt, MAPREDUCE-6166.v5.txt
>
>
> In very large map/reduce jobs (5 maps, 2500 reducers), the intermediate 
> map partition output gets corrupted on disk on the map side. If this 
> corrupted map output is too large to shuffle in memory, the reducer streams 
> it to disk without validating the checksum. In jobs this large, it could take 
> hours before the reducer finally tries to read the corrupted file and fails. 
> Since retries of the failed reduce attempt will also take hours, this delay 
> in discovering the failure is multiplied greatly.
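The fix described above amounts to updating a checksum while the map output streams to disk and failing the fetch immediately on mismatch, instead of discovering the corruption hours later. An illustration only (using `java.util.zip.CRC32`, not Hadoop's IFile checksum code):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

// Checksum-while-streaming sketch: the data is checksummed as it flows
// past, so no in-memory buffering of the whole map output is needed.
public class StreamingChecksumSketch {
    static long copyWithChecksum(InputStream in) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            crc.update(buf, 0, n);   // checksum as we stream
            // ... write buf[0..n) to the on-disk map output here ...
        }
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = "map output partition".getBytes("UTF-8");
        long expected = copyWithChecksum(new ByteArrayInputStream(payload));

        payload[3] ^= 0x1;           // simulate one flipped bit on disk
        long actual = copyWithChecksum(new ByteArrayInputStream(payload));
        if (actual != expected) {
            System.out.println("checksum mismatch: fail the fetch now");
        }
    }
}
```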





[jira] [Commented] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.

2015-08-21 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14707305#comment-14707305
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6454:


bq. Just so it's on record so when someone hits this problem: this is fragile 
and subject to breakage, regardless of the version of hadoop in play. It all 
depends upon how users have HADOOP_CLASSPATH configured in hadoop-env.sh and 
yarn-env.sh.
It is a bit fragile, for sure, but it doesn't by default depend on what is 
configured in *-env.sh like you said. This is because HADOOP_CLASSPATH is not 
part of the default white-listed environment that goes from YARN to the apps.

> MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
> -
>
> Key: MAPREDUCE-6454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.7.2, 2.6.2
>
> Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, 
> MAPREDUCE-6454-v3.1.patch, MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch
>
>
> We already set lib jars on distributed-cache to CLASSPATH. However, in some 
> corner cases (like: MR local mode, Hive Map side local join, etc.), we need 
> these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching 
> runjar process.





[jira] [Commented] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.

2015-08-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706005#comment-14706005
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6454:


bq. If the original problem is that the bin commands don't pick up this up, 
this fix makes the assumption that HADOOP_CLASSPATH allows inheritance from the 
parent shell. The default examples in trunk specifically don't do this to 
prevent the double settings problem that plagues prior releases.
The original problem is that the commands don't pick up CLASSPATH set by users 
on the shell - Owen tells me offline that this went long back.

bq. In fact, since HADOOP_CLASSPATH is really intended for users to use, why 
are we overloading it? For trunk at least, it would probably be better to have 
a different var that is handled via mapreduce's shellprofile.d bit. 
The specific scenario here user is spawning bin/hadoop commands within his/her 
tasks - so this blurs the lines between interactive vs non-interactive usecases 
- and the user is expecting distributed-cache to be accessed in the spawned 
shells.

I am going to get this in only into branch-2 and branch-2.7/2.6. We will have 
to think more about the right-approach for trunk. Will open a separate ticket 
for this.

bq. -1  whitespace  0m 1s   The patch has 1 line(s) that end in whitespace. 
Use git apply --whitespace=fix. 
Will do jenkins. Ty.


> MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
> -
>
> Key: MAPREDUCE-6454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, 
> MAPREDUCE-6454-v3.1.patch, MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch
>
>
> We already set lib jars on distributed-cache to CLASSPATH. However, in some 
> corner cases (like: MR local mode, Hive Map side local join, etc.), we need 
> these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching 
> runjar process.





[jira] [Updated] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.

2015-08-20 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6454:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.6.2
   2.7.2
   Status: Resolved  (was: Patch Available)

Committed this to trunk, branch-2, 2.7 and 2.6. Thanks Junping.

[~djp], can we please followup on the trunk changes? Thanks..

> MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
> -
>
> Key: MAPREDUCE-6454
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Junping Du
>Assignee: Junping Du
>Priority: Critical
> Fix For: 2.7.2, 2.6.2
>
> Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, 
> MAPREDUCE-6454-v3.1.patch, MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch
>
>
> We already set lib jars on distributed-cache to CLASSPATH. However, in some 
> corner cases (like: MR local mode, Hive Map side local join, etc.), we need 
> these jars on HADOOP_CLASSPATH so hadoop scripts can take it in launching 
> runjar process.





[jira] [Commented] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.

2015-08-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705850#comment-14705850
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6454:


I debugged this offline with [~djp].

Some offline notes
 - The original problem was tracked at MAPREDUCE-5490 for Hadoop 1, which 
didn't go in. Essentially, things like hive running as part of Oozie actions do 
not get the distributed-cache files in their classpath.
 - In Hadoop 1, the MapReduce child wouldn't set CLASSPATH and HADOOP_CLASSPATH 
to distributed-cache files.
 - In Hadoop 2, the situation got better, CLASSPATH is set correctly to have 
distributed-cache files. But HADOOP_CLASSPATH doesn't. Unfortunately, hadoop 
scripts don't respect CLASSPATH that is set, so we have to explicitly set 
HADOOP_CLASSPATH to also point to distributed cache files.

The solution for this is to have MapReduce set distributed-cache files in 
HADOOP_CLASSPATH in addition to setting in CLASSPATH.
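The intended behavior — appending the distributed-cache entries to whatever HADOOP_CLASSPATH is already set, rather than replacing it — can be sketched in plain Java. The env map and jar paths here are illustrative only, not the actual MR launch code:

```java
import java.util.HashMap;
import java.util.Map;

public class ClasspathMergeDemo {
    // Merge an extra classpath entry into any value already present under
    // the given key, instead of overwriting it.
    static void appendEnv(Map<String, String> env, String key, String extra) {
        String existing = env.get(key);
        env.put(key, existing == null || existing.isEmpty()
            ? extra : existing + ":" + extra);
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<>();
        // Hypothetical admin-set value inherited into the task environment.
        env.put("HADOOP_CLASSPATH", "/etc/hadoop/extra.jar");
        appendEnv(env, "HADOOP_CLASSPATH", "/tmp/job_cache/lib1.jar");
        System.out.println(env.get("HADOOP_CLASSPATH"));
        // /etc/hadoop/extra.jar:/tmp/job_cache/lib1.jar
    }
}
```

This is the same inherit-then-append pattern the review comments below call out as "inherited instead of replaced".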

Junping Du, the patch looks good.
 - We are not setting this for the AM, but that sounds okay.
 - We are making sure that any HADOOP_CLASSPATH already set is inherited 
instead of replaced - good!
 - We are only putting the distributed-cache entries, and skipping the MR framework etc. 
- good again!

The patch looks good to me. Been reviewing this offline. Will check this in if 
Jenkins says okay.

 MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
 

 Key: MAPREDUCE-6454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, 
 MAPREDUCE-6454-v3.1.patch, MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch


 We already set lib jars on distributed-cache to CLASSPATH. However, in some 
 corner cases (like MR local mode, Hive map-side local join, etc.), we also need 
 these jars on HADOOP_CLASSPATH so the hadoop scripts can pick them up when 
 launching the RunJar process.





[jira] [Commented] (MAPREDUCE-6454) MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.

2015-08-20 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705894#comment-14705894
 ] 

Vinod Kumar Vavilapalli commented on MAPREDUCE-6454:


bq. What happens if the user has HADOOP_CLASSPATH set in hadoop-env.sh?
 - If the user has this on the submission node, that is not getting inherited - 
this is similar to what we do for CLASSPATH.
 - If the admin sets it in the daemons, and also configures yarn to white-list 
those envs, we are inheriting them into the task. Again similar to CLASSPATH.

 MapReduce doesn't set the HADOOP_CLASSPATH for jar lib in distributed cache.
 

 Key: MAPREDUCE-6454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6454
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
 Attachments: MAPREDUCE-6454-v2.1.patch, MAPREDUCE-6454-v2.patch, 
 MAPREDUCE-6454-v3.1.patch, MAPREDUCE-6454-v3.patch, MAPREDUCE-6454.patch


 We already set lib jars on distributed-cache to CLASSPATH. However, in some 
 corner cases (like MR local mode, Hive map-side local join, etc.), we also need 
 these jars on HADOOP_CLASSPATH so the hadoop scripts can pick them up when 
 launching the RunJar process.





[jira] [Updated] (MAPREDUCE-6238) MR2 can't run local jobs with -libjars command options which is a regression from MR1

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6238:
---
Labels: 2.6.1-candidate  (was: )

 MR2 can't run local jobs with -libjars command options which is a regression 
 from MR1
 -

 Key: MAPREDUCE-6238
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6238
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: zhihai xu
Assignee: zhihai xu
Priority: Critical
  Labels: 2.6.1-candidate
 Fix For: 2.7.1

 Attachments: MAPREDUCE-6238.000.patch


 MR2 can't run local jobs with -libjars command options which is a regression 
 from MR1. 
 When running an MR2 job with -jt local and -libjars, the job fails with 
 java.io.FileNotFoundException: File does not exist: 
 hdfs://XXX.jar.
 But the same command is working in MR1.
 I find the problem is:
 1.
 When MR2 runs a local job using LocalJobRunner
 from JobSubmitter, JobSubmitter#jtFs is the local filesystem,
 so copyRemoteFiles will return from [the middle of the 
 function|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L138]
 because the source and destination filesystems are the same.
 {code}
 if (compareFs(remoteFs, jtFs)) {
   return originalPath;
 }
 {code}
 The following code at 
 [JobSubmitter.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L219]
 tries to add the destination file to DistributedCache, which introduces a bug for 
 local jobs.
 {code}
 Path newPath = copyRemoteFiles(libjarsDir, tmp, conf, replication);
 DistributedCache.addFileToClassPath(
 new Path(newPath.toUri().getPath()), conf);
 {code}
 Because new Path(newPath.toUri().getPath()) loses the filesystem 
 information from newPath, the file added to DistributedCache will use the 
 default URI filesystem (hdfs), per the following code. This causes the 
 FileNotFoundException when we access the file later at 
  
 [determineTimestampsAndCacheVisibilities|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/JobSubmitter.java#L270]
 {code}
   public static void addFileToClassPath(Path file, Configuration conf)
       throws IOException {
     addFileToClassPath(file, conf, file.getFileSystem(conf));
   }
   public static void addFileToClassPath(Path file, Configuration conf,
       FileSystem fs) throws IOException {
     String classpath = conf.get(MRJobConfig.CLASSPATH_FILES);
     conf.set(MRJobConfig.CLASSPATH_FILES, classpath == null ? file.toString()
         : classpath + "," + file.toString());
     URI uri = fs.makeQualified(file).toUri();
     addCacheFile(uri, conf);
   }
 {code}
 Compare to the following [MR1 
 code|https://github.com/apache/hadoop/blob/branch-1/src/mapred/org/apache/hadoop/mapred/JobClient.java#L811]:
 {code}
 Path newPath = copyRemoteFiles(fs, libjarsDir, tmp, job, replication);
 DistributedCache.addFileToClassPath(
   new Path(newPath.toUri().getPath()), job, fs);
 {code}
 You will see why MR1 doesn't have this issue:
 it passes the local filesystem into 
 DistributedCache#addFileToClassPath instead of using the default URI 
 filesystem hdfs.
 2.
 Another incompatible change in MR2 is in 
 [LocalDistributedCacheManager#setup|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapred/LocalDistributedCacheManager.java#L113]
 {code}
 // Find which resources are to be put on the local classpath
 Map<String, Path> classpaths = new HashMap<String, Path>();
 Path[] archiveClassPaths = DistributedCache.getArchiveClassPaths(conf);
 if (archiveClassPaths != null) {
   for (Path p : archiveClassPaths) {
 FileSystem remoteFS = p.getFileSystem(conf);
 p = remoteFS.resolvePath(p.makeQualified(remoteFS.getUri(),
 remoteFS.getWorkingDirectory()));
 classpaths.put(p.toUri().getPath().toString(), p);
   }
 }
 Path[] fileClassPaths = DistributedCache.getFileClassPaths(conf);
 if (fileClassPaths != null) {
   for (Path p : fileClassPaths) {
 FileSystem remoteFS = p.getFileSystem(conf);
 p = remoteFS.resolvePath(p.makeQualified(remoteFS.getUri(),
 remoteFS.getWorkingDirectory()));
 

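The scheme loss described in point 1 above can be reproduced with plain java.net.URI. The class, method names, and paths below are illustrative stand-ins, not Hadoop's actual Path/FileSystem classes:

```java
import java.net.URI;

public class SchemeLossDemo {
    // Mimics new Path(newPath.toUri().getPath()): keeps the path component
    // but drops the "file" scheme that identified the local filesystem.
    static String stripScheme(URI qualified) {
        return qualified.getPath();
    }

    // Mimics qualifying a bare path against the cluster's default filesystem.
    static URI qualifyAgainstDefault(String path, String defaultScheme) {
        return URI.create(defaultScheme + "://namenode" + path);
    }

    public static void main(String[] args) {
        URI local = URI.create("file:///tmp/libjars/app.jar");
        String bare = stripScheme(local);            // "/tmp/libjars/app.jar"
        URI resolved = qualifyAgainstDefault(bare, "hdfs");
        // The jar is now looked up on HDFS even though it only exists locally,
        // which is how the FileNotFoundException arises for -jt local jobs.
        System.out.println(resolved);
    }
}
```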
[jira] [Updated] (MAPREDUCE-5649) Reduce cannot use more than 2G memory for the final merge

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-5649:
---
Labels: 2.6.1-candidate 2.7.2-candidate  (was: )

 Reduce cannot use more than 2G memory  for the final merge
 --

 Key: MAPREDUCE-5649
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5649
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Reporter: stanley shi
Assignee: Gera Shegalov
  Labels: 2.6.1-candidate, 2.7.2-candidate
 Fix For: 2.8.0

 Attachments: MAPREDUCE-5649.001.patch, MAPREDUCE-5649.002.patch, 
 MAPREDUCE-5649.003.patch


 In the org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.java file, in 
 the finalMerge method: 
  int maxInMemReduce = (int)Math.min(
 Runtime.getRuntime().maxMemory() * maxRedPer, Integer.MAX_VALUE);
  
 This means that no matter how much memory the user has, the reducer will not 
 retain more than 2 GB of data in memory before the reduce phase starts.
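The 2 GB cap comes from the int narrowing in that expression: the product is computed as a double, capped at Integer.MAX_VALUE, then cast to int. A minimal standalone sketch (the 8 GB heap value is assumed for illustration; this is not the actual MergeManagerImpl code):

```java
public class MergeCapDemo {
    // The existing expression's shape: the in-memory budget for the final
    // merge can never exceed Integer.MAX_VALUE (~2 GB) regardless of heap.
    static long cappedBudget(long maxHeapBytes, double maxRedPer) {
        return (int) Math.min(maxHeapBytes * maxRedPer, Integer.MAX_VALUE);
    }

    // A long-typed variant (hypothetical fix sketch) keeping the full value.
    static long longBudget(long maxHeapBytes, double maxRedPer) {
        return (long) (maxHeapBytes * maxRedPer);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024;         // assume an 8 GB heap
        System.out.println(cappedBudget(heap, 0.9)); // stuck at 2147483647
        System.out.println(longBudget(heap, 0.9));   // roughly 7.2 GB
    }
}
```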





[jira] [Updated] (MAPREDUCE-6300) Task list sort by task id broken

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6300:
---
Labels: 2.6.1-candidate  (was: )

 Task list sort by task id broken
 

 Key: MAPREDUCE-6300
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6300
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Siqi Li
Assignee: Siqi Li
Priority: Minor
  Labels: 2.6.1-candidate
 Fix For: 2.8.0, 2.7.1

 Attachments: MAPREDUCE-6300.v1.patch, MAPREDUCE-6300.v2.patch, 
 MAPREDUCE-6300.v3.patch, MAPREDUCE-6300.v4.patch, MAPREDUCE-6300.v5.patch, 
 screenshot-1.png, sorting app ID in fair scheduler page.png, sorting app 
 ID.png, sorting app attempt ID.png, sorting by app ID in AHS.png, sorting 
 task ID.png


 If you go to the MapReduce ApplicationMaster or HistoryServer UI and open the 
 list of tasks, then try to sort by the task name/id, it does nothing.
 Note that if you go to the task attempts, that seem to sort fine.





[jira] [Updated] (MAPREDUCE-6166) Reducers do not validate checksum of map outputs when fetching directly to disk

2015-08-03 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6166:
---
Labels: 2.6.1-candidate  (was: )

 Reducers do not validate checksum of map outputs when fetching directly to 
 disk
 ---

 Key: MAPREDUCE-6166
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6166
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
Reporter: Eric Payne
Assignee: Eric Payne
  Labels: 2.6.1-candidate
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6166.v1.201411221941.txt, 
 MAPREDUCE-6166.v2.201411251627.txt, MAPREDUCE-6166.v3.txt, 
 MAPREDUCE-6166.v4.txt, MAPREDUCE-6166.v5.txt


 In very large map/reduce jobs (5 maps, 2500 reducers), the intermediate 
 map partition output gets corrupted on disk on the map side. If this 
 corrupted map output is too large to shuffle in memory, the reducer streams 
 it to disk without validating the checksum. In jobs this large, it could take 
 hours before the reducer finally tries to read the corrupted file and fails. 
 Since retries of the failed reduce attempt will also take hours, this delay 
 in discovering the failure is multiplied greatly.
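A minimal sketch of validating a checksum at stream time rather than hours later at read time. Plain CRC32 is used here for illustration; Hadoop's shuffle has its own IFile checksum machinery, so this is not the actual fix:

```java
import java.util.zip.CRC32;

public class ChecksumStreamDemo {
    // Validate the bytes against an expected checksum before (or while)
    // spilling them to disk, so corruption is caught immediately.
    static boolean validate(byte[] data, long expectedCrc) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue() == expectedCrc;
    }

    public static void main(String[] args) {
        byte[] mapOutput = "partition-0 bytes".getBytes();
        CRC32 sender = new CRC32();
        sender.update(mapOutput, 0, mapOutput.length);
        long expected = sender.getValue();

        System.out.println(validate(mapOutput, expected)); // true
        mapOutput[0] ^= 0x1; // simulate on-disk corruption of one bit
        System.out.println(validate(mapOutput, expected)); // false
    }
}
```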





[jira] [Updated] (MAPREDUCE-6324) Uber jobs fail to update AMRM token when it rolls over

2015-07-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6324:
---
Labels: 2.6.1-candidate  (was: )

 Uber jobs fail to update AMRM token when it rolls over
 --

 Key: MAPREDUCE-6324
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6324
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
  Labels: 2.6.1-candidate
 Fix For: 2.7.1

 Attachments: MAPREDUCE-6324.001.patch, MAPREDUCE-6324.002.patch


 When the RM rolls a new AMRM master key the AMs are supposed to receive a new 
 AMRM token on subsequent heartbeats between the time when the new key is 
 rolled and when it is activated.  This is not occurring for uber jobs.  If 
 the connection to the RM needs to be re-established after the new key is 
 activated (e.g.: RM restart or network hiccup) then the uber job AM will be 
 unable to reconnect to the RM.





[jira] [Updated] (MAPREDUCE-6230) MR AM does not survive RM restart if RM activated a new AMRM secret key

2015-07-17 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6230:
---
Labels: 2.6.1-candidate  (was: )

 MR AM does not survive RM restart if RM activated a new AMRM secret key
 ---

 Key: MAPREDUCE-6230
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6230
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mr-am
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
  Labels: 2.6.1-candidate
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6230.001.patch


 A MapReduce AM will fail to reconnect to an RM that performed restart in the 
 following scenario:
 # MapReduce job launched with AMRM token generated from AMRM secret X
 # RM rolls new AMRM secret Y and activates the new key
 # RM performs a work-preserving restart
 # MapReduce job AM now unable to connect to RM with Invalid AMRMToken 
 exception





[jira] [Updated] (MAPREDUCE-6361) NPE issue in shuffle caused by concurrent issue between copySucceeded() in one thread and copyFailed() in another thread on the same host

2015-07-17 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6361:
---
Labels: 2.6.1-candidate  (was: )

 NPE issue in shuffle caused by concurrent issue between copySucceeded() in 
 one thread and copyFailed() in another thread on the same host
 -

 Key: MAPREDUCE-6361
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6361
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Junping Du
Assignee: Junping Du
Priority: Critical
  Labels: 2.6.1-candidate
 Fix For: 2.7.1

 Attachments: MAPREDUCE-6361-v1.patch


 The failure in log:
 2015-05-08 21:00:00,513 WARN [main] org.apache.hadoop.mapred.YarnChild: 
 Exception running child : 
 org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in 
 shuffle in fetcher#25
  at 
 org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
 Caused by: java.lang.NullPointerException
  at 
 org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:267)
  at 
 org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:308)
  at 
 org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)





[jira] [Updated] (MAPREDUCE-6303) Read timeout when retrying a fetch error can be fatal to a reducer

2015-07-15 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6303:
---
Labels: 2.6.1-candidate  (was: )

 Read timeout when retrying a fetch error can be fatal to a reducer
 --

 Key: MAPREDUCE-6303
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6303
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Jason Lowe
Assignee: Jason Lowe
Priority: Blocker
  Labels: 2.6.1-candidate
 Fix For: 2.7.0

 Attachments: MAPREDUCE-6303.001.patch


 If a reducer encounters an error trying to fetch from a node then encounters 
 a read timeout when trying to re-establish the connection then the reducer 
 can fail.  The read timeout exception can leak to the top of the Fetcher 
 thread which will cause the reduce task to teardown.  This type of error can 
 repeat across reducer attempts causing jobs to fail due to a single bad node.





[jira] [Updated] (MAPREDUCE-6410) Aggregated Logs Deletion doesn't work after refreshing Log Retention Settings in secure cluster

2015-06-23 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6410:
---
   Resolution: Fixed
Fix Version/s: 2.7.1
   Status: Resolved  (was: Patch Available)

Committed this to trunk, branch-2 and branch-2.7. Thanks Varun!

 Aggregated Logs Deletion doesn't work after refreshing Log Retention Settings 
 in secure cluster
 --

 Key: MAPREDUCE-6410
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6410
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: mrV2, secure mode
Reporter: Zhang Wei
Assignee: Varun Saxena
Priority: Critical
  Labels: historyserver
 Fix For: 2.7.1

 Attachments: MAPREDUCE-6410.04.patch, MAPREDUCE-6410.05.patch, 
 YARN-3779.01.patch, YARN-3779.02.patch, YARN-3779.03.patch, 
 log_aggr_deletion_on_refresh_error.log, log_aggr_deletion_on_refresh_fix.log


 {{GSSException}} is thrown every time log aggregation deletion is attempted 
 after executing bin/mapred hsadmin -refreshLogRetentionSettings in a secure 
 cluster.
 The problem can be reproduced by following steps:
 1. startup historyserver in secure cluster.
 2. Log deletion happens as per expectation. 
 3. execute {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh 
 the configuration value.
 4. All the subsequent attempts of log deletion fail with {{GSSException}}
 Following exception can be found in historyserver's log if log deletion is 
 enabled. 
 {noformat}
 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this 
 deletion attempt is being aborted | AggregatedLogDeletionService.java:127
 java.io.IOException: Failed on local exception: java.io.IOException: 
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt)]; Host Details : local host is: vm-31/9.91.12.31; 
 destination host is: vm-33:25000; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
 at org.apache.hadoop.ipc.Client.call(Client.java:1414)
 at org.apache.hadoop.ipc.Client.call(Client.java:1363)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy9.getListing(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519)
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy10.getListing(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68)
 at java.util.TimerThread.mainLoop(Timer.java:555)
 at java.util.TimerThread.run(Timer.java:505)
 Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS 
 initiate failed [Caused by GSSException: No valid credentials provided 
 (Mechanism level: Failed to find any Kerberos tgt)]
 at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641)
 at 
 org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640)
 at 
 

[jira] [Updated] (MAPREDUCE-6410) Aggregated Logs Deletion doesn't work after refreshing Log Retention Settings in secure cluster

2015-06-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6410:
---
Status: Open  (was: Patch Available)

Looks good. Fixing the white-space issue myself..

 Aggregated Logs Deletion doesn't work after refreshing Log Retention Settings 
 in secure cluster
 --

 Key: MAPREDUCE-6410
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6410
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: mrV2, secure mode
Reporter: Zhang Wei
Assignee: Varun Saxena
Priority: Critical
  Labels: historyserver
 Attachments: MAPREDUCE-6410.04.patch, YARN-3779.01.patch, 
 YARN-3779.02.patch, YARN-3779.03.patch, 
 log_aggr_deletion_on_refresh_error.log, log_aggr_deletion_on_refresh_fix.log


 {{GSSException}} is thrown every time log aggregation deletion is attempted 
 after executing bin/mapred hsadmin -refreshLogRetentionSettings in a secure 
 cluster.
 The problem can be reproduced by following steps:
 1. startup historyserver in secure cluster.
 2. Log deletion happens as per expectation. 
 3. execute {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh 
 the configuration value.
 4. All the subsequent attempts of log deletion fail with {{GSSException}}
 Following exception can be found in historyserver's log if log deletion is 
 enabled. 
 {noformat}
 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this 
 deletion attempt is being aborted | AggregatedLogDeletionService.java:127
 java.io.IOException: Failed on local exception: java.io.IOException: 
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt)]; Host Details : local host is: vm-31/9.91.12.31; 
 destination host is: vm-33:25000; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
 at org.apache.hadoop.ipc.Client.call(Client.java:1414)
 at org.apache.hadoop.ipc.Client.call(Client.java:1363)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy9.getListing(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519)
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy10.getListing(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68)
 at java.util.TimerThread.mainLoop(Timer.java:555)
 at java.util.TimerThread.run(Timer.java:505)
 Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS 
 initiate failed [Caused by GSSException: No valid credentials provided 
 (Mechanism level: Failed to find any Kerberos tgt)]
 at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641)
 at 
 org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640)
 at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724)
 at 
 org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
 at 

[jira] [Updated] (MAPREDUCE-6410) Aggregated Logs Deletion doesn't work after refreshing Log Retention Settings in secure cluster

2015-06-22 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated MAPREDUCE-6410:
---
Attachment: MAPREDUCE-6410.05.patch

 Aggregated Logs Deletion doesn't work after refreshing Log Retention Settings 
 in secure cluster
 --

 Key: MAPREDUCE-6410
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6410
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 2.7.0
 Environment: mrV2, secure mode
Reporter: Zhang Wei
Assignee: Varun Saxena
Priority: Critical
  Labels: historyserver
 Attachments: MAPREDUCE-6410.04.patch, MAPREDUCE-6410.05.patch, 
 YARN-3779.01.patch, YARN-3779.02.patch, YARN-3779.03.patch, 
 log_aggr_deletion_on_refresh_error.log, log_aggr_deletion_on_refresh_fix.log


 {{GSSException}} is thrown every time log aggregation deletion is attempted 
 after executing bin/mapred hsadmin -refreshLogRetentionSettings in a secure 
 cluster.
 The problem can be reproduced by following steps:
 1. startup historyserver in secure cluster.
 2. Log deletion happens as per expectation. 
 3. execute {{mapred hsadmin -refreshLogRetentionSettings}} command to refresh 
 the configuration value.
 4. All the subsequent attempts of log deletion fail with {{GSSException}}
 Following exception can be found in historyserver's log if log deletion is 
 enabled. 
 {noformat}
 2015-06-04 14:14:40,070 | ERROR | Timer-3 | Error reading root log dir this 
 deletion attempt is being aborted | AggregatedLogDeletionService.java:127
 java.io.IOException: Failed on local exception: java.io.IOException: 
 javax.security.sasl.SaslException: GSS initiate failed [Caused by 
 GSSException: No valid credentials provided (Mechanism level: Failed to find 
 any Kerberos tgt)]; Host Details : local host is: vm-31/9.91.12.31; 
 destination host is: vm-33:25000; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
 at org.apache.hadoop.ipc.Client.call(Client.java:1414)
 at org.apache.hadoop.ipc.Client.call(Client.java:1363)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
 at com.sun.proxy.$Proxy9.getListing(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:519)
 at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy10.getListing(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1767)
 at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1750)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:691)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:753)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
 at 
 org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:749)
 at 
 org.apache.hadoop.yarn.logaggregation.AggregatedLogDeletionService$LogDeletionTask.run(AggregatedLogDeletionService.java:68)
 at java.util.TimerThread.mainLoop(Timer.java:555)
 at java.util.TimerThread.run(Timer.java:505)
 Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS 
 initiate failed [Caused by GSSException: No valid credentials provided 
 (Mechanism level: Failed to find any Kerberos tgt)]
 at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:677)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1641)
 at 
 org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:640)
 at 
 org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:724)
 at 
 org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
 at 
