[jira] [Updated] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery

2014-08-13 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2409: - Attachment: YARN-2409.patch > InvalidStateTransitonException in ResourceManager after job recovery > -

[jira] [Updated] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery

2014-08-13 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2409: - Attachment: (was: YARN-2409.patch) > InvalidStateTransitonException in ResourceManager after job recovery > --

[jira] [Updated] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery

2014-08-13 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2409: - Attachment: YARN-2409.patch Attached the patch. Please review.. I have verified patch for 1. Thread Leak : Switch

[jira] [Commented] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery

2014-08-13 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095466#comment-14095466 ] Rohith commented on YARN-2409: -- I looked into issue (got logs from [~nishan] offline), there i

[jira] [Assigned] (YARN-2409) InvalidStateTransitonException in ResourceManager after job recovery

2014-08-13 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-2409: Assignee: Rohith > InvalidStateTransitonException in ResourceManager after job recovery > --

[jira] [Commented] (YARN-2209) Replace AM resync/shutdown command with corresponding exceptions

2014-07-28 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075973#comment-14075973 ] Rohith commented on YARN-2209: -- +1 patch looks good to me > Replace AM resync/shutdown comman

[jira] [Commented] (YARN-2209) Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart

2014-07-27 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14075877#comment-14075877 ] Rohith commented on YARN-2209: -- Thanks Jian He for updating patch. It looks good overall to me

[jira] [Commented] (YARN-2209) Replace allocate#resync command with ApplicationMasterNotRegisteredException to indicate AM to re-register on RM restart

2014-07-25 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074366#comment-14074366 ] Rohith commented on YARN-2209: -- Hi [~jianhe], I reviewed patch and found some comments 1. Mis

[jira] [Commented] (YARN-2350) TestApplicationMasterServiceOnHA fails with InvalidToken exception

2014-07-24 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073973#comment-14073973 ] Rohith commented on YARN-2350: -- This issue is because of the YARN-2208 check-in. As a whole solut

[jira] [Assigned] (YARN-2349) InvalidStateTransitonException after RM switch

2014-07-24 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-2349: Assignee: Rohith > InvalidStateTransitonException after RM switch >

[jira] [Commented] (YARN-2349) InvalidStateTransitonException after RM switch

2014-07-24 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073158#comment-14073158 ] Rohith commented on YARN-2349: -- This is basically configurations in capacity-scheduler.xml of

[jira] [Commented] (YARN-1779) Handle AMRMTokens across RM failover

2014-07-23 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071772#comment-14071772 ] Rohith commented on YARN-1779: -- This is critical issue for work preserving restart feature. AM

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-08 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.13.patch Thanks Jian He and Bikas Saha for reviewing patch. I updated patch with changes,ple

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-02 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.12.patch Updating patch with below changes. 1. Making second allocate() call on resync comm

[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-02 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051013#comment-14051013 ] Rohith commented on YARN-1366: -- bq. can you add some documentation about this Shall I add in

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-01 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.11.patch Updated patch fix findbug warning. > AM should implement Resync with the Applicati

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-01 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.10.patch I updated the patch with addressing comments. Please review.. > AM should impleme

[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-07-01 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049563#comment-14049563 ] Rohith commented on YARN-1366: -- bq. These two synchronized block can be merged into one ? Thi

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-30 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.9.patch Update patch for test cases correction > AM should implement Resync with the Applic

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-30 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: (was: YARN-1366.9.patch) > AM should implement Resync with the ApplicationMasterService instead of

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-30 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.9.patch Thanks for reviewing the patch. I updated the patch as per the comments. Please review update p

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-30 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.8.patch I updated the patch addressing comments. bq. isApplicationMasterRegistered is actua

[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-29 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047412#comment-14047412 ] Rohith commented on YARN-1366: -- Thank you for reviewing patch. I will update patch soon. One

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-28 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.7.patch > AM should implement Resync with the ApplicationMasterService instead of > shuttin

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-28 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.6.patch Attached updated the patch. Please review the patch > AM should implement Resync wi

[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-28 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046826#comment-14046826 ] Rohith commented on YARN-1366: -- Looking into fix findbug warning and test case. Will update pa

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-27 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.5.patch I updated the patch for following incremental change. 1. Reregister for AmRMClient i

[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-06-24 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043059#comment-14043059 ] Rohith commented on YARN-1366: -- Thank you [~jianhe] for looking into patch :-) In current pat

[jira] [Updated] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-05-30 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.4.patch Attached updated patch that address Anubhav's all comments. This patch contains onl

[jira] [Commented] (YARN-1366) AM should implement Resync with the ApplicationMasterService instead of shutting down

2014-05-30 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013607#comment-14013607 ] Rohith commented on YARN-1366: -- Let this jira keep only for Yarn Client. I created MAPREDUCE-5

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-29 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013256#comment-14013256 ] Rohith commented on YARN-1366: -- Hi [~vinodkv], I agree that both issue titles look simil

[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-05-27 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010721#comment-14010721 ] Rohith commented on YARN-1365: -- bq. The option is see is we pass in a flag to AppAttemptAddedS

[jira] [Updated] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-23 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.3.patch I updated patch with below changes. bq. Pending releases - AM forgets about a re

[jira] [Commented] (YARN-1365) ApplicationMasterService to allow Register and Unregister of an app that was running before restart

2014-05-23 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007024#comment-14007024 ] Rohith commented on YARN-1365: -- Hi Anubhav, One comment on the patch. * Notifying to s

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-22 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005863#comment-14005863 ] Rohith commented on YARN-1366: -- bq. I mean what will go wrong is we allow unregister without r

[jira] [Commented] (YARN-2094) how to enable job counters for mapreduce or applications

2014-05-21 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14005575#comment-14005575 ] Rohith commented on YARN-2094: -- Hi Nikhil, Welcome to Hadoop community. bq. When I

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-20 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004279#comment-14004279 ] Rohith commented on YARN-1366: -- bq. Catching incorrect unregistration before registration shou

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-19 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002873#comment-14002873 ] Rohith commented on YARN-1366: -- Adding to the above point, enforce AMRMClient to handle unregistr

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-19 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002858#comment-14002858 ] Rohith commented on YARN-1366: -- bq. If there's no RM restart, a normal app only calling unregi

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-19 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002753#comment-14002753 ] Rohith commented on YARN-1366: -- Overall patch would contain MR and Yarn. 1. MapReduce change f

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-19 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002752#comment-14002752 ] Rohith commented on YARN-1366: -- bq. Rohith let me know if you mind if we add these as well to

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-18 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001385#comment-14001385 ] Rohith commented on YARN-1366: -- That's a good point to discuss and take a decision: should RESYNC t

[jira] [Updated] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-14 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.2.patch Synched up offline with Anubhav for doubts mentioned in previous comment. I made

[jira] [Updated] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-13 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1366: - Attachment: YARN-1366.1.patch I updated the patch for the following changes in AMRMClient (MapReduce is not considered

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-12 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994920#comment-13994920 ] Rohith commented on YARN-1366: -- Thank you for offering! It was just a wait to finish prototype b

[jira] [Assigned] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-12 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-1366: Assignee: Rohith > ApplicationMasterService should Resync with the AM upon allocate call after > restart >

[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-05 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990221#comment-13990221 ] Rohith commented on YARN-2010: -- Thank you [~kasha] for reviewing patch. I update the patch for

[jira] [Updated] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-02 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-2010: - Attachment: YARN-2010.patch Uploading patch without test written. Thinking of how to write test, should complete

[jira] [Assigned] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-02 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-2010: Assignee: Rohith > RM can't transition to active if it can't recover an app attempt > --

[jira] [Commented] (YARN-2010) RM can't transition to active if it can't recover an app attempt

2014-05-02 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987590#comment-13987590 ] Rohith commented on YARN-2010: -- For completed applications before starting in secured mode, cl

[jira] [Commented] (YARN-1963) Support priorities across applications within the same queue

2014-05-02 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987553#comment-13987553 ] Rohith commented on YARN-1963: -- Added to Sunil thoughts, priority of jobs can also be displaye

[jira] [Commented] (YARN-1934) Potential NPE in ZKRMStateStore caused by handling Disconnected event from ZK.

2014-04-14 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969225#comment-13969225 ] Rohith commented on YARN-1934: -- +1 patch looks good to me :-) > Potential NPE in ZKRMStateSto

[jira] [Commented] (YARN-1861) Both RM stuck in standby mode when automatic failover is enabled

2014-04-14 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968211#comment-13968211 ] Rohith commented on YARN-1861: -- Oops, I too encountered both RMs in standby state forever

[jira] [Commented] (YARN-1934) Potential NPE in ZKRMStateStore caused by handling Disconnected event from ZK.

2014-04-12 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967446#comment-13967446 ] Rohith commented on YARN-1934: -- Let's say the zkClient session is connected to server X. Killing/Stopp

[jira] [Commented] (YARN-1934) Potential NPE in ZKRMStateStore caused by handling Disconnected event from ZK.

2014-04-12 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967437#comment-13967437 ] Rohith commented on YARN-1934: -- Call flow is attached file issue is 1. Disconnected event whi

[jira] [Updated] (YARN-1934) Potential NPE in ZKRMStateStore caused by handling Disconnected event from ZK.

2014-04-12 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1934: - Attachment: RM.txt Attached log file. > Potential NPE in ZKRMStateStore caused by handling Disconnected event fro

[jira] [Commented] (YARN-1924) STATE_STORE_OP_FAILED happens when ZKRMStateStore tries to update app(attempt) before storing it

2014-04-12 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967432#comment-13967432 ] Rohith commented on YARN-1924: -- I raised new Jira YARN-1934 for NPE. > STATE_STORE_OP_FAILED

[jira] [Created] (YARN-1934) Potential NPE in ZKRMStateStore caused by handling Disconnected event from ZK.

2014-04-12 Thread Rohith (JIRA)
Rohith created YARN-1934: Summary: Potential NPE in ZKRMStateStore caused by handling Disconnected event from ZK. Key: YARN-1934 URL: https://issues.apache.org/jira/browse/YARN-1934 Project: Hadoop YARN

[jira] [Commented] (YARN-1924) STATE_STORE_OP_FAILED happens when ZKRMStateStore tries to update app(attempt) before storing it

2014-04-11 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967396#comment-13967396 ] Rohith commented on YARN-1924: -- [~jianhe], sorry that I cleared off the environment last night

[jira] [Commented] (YARN-1929) DeadLock in RM when automatic failover is enabled.

2014-04-11 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966540#comment-13966540 ] Rohith commented on YARN-1929: -- Current deadlock is involved between *EmbeddedElectorService*

[jira] [Commented] (YARN-1929) DeadLock in RM when automatic failover is enabled.

2014-04-11 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966538#comment-13966538 ] Rohith commented on YARN-1929: -- Complete stack trace {noformat} Found one Java-level deadlock:

[jira] [Created] (YARN-1929) DeadLock in RM when automatic failover is enabled.

2014-04-11 Thread Rohith (JIRA)
Rohith created YARN-1929: Summary: DeadLock in RM when automatic failover is enabled. Key: YARN-1929 URL: https://issues.apache.org/jira/browse/YARN-1929 Project: Hadoop YARN Issue Type: Bug

[jira] [Commented] (YARN-1924) STATE_STORE_OP_FAILED happens when ZKRMStateStore tries to update app(attempt) before storing it

2014-04-11 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966354#comment-13966354 ] Rohith commented on YARN-1924: -- Hi, I applied this patch and testing. I found below NPE. ZK cl

[jira] [Commented] (YARN-1890) Too many unnecessary logs are logged while accessing applicationMaster web UI.

2014-04-01 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13957340#comment-13957340 ] Rohith commented on YARN-1890: -- Since WebAppProxyServlet.doGet() is called for every fetch dat

[jira] [Updated] (YARN-1890) Too many unnecessary logs are logged while accessing applicationMaster web UI.

2014-04-01 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1890: - Attachment: YARN-1890.patch Simple patch to clean up excessive logging on every refresh. Log priority is moved to DEB

[jira] [Commented] (YARN-1890) Too many unnecessary logs are logged while accessing applicationMaster web UI.

2014-03-28 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950566#comment-13950566 ] Rohith commented on YARN-1890: -- Should log priority change to DEBUG ? I do not understand why

[jira] [Commented] (YARN-1890) Too many unnecessary logs are logged while accessing applicationMaster web UI.

2014-03-28 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950564#comment-13950564 ] Rohith commented on YARN-1890: -- Below logs are logging at one shot on refresh. {noformat} 201

[jira] [Created] (YARN-1890) Too many unnecessary logs are logged while accessing applicationMaster web UI.

2014-03-28 Thread Rohith (JIRA)
Rohith created YARN-1890: Summary: Too many unnecessary logs are logged while accessing applicationMaster web UI. Key: YARN-1890 URL: https://issues.apache.org/jira/browse/YARN-1890 Project: Hadoop YARN

[jira] [Updated] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.

2014-03-28 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1703: - Attachment: YARN-1703.2.patch Updated patch, rebased to the latest code. I verified that after this change, it is working.

[jira] [Commented] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.

2014-03-28 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950552#comment-13950552 ] Rohith commented on YARN-1703: -- I am attaching the connections established for accessing 1 applicati

[jira] [Updated] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.

2014-03-28 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1703: - Priority: Critical (was: Major) > Too many connections are opened for proxy server when applicationMaster UI is

[jira] [Updated] (YARN-1703) Too many connections are opened for proxy server when applicationMaster UI is accessed.

2014-03-28 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1703: - Summary: Too many connections are opened for proxy server when applicationMaster UI is accessed. (was: There many

[jira] [Commented] (YARN-1885) yarn logs command does not provide the application logs for some applications

2014-03-27 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13950342#comment-13950342 ] Rohith commented on YARN-1885: -- [~arpitgupta] can you please describe more on the issue. 1. Wh

[jira] [Commented] (YARN-1854) Race condition in TestRMHA#testStartAndTransitions

2014-03-25 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13947510#comment-13947510 ] Rohith commented on YARN-1854: -- bq. Rohith : The logs that I have submitted already has the 5s

[jira] [Updated] (YARN-1854) Race condition in TestRMHA#testStartAndTransitions

2014-03-25 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1854: - Description: There is race in test. TestRMHA#testStartAndTransitions calls verifyClusterMetrics() immediately afte

[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails

2014-03-25 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13946487#comment-13946487 ] Rohith commented on YARN-1854: -- [~mitdesai], I checked the attached logs for a while. It is very str

[jira] [Updated] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-24 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1852: - Attachment: YARN-1852.3.patch > Application recovery throws InvalidStateTransitonException for FAILED and > KILLE

[jira] [Updated] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-24 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1852: - Attachment: (was: YARN-1852.3patch) > Application recovery throws InvalidStateTransitonException for FAILED an

[jira] [Updated] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-24 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1852: - Attachment: YARN-1852.3patch bq. We may check against RMApp.recoveredFinalState state instead? Done Test is writt

[jira] [Updated] (YARN-1854) TestRMHA#testStartAndTransitions Fails

2014-03-21 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1854: - Attachment: YARN-1854.1.patch Attaching patch. Please review.. I changed verifyClusterMetrics for retrying 5 tim

[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails

2014-03-20 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942760#comment-13942760 ] Rohith commented on YARN-1854: -- Thank you [~vinodkv] for going through patch. I agree that Ka

[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-03-20 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942731#comment-13942731 ] Rohith commented on YARN-1198: -- bq. It's kind of related to "New node is added/removed from th

[jira] [Commented] (YARN-1198) Capacity Scheduler headroom calculation does not work as expected

2014-03-20 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941797#comment-13941797 ] Rohith commented on YARN-1198: -- Does this Jira handle the scenario mentioned in YARN-1680 for he

[jira] [Updated] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-20 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1852: - Attachment: YARN-1852.patch +1 Jian, Attaching patch for handling KILLED/FAILED applications during recovery. I

[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails

2014-03-19 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941462#comment-13941462 ] Rohith commented on YARN-1854: -- I ran it multiple times on Linux and Windows. I didn't find any tes

[jira] [Updated] (YARN-1854) TestRMHA#testStartAndTransitions Fails

2014-03-19 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1854: - Attachment: YARN-1854.patch > TestRMHA#testStartAndTransitions Fails > -- > >

[jira] [Commented] (YARN-1854) TestRMHA#testStartAndTransitions Fails

2014-03-19 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13941364#comment-13941364 ] Rohith commented on YARN-1854: -- I will look into Test Case Failure. > TestRMHA#testStartAndTr

[jira] [Assigned] (YARN-1854) TestRMHA#testStartAndTransitions Fails

2014-03-19 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-1854: Assignee: Rohith > TestRMHA#testStartAndTransitions Fails > -- > >

[jira] [Commented] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-19 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940394#comment-13940394 ] Rohith commented on YARN-1852: -- Here is the exception stack trace.. For Killed application st

[jira] [Created] (YARN-1852) Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs

2014-03-19 Thread Rohith (JIRA)
Rohith created YARN-1852: Summary: Application recovery throws InvalidStateTransitonException for FAILED and KILLED jobs Key: YARN-1852 URL: https://issues.apache.org/jira/browse/YARN-1852 Project: Hadoop YAR

[jira] [Commented] (YARN-1705) Cluster metrics are off after failover

2014-03-18 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939010#comment-13939010 ] Rohith commented on YARN-1705: -- Attached patch for addressing comment. Please review. > Clust

[jira] [Updated] (YARN-1705) Cluster metrics are off after failover

2014-03-18 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1705: - Attachment: YARN-1705.2.patch > Cluster metrics are off after failover > -- >

[jira] [Updated] (YARN-1206) Container logs link is broken on RM web UI after application finished

2014-03-17 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1206: - Attachment: YARN-1206.1.patch I added a comment in ContainerLogsUtils.getContainerLogDirs() as below. "It is not req

[jira] [Commented] (YARN-1206) Container logs link is broken on RM web UI after application finished

2014-03-16 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937439#comment-13937439 ] Rohith commented on YARN-1206: -- Hi Jian, Thank you for looking into patch. There are 2 s

[jira] [Updated] (YARN-1705) Cluster metrics are off after failover

2014-03-13 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1705: - Attachment: YARN-1705.1.patch Hi, I attached a patch that handles 1. transition Active->StandBy->A

[jira] [Commented] (YARN-1705) Cluster metrics are off after failover

2014-03-11 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13930309#comment-13930309 ] Rohith commented on YARN-1705: -- To understand the detailed scope of this Jira: 1. Currently, on R

[jira] [Assigned] (YARN-1705) Cluster metrics are off after failover

2014-03-10 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith reassigned YARN-1705: Assignee: Rohith (was: Karthik Kambatla) > Cluster metrics are off after failover > ---

[jira] [Commented] (YARN-1705) Cluster metrics are off after failover

2014-03-10 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13929921#comment-13929921 ] Rohith commented on YARN-1705: -- Thank you for offering :-) I will take up this Jira. > Cluste

[jira] [Commented] (YARN-1705) Cluster metrics are off after failover

2014-03-10 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13929905#comment-13929905 ] Rohith commented on YARN-1705: -- Hi Karthik, I started verifying RM HA in trunk. I got issue

[jira] [Updated] (YARN-1752) Unexpected Unregistered event at Attempt Launched state

2014-03-04 Thread Rohith (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rohith updated YARN-1752: - Attachment: YARN-1752.5.patch bq. but just that there's still a typo in the code comment: tries to register more
