[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-01-13 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869932#comment-13869932 ] Bikas Saha commented on YARN-1410: -- Xuan, can you please verify Karthik's comment above an

[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-01-14 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871751#comment-13871751 ] Bikas Saha commented on YARN-1410: -- Don't think I understood the failover policy wrt restar

[jira] [Commented] (YARN-1584) Support explicit failover when automatic failover is enabled

2014-01-14 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871759#comment-13871759 ] Bikas Saha commented on YARN-1584: -- bq. Firstly, it requires manually checking the other

[jira] [Commented] (YARN-1584) Support explicit failover when automatic failover is enabled

2014-01-15 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872514#comment-13872514 ] Bikas Saha commented on YARN-1584: -- And what use case does this solve other than make RM1

[jira] [Commented] (YARN-1602) All failed RMStateStore operations should not be RMFatalEvents

2014-01-16 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13873670#comment-13873670 ] Bikas Saha commented on YARN-1602: -- Not all events are app related. Some store secret key

[jira] [Commented] (YARN-1602) All failed RMStateStore operations should not be RMFatalEvents

2014-01-20 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876808#comment-13876808 ] Bikas Saha commented on YARN-1602: -- If it's a non-transient error then the RMs should go in

[jira] [Commented] (YARN-1618) ZKRMStateStore fails to handle updates to znodes not yet created

2014-01-20 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877117#comment-13877117 ] Bikas Saha commented on YARN-1618: -- How can an app finish before it has been saved in the

[jira] [Commented] (YARN-1618) ZKRMStateStore fails to handle updates to znodes not yet created

2014-01-20 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877214#comment-13877214 ] Bikas Saha commented on YARN-1618: -- What is the scenario? If the app is really going from

[jira] [Commented] (YARN-1618) ZKRMStateStore fails to handle updates to znodes not yet created

2014-01-20 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877260#comment-13877260 ] Bikas Saha commented on YARN-1618: -- That is not the case. While the app is in NEW state YA

[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-21 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877681#comment-13877681 ] Bikas Saha commented on YARN-1618: -- Unless NEW->FINAL_SAVING upon failure, was added by a

[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-23 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880745#comment-13880745 ] Bikas Saha commented on YARN-1618: -- App goes from NEW->NEW_SAVING upon receiving START. It

[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-24 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13880813#comment-13880813 ] Bikas Saha commented on YARN-1618: -- All we need to do is go from NEW->KILLED on KILL event

[jira] [Commented] (YARN-745) Move UnmanagedAMLauncher to yarn client package

2014-01-27 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883018#comment-13883018 ] Bikas Saha commented on YARN-745: - That was the original plan of action for the unmanaged AM

[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-27 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883587#comment-13883587 ] Bikas Saha commented on YARN-1618: -- Is this related? Does not look like a compatible chang

[jira] [Updated] (YARN-1660) add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM

2014-01-28 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1660: - Issue Type: Sub-task (was: Improvement) Parent: YARN-149 > add the ability to set yarn.resourcema

[jira] [Commented] (YARN-1660) add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM

2014-01-28 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884355#comment-13884355 ] Bikas Saha commented on YARN-1660: -- Please open HA jiras as sub-tasks of YARN-149. Thanks!

[jira] [Commented] (YARN-1660) add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM

2014-01-28 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884770#comment-13884770 ] Bikas Saha commented on YARN-1660: -- This already works in non-HA mode wherein the RM hostn

[jira] [Commented] (YARN-1660) add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM

2014-01-28 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884781#comment-13884781 ] Bikas Saha commented on YARN-1660: -- I mean in HA mode we read the rm-id and then set all a

[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-28 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13884910#comment-13884910 ] Bikas Saha commented on YARN-1618: -- LGTM. Does this fix the original issue reported by tes

[jira] [Commented] (YARN-1618) Applications transition from NEW to FINAL_SAVING, and try to update non-existing entries in the state-store

2014-01-29 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885576#comment-13885576 ] Bikas Saha commented on YARN-1618: -- Yeah. Let's do it in a separate jira for clarity. > Ap

[jira] [Commented] (YARN-1639) YARN RM HA requires different configs on different RM hosts

2014-02-03 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889966#comment-13889966 ] Bikas Saha commented on YARN-1639: -- After this patch, if there are 4 RM's in HA setup, the

[jira] [Commented] (YARN-1660) add the ability to set yarn.resourcemanager.hostname.rm-id instead of setting all the various host:port properties for RM

2014-02-07 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894787#comment-13894787 ] Bikas Saha commented on YARN-1660: -- It's great that we are making these ease of use changes

[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits

2014-02-08 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895761#comment-13895761 ] Bikas Saha commented on YARN-1490: -- The option to kill container on AM exit should come fr

[jira] [Commented] (YARN-1490) RM should optionally not kill all containers when an ApplicationMaster exits

2014-02-08 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895763#comment-13895763 ] Bikas Saha commented on YARN-1490: -- Also, no containers are allocated until the applicatio

[jira] [Created] (YARN-1722) AMRMProtocol should have a way of getting all the nodes in the cluster

2014-02-12 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1722: Summary: AMRMProtocol should have a way of getting all the nodes in the cluster Key: YARN-1722 URL: https://issues.apache.org/jira/browse/YARN-1722 Project: Hadoop YARN

[jira] [Updated] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality

2014-02-12 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1723: - Affects Version/s: 2.2.0 > AMRMClientAsync missing blacklist addition and removal functionality >

[jira] [Created] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality

2014-02-12 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1723: Summary: AMRMClientAsync missing blacklist addition and removal functionality Key: YARN-1723 URL: https://issues.apache.org/jira/browse/YARN-1723 Project: Hadoop YARN

[jira] [Updated] (YARN-1723) AMRMClientAsync missing blacklist addition and removal functionality

2014-02-12 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1723: - Fix Version/s: 2.4.0 > AMRMClientAsync missing blacklist addition and removal functionality >

[jira] [Created] (YARN-1725) RM should provide an easier way for the app to reject a bad allocation

2014-02-12 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1725: Summary: RM should provide an easier way for the app to reject a bad allocation Key: YARN-1725 URL: https://issues.apache.org/jira/browse/YARN-1725 Project: Hadoop YARN

[jira] [Commented] (YARN-1725) RM should provide an easier way for the app to reject a bad allocation

2014-02-13 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13900551#comment-13900551 ] Bikas Saha commented on YARN-1725: -- The only way to do it on AMRMClient would be to know w

[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-19 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906703#comment-13906703 ] Bikas Saha commented on YARN-1410: -- What are the pros and cons of duplicate checking befor

[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-20 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907669#comment-13907669 ] Bikas Saha commented on YARN-1410: -- That is interesting because NameNode uses the same RPC

[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-20 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907694#comment-13907694 ] Bikas Saha commented on YARN-1410: -- I am repeatedly asking for this because it's a problem

[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-21 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13908585#comment-13908585 ] Bikas Saha commented on YARN-1410: -- We are not going to save the retry cache anywhere. To

[jira] [Created] (YARN-1746) Both tez.runtime.intermediate-output.should-compress and tez.runtime.intermediate-input.is-compressed seem unnecessary

2014-02-21 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1746: Summary: Both tez.runtime.intermediate-output.should-compress and tez.runtime.intermediate-input.is-compressed seem unnecessary Key: YARN-1746 URL: https://issues.apache.org/jira/browse/Y

[jira] [Commented] (YARN-745) Move UnmanagedAMLauncher to yarn client package

2014-02-22 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909521#comment-13909521 ] Bikas Saha commented on YARN-745: - Sorry for late response. Please go ahead in case you have

[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-24 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910788#comment-13910788 ] Bikas Saha commented on YARN-1410: -- Yes. I would like to understand why we are proposing a

[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-24 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910898#comment-13910898 ] Bikas Saha commented on YARN-1410: -- There is considerable confusion here. I haven't seen th

[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-24 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911156#comment-13911156 ] Bikas Saha commented on YARN-1410: -- Sounds good. Let's track 2) on a separate new jira. Xu

[jira] [Commented] (YARN-745) Move UnmanagedAMLauncher to yarn client package

2014-02-24 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911159#comment-13911159 ] Bikas Saha commented on YARN-745: - Would this be backwards incompatible? [~vinodkv] [~kkamba

[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-25 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911842#comment-13911842 ] Bikas Saha commented on YARN-1410: -- bq. getApplicationReport() is called, we will get an

[jira] [Commented] (YARN-1410) Handle client failover during 2 step client API's like app submission

2014-02-26 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913612#comment-13913612 ] Bikas Saha commented on YARN-1410: -- Can we do this in the RM? Then if we are going to retr

[jira] [Commented] (YARN-1410) Handle RM fails over after getApplicationID() and before submitApplication().

2014-02-27 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914801#comment-13914801 ] Bikas Saha commented on YARN-1410: -- Looks good overall. In testAppSubmissionWithoutApplic

[jira] [Created] (YARN-1773) ShuffleHeader should have a format that can inform about errors

2014-02-28 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1773: Summary: ShuffleHeader should have a format that can inform about errors Key: YARN-1773 URL: https://issues.apache.org/jira/browse/YARN-1773 Project: Hadoop YARN Is

[jira] [Commented] (YARN-1793) yarn application -kill doesn't kill UnmanagedAMs

2014-03-07 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924763#comment-13924763 ] Bikas Saha commented on YARN-1793: -- isAppStateStored() -> isAppFinalStateStored() ? > yar

[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart

2014-03-24 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945437#comment-13945437 ] Bikas Saha commented on YARN-556: - Please align with the design doc while prototyping. If th

[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation

2014-03-26 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948184#comment-13948184 ] Bikas Saha commented on YARN-1521: -- How are the following idempotent? Looks like submitApp

[jira] [Commented] (YARN-1521) Mark appropriate protocol methods with the idempotent annotation or AtMostOnce annotation

2014-03-26 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948457#comment-13948457 ] Bikas Saha commented on YARN-1521: -- OK. So we are requiring the moveAppAcrossQueues API to

[jira] [Commented] (YARN-1815) RM doesn't recover unmanaged AMs into its memory after restart

2014-03-26 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13948810#comment-13948810 ] Bikas Saha commented on YARN-1815: -- I am not up to date with the latest state of the code.

[jira] [Commented] (YARN-808) ApplicationReport does not clearly tell that the attempt is running or not

2014-03-31 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13955438#comment-13955438 ] Bikas Saha commented on YARN-808: - We should at least change the app report response to inclu

[jira] [Commented] (YARN-1815) RM doesn't recover unmanaged AMs into its memory after restart

2014-04-03 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13959077#comment-13959077 ] Bikas Saha commented on YARN-1815: -- bq. The right fix should be to record the UnmanagedAM

[jira] [Commented] (YARN-1914) Test TestFSDownload.testDownloadPublicWithStatCache fails on Windows

2014-04-08 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963710#comment-13963710 ] Bikas Saha commented on YARN-1914: -- [~cnauroth] [~ivanmi] Was this specific issue not alre

[jira] [Updated] (YARN-1773) ShuffleHeader should have a format that can inform about errors

2014-04-11 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-1773: - Description: Currently, the ShuffleHeader (which is a Writable) simply tries to read the successful heade

[jira] [Commented] (YARN-435) Make it easier to access cluster topology information in an AM

2014-04-14 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968558#comment-13968558 ] Bikas Saha commented on YARN-435: - Pasting the description from YARN-1722 that was closed as

[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-04-16 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972034#comment-13972034 ] Bikas Saha commented on YARN-1506: -- I could, though [~jianhe] has touched the RM the most

[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-04-16 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972062#comment-13972062 ] Bikas Saha commented on YARN-1506: -- Does a node update forced by the admin need to be pers

[jira] [Commented] (YARN-1506) Replace set resource change on RMNode/SchedulerNode directly with event notification.

2014-04-21 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13975920#comment-13975920 ] Bikas Saha commented on YARN-1506: -- This sleep is too long for a test. Same for other slee

[jira] [Commented] (YARN-2001) Persist NMs info for RM restart

2014-04-30 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985709#comment-13985709 ] Bikas Saha commented on YARN-2001: -- Requiring all NMs to re-register might be too constra

[jira] [Commented] (YARN-2006) Estimate Job Endtime

2014-04-30 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985715#comment-13985715 ] Bikas Saha commented on YARN-2006: -- This needs to be in MAPREDUCE project. > Estimate Job

[jira] [Commented] (YARN-2019) Retrospect on decision of making RM crashed if any exception throw in ZKRMStateStore

2014-05-04 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13989059#comment-13989059 ] Bikas Saha commented on YARN-2019: -- That was the initial code since there was no HA states

[jira] [Commented] (YARN-2001) Persist NMs info for RM restart

2014-05-05 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990220#comment-13990220 ] Bikas Saha commented on YARN-2001: -- What if users want to have multiple standbys for fault

[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests

2014-05-12 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995262#comment-13995262 ] Bikas Saha commented on YARN-2027: -- Was the relaxLocality flag set to false in order to ma

[jira] [Commented] (YARN-2001) Threshold for RM to accept requests from AM after failover

2014-05-12 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995316#comment-13995316 ] Bikas Saha commented on YARN-2001: -- I think the offline discussion agreement was that ther

[jira] [Commented] (YARN-1372) Ensure all completed containers are reported to the AMs across RM restart

2014-05-12 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995516#comment-13995516 ] Bikas Saha commented on YARN-1372: -- What happens when there are 100s of long jobs running

[jira] [Commented] (YARN-2039) Better reporting of finished containers to AMs

2014-05-12 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995241#comment-13995241 ] Bikas Saha commented on YARN-2039: -- Dupe of YARN-1372? > Better reporting of finished con

[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart

2014-05-12 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995878#comment-13995878 ] Bikas Saha commented on YARN-556: - Folks please take the discussion for container id to its

[jira] [Commented] (YARN-556) RM Restart phase 2 - Work preserving restart

2014-05-12 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995829#comment-13995829 ] Bikas Saha commented on YARN-556: - bq. After the configurable wait-time, the RM starts accep

[jira] [Commented] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-05-13 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996675#comment-13996675 ] Bikas Saha commented on YARN-2052: -- The RM identifier is effectively the epoch for the RM.

[jira] [Updated] (YARN-2052) Container ID format and clustertimestamp for Work preserving restart

2014-05-13 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-2052: - Description: (was: We've been discussing whether container id format is changed to include cluster tim

[jira] [Created] (YARN-2047) RM should honor NM heartbeat expiry after RM restart

2014-05-13 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-2047: Summary: RM should honor NM heartbeat expiry after RM restart Key: YARN-2047 URL: https://issues.apache.org/jira/browse/YARN-2047 Project: Hadoop YARN Issue Type: Su

[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-05-13 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-2052: - Summary: ContainerId creation after work preserving restart is broken (was: Container ID format and clust

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-14 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997722#comment-13997722 ] Bikas Saha commented on YARN-1366: -- Is there any value in combining the re-register and re

[jira] [Updated] (YARN-2052) ContainerId creation after work preserving restart is broken

2014-05-15 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikas Saha updated YARN-2052: - Description: Container ids are made unique by using the app identifier and appending a monotonically incre

[jira] [Commented] (YARN-2027) YARN ignores host-specific resource requests

2014-05-16 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998912#comment-13998912 ] Bikas Saha commented on YARN-2027: -- Yes. If strict node locality is needed then the rack s

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-16 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000284#comment-14000284 ] Bikas Saha commented on YARN-1366: -- bq. Seems like we are going with no resync api for now

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-19 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002059#comment-14002059 ] Bikas Saha commented on YARN-1366: -- It would be easier for users if the RM would simply ac

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-19 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002393#comment-14002393 ] Bikas Saha commented on YARN-1366: -- bq. Seems we have a race that allocate call gets the r

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-19 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14002437#comment-14002437 ] Bikas Saha commented on YARN-1366: -- Then what happens when there are 2 versions of the AM

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-20 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003678#comment-14003678 ] Bikas Saha commented on YARN-1366: -- bq.If there's no RM restart, a normal app only calling

[jira] [Commented] (YARN-1366) ApplicationMasterService should Resync with the AM upon allocate call after restart

2014-05-21 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004927#comment-14004927 ] Bikas Saha commented on YARN-1366: -- I mean what will go wrong if we allow unregister witho

[jira] [Created] (YARN-2091) Add ContainerExitStatus.KILL_EXCEECED_MEMORY and pass it to app masters

2014-05-21 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-2091: Summary: Add ContainerExitStatus.KILL_EXCEECED_MEMORY and pass it to app masters Key: YARN-2091 URL: https://issues.apache.org/jira/browse/YARN-2091 Project: Hadoop YARN

[jira] [Commented] (YARN-796) Allow for (admin) labels on nodes and resource-requests

2014-05-23 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007866#comment-14007866 ] Bikas Saha commented on YARN-796: - Thanks [~john.jian.fang] An interesting use case from you

[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-27 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010585#comment-14010585 ] Bikas Saha commented on YARN-2091: -- Thats the missing pieces AFAIK. That exit reason needs

[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-27 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010630#comment-14010630 ] Bikas Saha commented on YARN-2091: -- We are on the same page. The kill reason is directly a

[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012980#comment-14012980 ] Bikas Saha commented on YARN-2091: -- Instead of having the following if-else code everywher

[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-29 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013071#comment-14013071 ] Bikas Saha commented on YARN-2091: -- That would make sense if YARN would allow specifying p

[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-30 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014079#comment-14014079 ] Bikas Saha commented on YARN-2091: -- Why is "isAMAware" needed. All values in ContainerExit

[jira] [Commented] (YARN-2091) Add ContainerExitStatus.KILL_EXCEEDED_MEMORY and pass it to app masters

2014-05-30 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014284#comment-14014284 ] Bikas Saha commented on YARN-2091: -- We can check all cases of ContainerKillEvent and add n

[jira] [Created] (YARN-1121) RMStateStore should flush all pending store events before closing

2013-08-28 Thread Bikas Saha (JIRA)
Bikas Saha created YARN-1121: Summary: RMStateStore should flush all pending store events before closing Key: YARN-1121 URL: https://issues.apache.org/jira/browse/YARN-1121 Project: Hadoop YARN

[jira] [Commented] (YARN-1098) Separate out stateless services from stateful services in the RM

2013-08-28 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753368#comment-13753368 ] Bikas Saha commented on YARN-1098: -- Overall approach looks good. We probably need a disti

[jira] [Commented] (YARN-707) Add user info in the YARN ClientToken

2013-08-28 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753378#comment-13753378 ] Bikas Saha commented on YARN-707: - please feel free to define this where it makes most sense

[jira] [Commented] (YARN-771) AMRMClient support for resource blacklisting

2013-08-29 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753390#comment-13753390 ] Bikas Saha commented on YARN-771: - What if we add node to blacklist and them remove that nod

[jira] [Commented] (YARN-771) AMRMClient support for resource blacklisting

2013-08-29 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753806#comment-13753806 ] Bikas Saha commented on YARN-771: - The RM must handle anything input that the user sends to

[jira] [Commented] (YARN-1098) Separate out stateless services from stateful services in the RM

2013-08-29 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754102#comment-13754102 ] Bikas Saha commented on YARN-1098: -- Are there services that need to run in Standby mode bu

[jira] [Commented] (YARN-771) AMRMClient support for resource blacklisting

2013-08-29 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754397#comment-13754397 ] Bikas Saha commented on YARN-771: - The cases listed above are possible. However, IMO, an API

[jira] [Commented] (YARN-1065) NM should provide AuxillaryService data to the container

2013-08-30 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755088#comment-13755088 ] Bikas Saha commented on YARN-1065: -- There seems to be some misunderstanding here. The NM a

[jira] [Commented] (YARN-1127) reservation exchange and excess reservation is not working for capacity scheduler

2013-08-30 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755090#comment-13755090 ] Bikas Saha commented on YARN-1127: -- Isnt this similar to a jira opened by you already? The

[jira] [Commented] (YARN-1127) reservation exchange and excess reservation is not working for capacity scheduler

2013-08-30 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755095#comment-13755095 ] Bikas Saha commented on YARN-1127: -- How is this different from YARN-957

[jira] [Commented] (YARN-1127) reservation exchange and excess reservation is not working for capacity scheduler

2013-08-30 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755110#comment-13755110 ] Bikas Saha commented on YARN-1127: -- Then please clarify this in the description or comment

[jira] [Commented] (YARN-1065) NM should provide AuxillaryService data to the container

2013-08-30 Thread Bikas Saha (JIRA)
[ https://issues.apache.org/jira/browse/YARN-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755334#comment-13755334 ] Bikas Saha commented on YARN-1065: -- We need the service data provided by the Auxillary Ser
