[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2016-10-27 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613380#comment-15613380
 ] 

Neelesh Srinivas Salian commented on YARN-3996:
---

[~templedf], it's been a while since I have looked at this and I don't think I 
can continue it. Feel free to reassign since I don't have cycles to work on 
this, atm.

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>Priority: Critical
>  Labels: oct16-medium
> Attachments: YARN-3996.001.patch, YARN-3996.002.patch, 
> YARN-3996.003.patch, YARN-3996.prelim.patch
>
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if minimum is set to zero as per YARN-789



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-2353) FairScheduler: Update demand asynchronously instead of in the Update Thread

2016-10-16 Thread Neelesh Srinivas Salian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neelesh Srinivas Salian updated YARN-2353:
--
Assignee: (was: Neelesh Srinivas Salian)

> FairScheduler: Update demand asynchronously instead of in the Update Thread
> ---
>
> Key: YARN-2353
> URL: https://issues.apache.org/jira/browse/YARN-2353
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.4.1
>Reporter: Karthik Kambatla
>







[jira] [Updated] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-10-15 Thread Neelesh Srinivas Salian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neelesh Srinivas Salian updated YARN-3996:
--
Attachment: YARN-3996.003.patch

Version 3 of the patch attached with a test in FairScheduler.

Requesting Review.

Thank you.

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>Priority: Critical
> Attachments: YARN-3996.001.patch, YARN-3996.002.patch, 
> YARN-3996.003.patch, YARN-3996.prelim.patch
>
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if minimum is set to zero as per YARN-789





[jira] [Commented] (YARN-2353) FairScheduler: Update demand asynchronously instead of in the Update Thread

2015-10-12 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953288#comment-14953288
 ] 

Neelesh Srinivas Salian commented on YARN-2353:
---

Thanks for assigning it to me [~ka...@cloudera.com]. I'll post my questions and 
logic on the JIRA.

> FairScheduler: Update demand asynchronously instead of in the Update Thread
> ---
>
> Key: YARN-2353
> URL: https://issues.apache.org/jira/browse/YARN-2353
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.4.1
>Reporter: Karthik Kambatla
>Assignee: Neelesh Srinivas Salian
>






[jira] [Updated] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-10-08 Thread Neelesh Srinivas Salian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neelesh Srinivas Salian updated YARN-3996:
--
Attachment: YARN-3996.002.patch

Resolved the issues with the Capacity, FIFO and SLS schedulers.

I am not sure how to approach the testing. Wrote a basic unit test for this at 
the moment.

Trying to think how to make it more robust.
Will update if I think of a sturdier approach.

In the meantime, requesting some feedback on version 002 of the patch.

Thank you.

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>Priority: Critical
> Attachments: YARN-3996.001.patch, YARN-3996.002.patch, 
> YARN-3996.prelim.patch
>
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if minimum is set to zero as per YARN-789





[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-10-07 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947142#comment-14947142
 ] 

Neelesh Srinivas Salian commented on YARN-3996:
---

I'll go back and fix those. I know why this happened: the added implementation 
of incrementAllocationCapability on the Fifo and Capacity schedulers. I need to 
do this in a better way.

Will update soon.

Thank you.

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>Priority: Critical
> Attachments: YARN-3996.001.patch, YARN-3996.prelim.patch
>
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if minimum is set to zero as per YARN-789





[jira] [Updated] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-10-06 Thread Neelesh Srinivas Salian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neelesh Srinivas Salian updated YARN-3996:
--
Attachment: YARN-3996.001.patch

Attaching Patch with testCase for FairScheduler in the AppManager.
Requesting Review

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>Priority: Critical
> Attachments: YARN-3996.001.patch, YARN-3996.prelim.patch
>
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if minimum is set to zero as per YARN-789





[jira] [Commented] (YARN-4185) Retry interval delay for NM client can be improved from the fixed static retry

2015-10-05 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14944384#comment-14944384
 ] 

Neelesh Srinivas Salian commented on YARN-4185:
---

[~adhoot], thanks for the clarification.
So, the initial retries can be done with backoff times of 1, 2, 4 and 8 seconds, 
each still less than 10, which gives the opportunity to retry during a 
short-lived NM restart (under 10 seconds).
After that, we can keep waiting with a backoff of 10 seconds per retry to 
accommodate a longer failure.

Thus, the wait times would be 1, 2, 4, 8, 10, 10 and so on until the number of 
retries is exhausted.
My only concern is that if the failure lasts longer than the total wait time 
and the number of retries, there won't be a chance to retry.
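
The capped schedule described above can be sketched as follows. This is only an illustration under stated assumptions, not the patch: the class and method names (CappedBackoffSketch, delaySeconds, schedule) are hypothetical, and the 1-second base and 10-second cap are taken from the numbers in this comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: exponential backoff that doubles from a base delay
// and is capped at a fixed maximum (1, 2, 4, 8, 10, 10, ...).
final class CappedBackoffSketch {

    /** Delay in seconds before the retry with the given 0-based attempt index. */
    static long delaySeconds(int attempt, long baseSeconds, long capSeconds) {
        long delay = baseSeconds;
        // Double once per earlier attempt, but stop as soon as the cap is reached.
        for (int i = 0; i < attempt && delay < capSeconds; i++) {
            delay *= 2;
        }
        return Math.min(delay, capSeconds);
    }

    /** Full list of delays for the given number of retries. */
    static List<Long> schedule(int maxRetries, long baseSeconds, long capSeconds) {
        List<Long> delays = new ArrayList<>();
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            delays.add(delaySeconds(attempt, baseSeconds, capSeconds));
        }
        return delays;
    }

    public static void main(String[] args) {
        // Base of 1 second and cap of 10 seconds, as discussed in the comment.
        System.out.println(schedule(6, 1, 10));
    }
}
```

The concern raised above still applies to this sketch: once the retries are exhausted, a failure that outlasts the summed delays gets no further attempt.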

I'll write up a patch to exhibit this.
Thank you.

> Retry interval delay for NM client can be improved from the fixed static 
> retry 
> ---
>
> Key: YARN-4185
> URL: https://issues.apache.org/jira/browse/YARN-4185
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>
> Instead of having a fixed retry interval that starts off very high and stays 
> there, we are better off using an exponential backoff that has the same fixed 
> max limit. Today the retry interval is fixed at 10 sec that can be 
> unnecessarily high especially when NMs could rolling restart within a sec.





[jira] [Commented] (YARN-4185) Retry interval delay for NM client can be improved from the fixed static retry

2015-10-03 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14942528#comment-14942528
 ] 

Neelesh Srinivas Salian commented on YARN-4185:
---

Thoughts:
1) Using the exponentialBackoffRetry policy will have a progression of wait 
time starting at 1sec per retry assuming it takes a second for the NM to come 
up.
Hence exponentially, the backoff time increases 2,4,8,16...till 512 as we 
approach 10 retries.

2) In the current strategy, the wait time is 10 seconds which causes an NM that 
restarted in 1 second to wait for a retry.

3) Going forward, by the 3rd retry the collective wait time is 7 seconds 
(1+2+4) under the exponential strategy versus 30 seconds (10+10+10) under the 
current static retry.

4) If you keep retrying, by the 6th retry attempt the static retry has 
collectively waited 60 seconds versus 2^6 - 1 = 63 seconds in the exponential 
strategy.

Logic for the Design:
1) With the number of retries defaulting to 10, 
   a. I propose that after the 3rd attempt we keep the wait time at 4 seconds 
for the remaining retries. 
   Thus the total time comes to 1+2+4+4+4+4+4+4+4+4 = 35 seconds,
   b. versus collectively spending 100 seconds of waiting time in the static 
retry strategy.

2) Alternatively, the logic could be:
   a. Have the 1st 3 attempts of retry. If further needed, fall back to the 
1sec start of the same logic.
  So, it looks like this.. (1,2,4)  (1,2,4)  (1,2,4) (1) for 10 retries.
   b. Thus we get the 10 retries done in collectively 22 seconds versus 100 
seconds.
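
The cumulative wait times compared above can be checked with a small sketch. The class and method names (RetryWaitTotals, fixed, exponential, holdAtFour, cycling) are hypothetical and exist only to reproduce the arithmetic; they are not part of any Hadoop API.

```java
import java.util.function.IntToLongFunction;

// Hypothetical sketch: sums the per-retry delays (in seconds) of each
// strategy discussed above so the totals can be compared.
final class RetryWaitTotals {

    /** Sum of delays over the first `retries` attempts (0-based indices). */
    static long totalWait(int retries, IntToLongFunction delayForAttempt) {
        long total = 0;
        for (int attempt = 0; attempt < retries; attempt++) {
            total += delayForAttempt.applyAsLong(attempt);
        }
        return total;
    }

    /** Current static strategy: always 10 seconds. */
    static long fixed(int attempt) { return 10; }

    /** Plain exponential strategy: 1, 2, 4, 8, ... */
    static long exponential(int attempt) { return 1L << attempt; }

    /** Design option 1: 1, 2, 4, then stay at 4. */
    static long holdAtFour(int attempt) { return Math.min(1L << attempt, 4); }

    /** Design option 2: repeat the cycle 1, 2, 4. */
    static long cycling(int attempt) { return 1L << (attempt % 3); }

    public static void main(String[] args) {
        System.out.println("static, 3 retries:       " + totalWait(3, RetryWaitTotals::fixed));
        System.out.println("exponential, 3 retries:  " + totalWait(3, RetryWaitTotals::exponential));
        System.out.println("static, 6 retries:       " + totalWait(6, RetryWaitTotals::fixed));
        System.out.println("exponential, 6 retries:  " + totalWait(6, RetryWaitTotals::exponential));
        System.out.println("static, 10 retries:      " + totalWait(10, RetryWaitTotals::fixed));
        System.out.println("hold at 4, 10 retries:   " + totalWait(10, RetryWaitTotals::holdAtFour));
        System.out.println("cycle 1,2,4, 10 retries: " + totalWait(10, RetryWaitTotals::cycling));
    }
}
```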

Requesting feedback.
Thank you.

> Retry interval delay for NM client can be improved from the fixed static 
> retry 
> ---
>
> Key: YARN-4185
> URL: https://issues.apache.org/jira/browse/YARN-4185
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>
> Instead of having a fixed retry interval that starts off very high and stays 
> there, we are better off using an exponential backoff that has the same fixed 
> max limit. Today the retry interval is fixed at 10 sec that can be 
> unnecessarily high especially when NMs could rolling restart within a sec.





[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-10-02 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14941447#comment-14941447
 ] 

Neelesh Srinivas Salian commented on YARN-3996:
---

[~adhoot] thanks for the review. Will update with Version 1.

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>Priority: Critical
> Attachments: YARN-3996.prelim.patch
>
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if minimum is set to zero as per YARN-789





[jira] [Updated] (YARN-4222) Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common

2015-10-02 Thread Neelesh Srinivas Salian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neelesh Srinivas Salian updated YARN-4222:
--
Summary: Retries is typoed to spell Retires in parts of hadoop-yarn and 
hadoop-common  (was: Retries is typoed to spell Retires in parts of the 
hadoop-yarn and hadoop-common)

> Retries is typoed to spell Retires in parts of hadoop-yarn and hadoop-common
> 
>
> Key: YARN-4222
> URL: https://issues.apache.org/jira/browse/YARN-4222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Neelesh Srinivas Salian
>Assignee: Neelesh Srinivas Salian
>Priority: Minor
> Fix For: 2.8.0
>
> Attachments: YARN-4222.001.patch
>
>
> Spotted this typo in the code while working on a separate YARN issue.
> E.g DEFAULT_RM_NODEMANAGER_CONNECT_RETIRES
> Checked in the whole project. Found a few occurrences of the typo in 
> code/comment. 
> The JIRA is meant to help fix those typos.





[jira] [Created] (YARN-4222) Retries is typoed to spell Retires in parts of the hadoop-yarn and hadoop-common

2015-10-02 Thread Neelesh Srinivas Salian (JIRA)
Neelesh Srinivas Salian created YARN-4222:
-

 Summary: Retries is typoed to spell Retires in parts of the 
hadoop-yarn and hadoop-common
 Key: YARN-4222
 URL: https://issues.apache.org/jira/browse/YARN-4222
 Project: Hadoop YARN
  Issue Type: Bug
  Components: yarn
Affects Versions: 2.7.1
Reporter: Neelesh Srinivas Salian
Assignee: Neelesh Srinivas Salian
Priority: Minor


Spotted this typo in the code while working on a separate YARN issue.
E.g DEFAULT_RM_NODEMANAGER_CONNECT_RETIRES

Checked in the whole project. Found a few occurrences of the typo in 
code/comment. 

The JIRA is meant to help fix those typos.





[jira] [Updated] (YARN-4222) Retries is typoed to spell Retires in parts of the hadoop-yarn and hadoop-common

2015-10-02 Thread Neelesh Srinivas Salian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neelesh Srinivas Salian updated YARN-4222:
--
Attachment: YARN-4222.001.patch

> Retries is typoed to spell Retires in parts of the hadoop-yarn and 
> hadoop-common
> 
>
> Key: YARN-4222
> URL: https://issues.apache.org/jira/browse/YARN-4222
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 2.7.1
>Reporter: Neelesh Srinivas Salian
>Assignee: Neelesh Srinivas Salian
>Priority: Minor
> Attachments: YARN-4222.001.patch
>
>
> Spotted this typo in the code while working on a separate YARN issue.
> E.g DEFAULT_RM_NODEMANAGER_CONNECT_RETIRES
> Checked in the whole project. Found a few occurrences of the typo in 
> code/comment. 
> The JIRA is meant to help fix those typos.





[jira] [Updated] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-10-01 Thread Neelesh Srinivas Salian (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neelesh Srinivas Salian updated YARN-3996:
--
Attachment: YARN-3996.prelim.patch

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>Priority: Critical
> Attachments: YARN-3996.prelim.patch
>
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if minimum is set to zero as per YARN-789





[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-10-01 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14939941#comment-14939941
 ] 

Neelesh Srinivas Salian commented on YARN-3996:
---

[~adhoot] will add tests on top of this. Checking to see if the approach is 
right.

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Neelesh Srinivas Salian
>Priority: Critical
> Attachments: YARN-3996.prelim.patch
>
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if minimum is set to zero as per YARN-789





[jira] [Commented] (YARN-4185) Retry interval delay for NM client can be improved from the fixed static retry

2015-09-29 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14935452#comment-14935452
 ] 

Neelesh Srinivas Salian commented on YARN-4185:
---

Writing up a patch for this.
Questions I had:
1) This would be included for the NMProxy with a new RetryPolicy setting:
exponentialBackoffRetry(5, 1000, TimeUnit.MILLISECONDS)
What would be the value of maxRetries?

I see the value set to 5 for NameNodeProxies. Is that an arbitrarily set value 
or does it need to be taken from the conf?
Thank you.

> Retry interval delay for NM client can be improved from the fixed static 
> retry 
> ---
>
> Key: YARN-4185
> URL: https://issues.apache.org/jira/browse/YARN-4185
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> Instead of having a fixed retry interval that starts off very high and stays 
> there, we are better off using an exponential backoff that has the same fixed 
> max limit. Today the retry interval is fixed at 10 sec that can be 
> unnecessarily high especially when NMs could rolling restart within a sec.





[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-09-29 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14936319#comment-14936319
 ] 

Neelesh Srinivas Salian commented on YARN-3996:
---

The idea here is:
Scenario:
1) The ResourceRequest can ask for a memory value that lies between 
RM_SCHEDULER_MINIMUM_ALLOCATION_MB and RM_SCHEDULER_INCREMENT_ALLOCATION_MB. 
For example, say the minimum is set to zero, the increment is 512MB and 
the request is 256MB.
In that case, normalizeRequest() will normalize the request to the minimum 
(zero) instead of rounding it up to the increment (512MB), which would fulfill 
the request.

a. I think we may have to change the 
Resource normalize(
  ResourceCalculator calculator, Resource lhs, Resource min,
  Resource max, Resource increment)
with a check for Zero requests

But that would be more of a core change, and I am not sure about making it in 
case it breaks anything else.

b. The other would be to check the zero requests and add a check in the Fair 
and Capacity scheduler code prior to calling  SchedulerUtils.normalizeRequests()
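
The scenario above can be reproduced with a minimal standalone sketch. It works on plain memory values in MB rather than on Resource objects, the class and method names (NormalizeSketch, divideAndCeil, roundUp, normalize) are hypothetical, and the 8192MB maximum is an illustrative value. It also assumes that rounding up with a step of zero yields zero, which would explain the behavior described in the issue; that detail is an assumption here, not something confirmed in this thread.

```java
// Hypothetical sketch of request normalization on plain MB values.
final class NormalizeSketch {

    /** Ceiling division; assumed to return 0 when the divisor is 0. */
    static long divideAndCeil(long a, long b) {
        return b == 0 ? 0 : (a + (b - 1)) / b;
    }

    /** Rounds value up to the next multiple of step (0 when step is 0). */
    static long roundUp(long value, long step) {
        return divideAndCeil(value, step) * step;
    }

    /** Raise to min, round up to a multiple of increment, then cap at max. */
    static long normalize(long requested, long min, long max, long increment) {
        return Math.min(roundUp(Math.max(requested, min), increment), max);
    }

    public static void main(String[] args) {
        long min = 0;     // zero minimum, as allowed by YARN-789
        long max = 8192;  // illustrative maximum, not from the thread

        // Passing the minimum (0) as the increment loses the 256MB request.
        System.out.println(normalize(256, min, max, min));
        // Passing the real 512MB increment rounds the request up instead.
        System.out.println(normalize(256, min, max, 512));
        // A genuine zero request still normalizes to zero.
        System.out.println(normalize(0, min, max, 512));
    }
}
```

In this sketch the fix amounts to which value is passed as the last argument, which is a smaller change than adding a zero check inside normalize itself.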

Thoughts?

Thank you.

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if minimum is set to zero as per YARN-789





[jira] [Commented] (YARN-4185) Retry interval delay for NM client can be improved from the fixed static retry

2015-09-28 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934350#comment-14934350
 ] 

Neelesh Srinivas Salian commented on YARN-4185:
---

I would like to work on this JIRA if no one has already begun.
If it is free, could you please assign the JIRA to me?

Thank you.

> Retry interval delay for NM client can be improved from the fixed static 
> retry 
> ---
>
> Key: YARN-4185
> URL: https://issues.apache.org/jira/browse/YARN-4185
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>
> Instead of having a fixed retry interval that starts off very high and stays 
> there, we are better off using an exponential backoff that has the same fixed 
> max limit. Today the retry interval is fixed at 10 sec that can be 
> unnecessarily high especially when NMs could rolling restart within a sec.





[jira] [Commented] (YARN-3996) YARN-789 (Support for zero capabilities in fairscheduler) is broken after YARN-3305

2015-09-28 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934349#comment-14934349
 ] 

Neelesh Srinivas Salian commented on YARN-3996:
---

I would like to work on this JIRA if no one has already begun.
If it is free, could you please assign the JIRA to me?

Thank you.

> YARN-789 (Support for zero capabilities in fairscheduler) is broken after 
> YARN-3305
> ---
>
> Key: YARN-3996
> URL: https://issues.apache.org/jira/browse/YARN-3996
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacityscheduler, fairscheduler
>Reporter: Anubhav Dhoot
>Assignee: Anubhav Dhoot
>Priority: Critical
>
> RMAppManager#validateAndCreateResourceRequest calls into normalizeRequest 
> with minimumResource for the incrementResource. This causes normalize to 
> return zero if minimum is set to zero as per YARN-789





[jira] [Commented] (YARN-2062) Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover

2015-09-27 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909918#comment-14909918
 ] 

Neelesh Srinivas Salian commented on YARN-2062:
---

[~ka...@cloudera.com] is this still valid?



> Too many InvalidStateTransitionExceptions from NodeState.NEW on RM failover
> ---
>
> Key: YARN-2062
> URL: https://issues.apache.org/jira/browse/YARN-2062
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.4.0
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
> Attachments: yarn-2062-1.patch
>
>
> On busy clusters, we see several 
> {{org.apache.hadoop.yarn.state.InvalidStateTransitonException}} for events 
> invoked against NEW nodes. 





[jira] [Commented] (YARN-2353) FairScheduler: Update demand asynchronously instead of in the Update Thread

2015-09-27 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909919#comment-14909919
 ] 

Neelesh Srinivas Salian commented on YARN-2353:
---

[~ka...@cloudera.com], if you are not looking into it, I would like to help out.
Either way, could you please add some description to help elaborate the 
improvement?

Thank you.

> FairScheduler: Update demand asynchronously instead of in the Update Thread
> ---
>
> Key: YARN-2353
> URL: https://issues.apache.org/jira/browse/YARN-2353
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: scheduler
>Affects Versions: 2.4.1
>Reporter: Karthik Kambatla
>Assignee: Karthik Kambatla
>






[jira] [Commented] (YARN-4123) Unable to start YARN - Error starting JobHistoryServer

2015-09-26 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14909547#comment-14909547
 ] 

Neelesh Srinivas Salian commented on YARN-4123:
---

[~VINOTH.KANAKASABAPATHY], since you are running CDH, I believe 
https://community.cloudera.com/ will be the right avenue to help you move 
forward with your question, if you are still observing the behavior.

Closing the JIRA here for now. 
Please re-open if applicable.

Thank you.



> Unable to start YARN - Error starting JobHistoryServer
> --
>
> Key: YARN-4123
> URL: https://issues.apache.org/jira/browse/YARN-4123
> Project: Hadoop YARN
>  Issue Type: Bug
> Environment: Cloudera CDH 5.4.0
>Reporter: Vinoth Kanakasabapathy
>
> Hi,
> I am having issues while restarting YARN service. It keeps failing with 
> errors shown in the logs below. YARN was working fine until last week and 
> then the below error messages started to pop up all of a sudden. 
> Tried restarting YARN/JOB history server. Nothing worked. Kindly help to 
> alleviate this issue.
> Thanks,
> Vinoth
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer
> Error starting JobHistoryServer
> java.lang.IllegalAccessError: tried to access class 
> org.apache.hadoop.mapred.JobACLsManager from class 
> org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager
>   at 
> org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:503)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:145)
>   at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:222)
>   at 
> org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:232)
> 1:06:55.303 PM INFO org.apache.hadoop.util.ExitUtil 
> Exiting with status -1
> 1:06:55.355 PM INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory
> Stopping JobHistory
> 1:06:55.358 PM INFO org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer
> SHUTDOWN_MSG: 
> /
> SHUTDOWN_MSG: Shutting down JobHistoryServer at master/10.144.25.49
> /





[jira] [Commented] (YARN-1772) Fair Scheduler documentation should indicate that admin ACLs also give submit permissions

2015-09-13 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742681#comment-14742681
 ] 

Neelesh Srinivas Salian commented on YARN-1772:
---

This looks to be resolved. [~d4rr3ll]'s comment is the line that exists. If 
any additional clarity is needed, I can add it accordingly.

[~sandyr]

Thank you.

> Fair Scheduler documentation should indicate that admin ACLs also give submit 
> permissions
> -
>
> Key: YARN-1772
> URL: https://issues.apache.org/jira/browse/YARN-1772
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Sandy Ryza
>Priority: Minor
>  Labels: newbie
>
> I can submit to a Fair Scheduler queue if I'm in the submit ACL OR if I'm in 
> the administer ACL.  The Fair Scheduler docs seem to leave out the second 
> part. 





[jira] [Commented] (YARN-3455) Document CGroup support

2015-09-13 Thread Neelesh Srinivas Salian (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14742709#comment-14742709
 ] 

Neelesh Srinivas Salian commented on YARN-3455:
---

[~rohit12sh] does this need anything additional?


> Document CGroup support 
> 
>
> Key: YARN-3455
> URL: https://issues.apache.org/jira/browse/YARN-3455
> Project: Hadoop YARN
>  Issue Type: Task
>  Components: documentation
>Reporter: Rohith Sharma K S
>
> It would be very useful if CGroup support is documented having sections like 
> below
> # Introduction
> # Configuring CGroups
> # Any specific configuration that controls CPU scheduling
> # How/when to use CGroups with some use case explanations


