[jira] [Comment Edited] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-21 Thread Szilard Nemeth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482255#comment-16482255
 ] 

Szilard Nemeth edited comment on YARN-8248 at 5/21/18 7:25 AM:
---

Hi [~haibochen]!

Fixed those with patch013, except the first one, which complains about 
SchedulerUtils.recordFactory: I haven't modified that field, and I guess 
uppercase naming does not make sense for it.


was (Author: snemeth):
Hi [~haibochen]!

Fixed those, except the first one, which complains about 
SchedulerUtils.recordFactory: I haven't modified that field, and I guess 
uppercase naming does not make sense for it.

> Job hangs when a job requests a resource that its queue does not have
> -
>
> Key: YARN-8248
> URL: https://issues.apache.org/jira/browse/YARN-8248
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: fairscheduler
>Reporter: Szilard Nemeth
>Assignee: Szilard Nemeth
>Priority: Major
> Attachments: YARN-8248-001.patch, YARN-8248-002.patch, 
> YARN-8248-003.patch, YARN-8248-004.patch, YARN-8248-005.patch, 
> YARN-8248-006.patch, YARN-8248-007.patch, YARN-8248-008.patch, 
> YARN-8248-009.patch, YARN-8248-010.patch, YARN-8248-011.patch, 
> YARN-8248-012.patch, YARN-8248-013.patch
>
>
> Job hangs when mapreduce.job.queuename is specified and the queue has 0 of 
> any resource (vcores / memory / other)
> In this scenario, the job should be immediately rejected upon submission 
> since the specified queue cannot serve the resource needs of the submitted 
> job.
>  
> Command to run:
> {code:java}
> bin/yarn jar 
> "./share/hadoop/mapreduce/hadoop-mapreduce-examples-$MY_HADOOP_VERSION.jar" 
> pi -Dmapreduce.job.queuename=sample_queue 1 1000;{code}
> fair-scheduler.xml queue config (excerpt):
>  
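> {code:xml}
> <queue name="sample_queue">
>   <minResources>1 mb,0vcores</minResources>
>   <maxResources>9 mb,0vcores</maxResources>
>   <maxRunningApps>50</maxRunningApps>
>   <maxAMShare>-1.0f</maxAMShare>
>   <weight>2.0</weight>
>   <schedulingPolicy>fair</schedulingPolicy>
> </queue>
> {code}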
> Diagnostic message from the web UI: 
> {code:java}
> [Wed May 02 06:35:57 -0700 2018] Application is added to the scheduler and is 
> not yet activated. (Resource request:  exceeds current 
> queue or its parents maximum resource allowed).{code}
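The excerpt above explains why the application sits in the "not yet activated" state instead of failing: maxResources allows 0 vcores, so no request for one or more vcores can ever fit under the queue's limit. The following is a minimal sketch assuming only the simple comma-separated "<amount> <unit>" form shown in the excerpt; `QueueResourceString` and its methods are hypothetical helpers, not FairScheduler's own parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Hadoop) for strings such as "9 mb,0vcores".
class QueueResourceString {

    // Parses comma-separated "<amount><optional space><unit>" parts into unit -> amount.
    static Map<String, Long> parse(String value) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String part : value.split(",")) {
            String p = part.trim().toLowerCase();
            // Find where the leading digits end ("0vcores" -> "0" + "vcores").
            int i = 0;
            while (i < p.length() && Character.isDigit(p.charAt(i))) {
                i++;
            }
            if (i == 0) {
                throw new IllegalArgumentException("No amount in: " + part);
            }
            long amount = Long.parseLong(p.substring(0, i));
            result.put(p.substring(i).trim(), amount);
        }
        return result;
    }

    // True if asking for `needed` units of `unit` can never fit under `max`.
    // Units missing from `max` are treated as unbounded (an assumption of this sketch).
    static boolean neverFits(Map<String, Long> max, String unit, long needed) {
        return needed > max.getOrDefault(unit, Long.MAX_VALUE);
    }
}
```

For example, `parse("9 mb,0vcores")` yields `{mb=9, vcores=0}`, and `neverFits(max, "vcores", 1)` returns `true`; treating resources that are absent from the map as unbounded is an assumption of this sketch.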



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-18 Thread Szilard Nemeth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16481001#comment-16481001
 ] 

Szilard Nemeth edited comment on YARN-8248 at 5/18/18 6:42 PM:
---

Thanks [~haibochen] for your comments again!
 # Indeed, I removed this class.
 # 3. Fair enough: I was able to use SchedulerUtils.validateResourceRequests() 
in both methods, so the code is consistent now.

4. Fixed.

5. As per our offline discussion, we agreed on a cleaner approach, so no 
exception expectations or fail() calls are required in the catch block.

 

Please check the new patch and let me know if it looks good!

Thanks!
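Point 5 doesn't spell out which cleaner approach was agreed on. One common way to test an expected exception without a try/catch that calls fail() is an assertThrows-style helper (JUnit 4.13 and JUnit 5 each provide one). The plain-Java sketch below imitates that idea so it runs without JUnit; `expectThrows` and the `validate` method under test are hypothetical.

```java
// Hypothetical, JUnit-free imitation of an assertThrows-style helper.
class ExpectThrows {
    // Minimal functional interface for code that may throw a checked exception.
    interface ThrowingRunnable {
        void run() throws Throwable;
    }

    // Runs `code` and returns the thrown exception if it has the expected type;
    // otherwise fails with an AssertionError (wrong type or nothing thrown).
    static <T extends Throwable> T expectThrows(Class<T> expected, ThrowingRunnable code) {
        try {
            code.run();
        } catch (Throwable t) {
            if (expected.isInstance(t)) {
                return expected.cast(t);
            }
            throw new AssertionError("Unexpected exception type: " + t, t);
        }
        throw new AssertionError("Expected " + expected.getSimpleName() + " to be thrown");
    }

    // Hypothetical code under test: rejects an ask for more vcores than the queue max.
    static void validate(long requestedVcores, long queueMaxVcores) {
        if (requestedVcores > queueMaxVcores) {
            throw new IllegalArgumentException(
                "Requested " + requestedVcores + " vcores, queue max is " + queueMaxVcores);
        }
    }
}
```

A test then reads as a single expression, e.g. `expectThrows(IllegalArgumentException.class, () -> validate(1, 0))`, and can inspect the returned exception's message instead of asserting inside a catch block.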






[jira] [Comment Edited] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-18 Thread Szilard Nemeth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480469#comment-16480469
 ] 

Szilard Nemeth edited comment on YARN-8248 at 5/18/18 11:50 AM:


Thanks [~haibochen] for your comments!
 # Thanks, the test case names are indeed better this way.
 # Good point. After I modified the arguments, some test cases turned out to be 
doing essentially the same thing.
 # Thanks for spotting this. Added Assert.fail().
 # Very good point; as I indicated, this was the part I was unsure about, 
namely where inside allocate() I should throw an exception. I modified the code 
to be fail-fast and updated the test cases accordingly. Please note that from 
now on the test cases also check the correctness of the fail-fast behavior.
 # My intentions with validateAndFilterAsks() were:

 - Collect all the invalid resource requests.
 - In validateResourceRequestsAgainstQueueMaxResource(): collect all the 
invalid resource information (invalid resource requests).
 - Remove invalid resource requests from the original asks collection. This one 
turned out to be unnecessary given the fail-fast behavior.
 The code block you mentioned would throw 
SchedulerInvalidResourceRequestException as soon as the first invalid resource 
is found, without checking the rest of them.
 Moreover, isAnyMajorResourceZero(...) checks whether any resource is 0, but I 
think that if you don't request that resource, the request is not invalid. 
E.g. memory is configured as 0 in the queue's max resources, but you don't 
request any memory. This is why I collect all the resources with a 0 amount 
and check whether those resources are requested or not.
 I think the first two points are useful because the AM will know exactly which 
of its resource requests are invalid and which resource type caused the problem 
in each of them.
 I would vote for keeping this method as it is, but if you still have concerns, 
we should talk about this offline.

If you think that collecting all the invalid resource requests doesn't add much 
value, I can align my code with your suggestion and throw an exception on the 
first invalid one found.

Please check my updated patch!
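The collect-then-report approach from point 5 can be sketched as follows. A resource type counts against an ask only if the queue's maximum for it is 0 and the ask requests a positive amount of it, so a queue with 0 memory does not reject an ask that requests no memory. `Ask` and `findInvalidAsks` are hypothetical, simplified stand-ins; they are not YARN's `ResourceRequest` API or the patch's validateResourceRequestsAgainstQueueMaxResource().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ZeroCapacityValidator {
    // Hypothetical simplified resource request: an id plus resource-name -> amount.
    static final class Ask {
        final String id;
        final Map<String, Long> resources;

        Ask(String id, Map<String, Long> resources) {
            this.id = id;
            this.resources = resources;
        }
    }

    // Returns, for each invalid ask id, the resource types it requests that the
    // queue has 0 of. An empty result means every ask is valid.
    static Map<String, List<String>> findInvalidAsks(Map<String, Long> queueMax, List<Ask> asks) {
        // Step 1: collect the resource types whose configured maximum is 0.
        List<String> zeroResources = new ArrayList<>();
        for (Map.Entry<String, Long> e : queueMax.entrySet()) {
            if (e.getValue() == 0L) {
                zeroResources.add(e.getKey());
            }
        }
        // Step 2: check every ask (not just the first) against those types.
        Map<String, List<String>> invalid = new LinkedHashMap<>();
        for (Ask ask : asks) {
            List<String> offending = new ArrayList<>();
            for (String name : zeroResources) {
                // Only a positive request for a zero-capacity resource is invalid.
                if (ask.resources.getOrDefault(name, 0L) > 0L) {
                    offending.add(name);
                }
            }
            if (!offending.isEmpty()) {
                invalid.put(ask.id, offending);
            }
        }
        return invalid;
    }
}
```

A caller could throw one exception listing every entry whenever the returned map is non-empty; the fail-fast alternative suggested in the review would instead stop at the first offending ask.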


[jira] [Comment Edited] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-18 Thread Szilard Nemeth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16480469#comment-16480469
 ] 

Szilard Nemeth edited comment on YARN-8248 at 5/18/18 10:31 AM:


Thanks [~haibochen] for your comments!
 # Thanks, the test case names are indeed better this way.
 # Good point. After I modified the arguments, some test cases turned out to be 
doing essentially the same thing.
 # Thanks for spotting this. Added Assert.fail().
 # Very good point; as I indicated, this was the part I was unsure about, 
namely where inside allocate() I should throw an exception. I modified the code 
to be fail-fast and updated the test cases accordingly. Please note that from 
now on the test cases also check the correctness of the fail-fast behavior.
 # My intentions with validateAndFilterAsks() were:

 - Collect all the invalid resource requests.
 - In validateResourceRequestsAgainstQueueMaxResource(): collect all the 
invalid resource information (invalid resource requests).
 - Remove invalid resource requests from the original asks collection. This one 
turned out to be unnecessary given the fail-fast behavior.
 The code block you mentioned would throw 
SchedulerInvalidResourceRequestException as soon as the first invalid resource 
is found, without checking the rest of them.
 Moreover, isAnyMajorResourceZero(...) checks whether any resource is 0, but I 
think that if you don't request that resource, the request is not invalid. 
E.g. memory is configured as 0 in the queue's max resources, but you don't 
request any memory. This is why I collect all the resources with a 0 amount 
and check whether those resources are requested or not.
 I think the first two points are useful because the AM will know exactly which 
of its resource requests are invalid and which resource type caused the problem 
in each of them.
I would vote for keeping this method as it is, but if you still have concerns, 
we should talk about this offline.



[jira] [Comment Edited] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-17 Thread Szilard Nemeth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479338#comment-16479338
 ] 

Szilard Nemeth edited comment on YARN-8248 at 5/17/18 4:37 PM:
---

Thanks [~haibochen] for your comments.
 # Deleted the debug message.
 # Comments are added.
 # I have added code that filters invalid asks and throws 
SchedulerInvalidResourceRequestException if at least one invalid resource 
request is found. I added the throw statement at the very end of the 
allocate() method; I'm not sure whether this is the right place for it.

Thanks!
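On the question of where the throw belongs in allocate(): one fail-fast option is to validate every ask before touching any scheduler state, so a rejected call leaves nothing half-applied. This is a simplified, hypothetical model; `FailFastAllocator`, its `allocate()` and `InvalidRequestException` are stand-ins, not FairScheduler's real signatures or SchedulerInvalidResourceRequestException.

```java
import java.util.ArrayList;
import java.util.List;

class FailFastAllocator {
    // Hypothetical stand-in for SchedulerInvalidResourceRequestException.
    static class InvalidRequestException extends RuntimeException {
        InvalidRequestException(String message) {
            super(message);
        }
    }

    private final long queueMaxVcores;
    // Pending vcore asks accepted so far; models the scheduler's state.
    private final List<Long> pendingAsks = new ArrayList<>();

    FailFastAllocator(long queueMaxVcores) {
        this.queueMaxVcores = queueMaxVcores;
    }

    // Validates every ask first and only then mutates state, so an invalid
    // call throws without recording any of its asks.
    void allocate(List<Long> vcoreAsks) {
        for (long ask : vcoreAsks) {
            if (ask > queueMaxVcores) {
                throw new InvalidRequestException(
                    "Ask for " + ask + " vcores exceeds queue max of " + queueMaxVcores);
            }
        }
        pendingAsks.addAll(vcoreAsks);
    }

    int pendingCount() {
        return pendingAsks.size();
    }
}
```

With a queue maximum of 0 vcores, a call such as `allocate(Arrays.asList(0L, 1L))` throws and `pendingCount()` stays unchanged, because the state update runs only after the whole batch has passed validation.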





[jira] [Comment Edited] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-17 Thread Szilard Nemeth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479338#comment-16479338
 ] 

Szilard Nemeth edited comment on YARN-8248 at 5/17/18 4:37 PM:
---

Thanks [~haibochen] for your comments.
 # Deleted the debug message.
 # Comments are added.
 # I have added code that filters invalid asks and throws 
SchedulerInvalidResourceRequestException if at least one invalid resource 
request is found. I added the throw statement at the very end of the 
allocate() method; I'm not sure whether this is the right place for it.

Please check!





[jira] [Comment Edited] (YARN-8248) Job hangs when a job requests a resource that its queue does not have

2018-05-17 Thread Szilard Nemeth (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-8248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16479338#comment-16479338
 ] 

Szilard Nemeth edited comment on YARN-8248 at 5/17/18 4:36 PM:
---

Thanks [~haibochen] for your comments.
 # Deleted the debug message.
 # Comments are added.
 # I have added code that filters invalid asks and throws 
SchedulerInvalidResourceRequestException if at least one invalid resource 
request is found. I added the throw statement at the very end of the 
allocate() method; I'm not sure whether this is the right place for it.

Please check!

