[ 
https://issues.apache.org/jira/browse/YARN-3020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275825#comment-14275825
 ] 

Peter D Kirchner commented on YARN-3020:
----------------------------------------

I investigated the rates reported in the third paragraph of my previous 
comment and found that an application can issue addContainerRequest() calls 
much faster than I reported there.  My application has some extrinsic delay 
between its addContainerRequest() calls, and that delay, not the API itself, 
dominated the rate I measured.  Bear in mind, too, that the elapsed time of 
the client-API call to addContainerRequest() does not measure the performance 
impact of the over-requests sent to the server or of the resulting 
over-allocation of containers.

To follow up, I measured addContainerRequest() timing with System.nanoTime().  
The first call to addContainerRequest() takes around 5 milliseconds.  The rest 
take around half a millisecond on average.  Statistics for calling 
addContainerRequest(), in microseconds: average=433 count=914 max=11202 
min=223.  I measure similar times for consecutive calls (with no additional 
application delay between the addContainerRequest() calls).
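
For anyone who wants to repeat the measurement, here is a minimal sketch of 
the kind of timing loop described above.  It is not the code I ran: the class 
name CallTimer and the stand-in workload are hypothetical, and it assumes 
Java 8 or later for LongSummaryStatistics.  In an application master the 
Runnable would wrap the real addContainerRequest() call.

```java
import java.util.LongSummaryStatistics;

// Hypothetical helper: times repeated calls with System.nanoTime() and
// collects per-call statistics in microseconds.
final class CallTimer {

    static LongSummaryStatistics timeCalls(Runnable call, int count) {
        LongSummaryStatistics stats = new LongSummaryStatistics();
        for (int i = 0; i < count; i++) {
            long start = System.nanoTime();
            call.run();
            long elapsedNanos = System.nanoTime() - start;
            stats.accept(elapsedNanos / 1_000); // nanoseconds -> microseconds
        }
        return stats;
    }

    public static void main(String[] args) {
        // Stand-in workload (an assumption for illustration only). In a real
        // application master this would be something like
        // () -> amRmClient.addContainerRequest(request).
        Runnable standIn = () -> Math.sqrt(System.nanoTime());

        LongSummaryStatistics s = timeCalls(standIn, 914);
        System.out.printf("microseconds average=%.0f count=%d max=%d min=%d%n",
                s.getAverage(), s.getCount(), s.getMax(), s.getMin());
    }
}
```

Note that the first iteration is included in the statistics, so a slow first 
call shows up in max rather than being reported separately.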

Once the over-request bug is fixed, I will still find it tedious to call 
addContainerRequest() 1000 times for 1000 identical containers, but many 
applications can probably afford the half second it takes.  Arguably, the bug 
exists in part because of the tedious bookkeeping these requests require on 
the yarn-client-api side.  If the bug fix or cleanup were to re-introduce an 
integer quantity with the request, that change would be welcome.

> n similar addContainerRequest()s produce n*(n+1)/2 containers
> -------------------------------------------------------------
>
>                 Key: YARN-3020
>                 URL: https://issues.apache.org/jira/browse/YARN-3020
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.5.0, 2.6.0, 2.5.1, 2.5.2
>            Reporter: Peter D Kirchner
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> BUG: If the application master calls addContainerRequest() n times, but with 
> the same priority, I get up to 1+2+3+...+n containers = n*(n+1)/2 .  The most 
> containers are requested when the interval between calls to 
> addContainerRequest() exceeds the heartbeat interval of calls to allocate() 
> (in AMRMClientImpl's run() method).
> If the application master calls addContainerRequest() n times, but with a 
> unique priority each time, I get n containers (as I intended).
> Analysis:
> There is a logic problem in AMRMClientImpl.java.
> Although allocate() in AMRMClientImpl.java does an ask.clear(), on 
> subsequent calls to addContainerRequest(), addResourceRequest() finds the 
> previous matching remoteRequest and increments its container count rather 
> than starting anew, and then does an addResourceRequestToAsk(), which 
> defeats the ask.clear().
> From documentation and code comments, it was hard for me to discern the 
> intended behavior of the API, but the inconsistency reported in this issue 
> suggests one case or the other is implemented incorrectly.
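
The arithmetic in the description above can be reproduced with a toy model.  
This is a simplified sketch, not the AMRMClientImpl code: the class 
OverRequestModel and its fields are hypothetical, and it assumes that every 
heartbeat grants the full container count carried by each pending ask entry, 
and that the client-side running count is never decremented.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the over-request behavior (an illustration under the stated
// assumptions, not the real client implementation).
final class OverRequestModel {
    // priority -> running container count kept on the client side
    private final Map<Integer, Integer> remoteRequests = new HashMap<>();
    // priority -> count to send on the next heartbeat
    private final Map<Integer, Integer> ask = new HashMap<>();
    private long granted = 0;

    void addContainerRequest(int priority) {
        // Increment the existing entry instead of starting anew,
        // then put the whole running count back into the ask.
        int count = remoteRequests.merge(priority, 1, Integer::sum);
        ask.put(priority, count);
    }

    void allocate() {
        // Assumption: the server grants every container counted in the ask.
        for (int count : ask.values()) {
            granted += count;
        }
        ask.clear(); // cleared here, but refilled by the next add
    }

    static long simulate(int n, boolean heartbeatBetweenCalls,
                         boolean uniquePriorities) {
        OverRequestModel m = new OverRequestModel();
        for (int i = 0; i < n; i++) {
            m.addContainerRequest(uniquePriorities ? i : 0);
            if (heartbeatBetweenCalls) {
                m.allocate();
            }
        }
        m.allocate(); // final heartbeat sends anything still pending
        return m.granted;
    }

    public static void main(String[] args) {
        System.out.println(simulate(4, true, false));    // 10 = 1+2+3+4
        System.out.println(simulate(4, false, false));   // 4, all adds in one heartbeat
        System.out.println(simulate(4, true, true));     // 4, unique priorities
        System.out.println(simulate(1000, true, false)); // 500500
    }
}
```

Under these assumptions the model gives n*(n+1)/2 only when a heartbeat falls 
between every pair of calls, which matches the "up to" wording in the report; 
with faster calls the total lands somewhere between n and n*(n+1)/2.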



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
