[ 
https://issues.apache.org/jira/browse/YARN-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322969#comment-15322969
 ] 

Ashwin Shankar edited comment on YARN-4767 at 6/9/16 5:53 PM:
--------------------------------------------------------------

Hey [~templedf],
Thanks much for rebasing the patch! Just to give you some context on what we
see at my company: we first got complaints from users that they could not
access the AM UI. Since these HTTP requests go through the web proxy, we looked
at that process and found that it was unresponsive because all its threads were
busy. When we listed open file descriptors, we saw that the web proxy had many
connections from itself to itself, which seemed weird then but makes sense now,
since it's due to the AM redirecting requests back to the proxy. Web proxy logs
showed that most of the requests were made to one or two specific apps. We then
looked at those apps' AM logs and found "UnresolvedHostException" when the AM
was trying to resolve the proxy host (which is basically the master node where
the RM runs) in the AmIpFilter code. We believe it wasn't able to resolve the
host due to an intermittent network event affecting DNS, but that's not
conclusive. Overall, this issue has been occurring pretty much once every week,
and we have to bounce the web proxy to fix it.
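
For anyone following along, here is a rough sketch of how we understand the
loop. It is not the actual AmIpFilter code: the class, method, and host names
are made up for illustration, DNS is replaced by a stub resolver, and the
thread pool is just a counter. It only models the behavior from the issue
description, where a request from an address that is not a known proxy address
gets a 302 back to the proxy.

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of the redirect loop; NOT the real AmIpFilter.
class RedirectLoopSketch {

    /** What the AM-side filter does with one incoming request. */
    enum Decision { SERVE, REDIRECT_TO_PROXY }

    // resolver stands in for DNS: host name -> addresses (empty set on failure).
    static Decision filter(String remoteAddr, String proxyHost,
                           Function<String, Set<String>> resolver) {
        Set<String> proxyAddrs = resolver.apply(proxyHost);
        // A request is served only if it comes from a known proxy address.
        return proxyAddrs.contains(remoteAddr)
                ? Decision.SERVE
                : Decision.REDIRECT_TO_PROXY;
    }

    // Counts the proxy handler threads tied up by a single user request.
    static int threadsConsumed(String proxyAddr, String proxyHost,
                               Function<String, Set<String>> resolver,
                               int maxThreads) {
        int busy = 0;
        while (busy < maxThreads) {
            busy++; // one proxy thread forwards the request to the AM
            if (filter(proxyAddr, proxyHost, resolver) == Decision.SERVE) {
                return busy; // the AM answered, so the chain ends here
            }
            // The AM answered 302 -> proxy; the proxy follows the redirect
            // by connecting to itself, which needs yet another thread.
        }
        return busy; // pool exhausted
    }

    public static void main(String[] args) {
        Function<String, Set<String>> healthyDns =
                host -> Collections.singleton("10.0.0.1");
        Function<String, Set<String>> brokenDns =
                host -> Collections.emptySet();

        // prints 1
        System.out.println(threadsConsumed("10.0.0.1", "rm-host", healthyDns, 200));
        // prints 200
        System.out.println(threadsConsumed("10.0.0.1", "rm-host", brokenDns, 200));
    }
}
```

With a working resolver one thread is enough; once resolution fails, a single
request eats the whole pool, which matches the self-to-self connections we saw
in the file descriptor listing.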

Thanks [~kasha] for the review! [~xgong], [~vinodkv], please take a look at the
patch when you get a chance. We would like to backport it as soon as it's
committed.


> Network issues can cause persistent RM UI outage
> ------------------------------------------------
>
>                 Key: YARN-4767
>                 URL: https://issues.apache.org/jira/browse/YARN-4767
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: webapp
>    Affects Versions: 2.7.2
>            Reporter: Daniel Templeton
>            Assignee: Daniel Templeton
>            Priority: Critical
>         Attachments: YARN-4767.001.patch, YARN-4767.002.patch, 
> YARN-4767.003.patch, YARN-4767.004.patch, YARN-4767.005.patch, 
> YARN-4767.006.patch, YARN-4767.007.patch
>
>
> If a network issue causes an AM web app to resolve the RM proxy's address to 
> something other than what's listed in the allowed proxies list, the 
> AmIpFilter will 302 redirect the RM proxy's request back to the RM proxy.  
> The RM proxy will then consume all available handler threads connecting to 
> itself over and over, resulting in an outage of the web UI.


