[ https://issues.apache.org/jira/browse/YARN-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16443370#comment-16443370 ]

Eric Yang commented on YARN-8108:
---------------------------------

{quote}Why do we want to bypass the code registered by the proxyserver?{quote}

1.  The proxyserver registration is a default that applies the spnego filter to 
/logs and the static directory.  The same protection can be provided by 
RMAuthenticationFilter, and it does not appear to cause harm when requests go 
through the RMAuthenticationFilter path.  I am not sure whether the proxyserver 
should or should not issue a delegation token when it is embedded in the RM.  
Thoughts?

{quote}Should the proxy service even be using the RM's auth filter?{quote}
2.  Probably, for the embedded case.  Both /proxy and /cluster can be configured 
with two different HTTP principals.  This can be problematic when checking 
whether the TGS session key was granted properly: the service principal names 
are the same on the same service port, yet the Hadoop code initializes two TGS 
session keys, which causes this problem.

{quote}How/why does changing addFilter to addGlobalFilter fix the problem? 
Adding the filter to every context (even those explicitly registered to not be 
filtered) seems counterintuitive.{quote}
3.  The global coverage ensures that all entry points are served by the same 
TGS session key, so Kerberos is not confused by two separately initialized 
Kerberos sessions.
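To illustrate the difference in coverage, here is a toy model of per-context versus global filter registration, using plain collections.  This is not the HttpServer2 API; the method names and path sets merely mirror the two registrations described above:

```java
import java.util.*;

public class FilterCoverage {
    // Filter name -> path specs it was registered for (models per-context addFilter)
    static Map<String, List<String>> perContext = new LinkedHashMap<>();
    // Filters applied to every context (models addGlobalFilter)
    static List<String> global = new ArrayList<>();

    static void addFilter(String filter, String... paths) {
        perContext.computeIfAbsent(filter, k -> new ArrayList<>())
                  .addAll(Arrays.asList(paths));
    }

    static void addGlobalFilter(String filter) {
        global.add(filter);
    }

    // Which filters would run for a given request path?
    static List<String> filtersFor(String path) {
        List<String> result = new ArrayList<>(global);
        for (Map.Entry<String, List<String>> e : perContext.entrySet()) {
            for (String p : e.getValue()) {
                String prefix = p.endsWith("/*") ? p.substring(0, p.length() - 1) : p;
                if (path.startsWith(prefix)) {
                    result.add(e.getKey());
                    break;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        addFilter("RMAuthenticationFilter", "/cluster/*", "/ws/*", "/app/*");
        addFilter("SpnegoFilter", "/logs", "/static/*", "/proxy/*");
        // Per-context registration: /proxy/* is reached only by the SPNEGO filter,
        // so the two authentication paths never share one session
        System.out.println(filtersFor("/proxy/application_1/metrics/json"));
        // Global registration: every entry point shares one authentication filter
        addGlobalFilter("GlobalAuthFilter");
        System.out.println(filtersFor("/proxy/application_1/metrics/json"));
    }
}
```

With only per-context registration, a request to /proxy/* never passes through RMAuthenticationFilter; the global registration guarantees every path shares the same filter instance, which is the effect the patch relies on.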

{quote}I think we also need to root cause exactly what change caused the RM 
auth filter to be double registered so we can ensure we've correctly fixed the 
bug.{quote}

I tried to peel back the code to see if we could eliminate one of the 
registrations, but it appears that Kerberos coverage becomes spotty when that 
happens.  The amount of code reuse makes it impractical to decouple the two 
initialization paths and regroup them in a timely manner.  The trade-off is 
between refactoring the HttpServer2 filter initialization to handle this 
special case correctly, and applying global coverage with the same TGS session 
key.  I chose to be conservative in code changes to avoid causing a regression, 
which is why the global filter approach was implemented.  Based on my testing, 
nothing is broken when the global filter is applied.

I found the following stack traces, which might help anyone who would like to 
improve the filter configuration in HttpServer2.  The init code is scattered, 
however, and since both instances go down the same path, they are not easy to 
untangle.

First instance (RMAuthenticationFilter applies to /cluster/*, /ws/*, /app/*) is 
configured at:
{code}
        at org.apache.hadoop.http.HttpServer2.defineFilter(HttpServer2.java:998)
        at org.apache.hadoop.http.HttpServer2.addFilter(HttpServer2.java:962)
        at org.apache.hadoop.yarn.server.security.http.RMAuthenticationFilterInitializer.initFilter(RMAuthenticationFilterInitializer.java:111)
        at org.apache.hadoop.http.HttpServer2.initializeWebServer(HttpServer2.java:587)
        at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:537)
        at org.apache.hadoop.http.HttpServer2.<init>(HttpServer2.java:117)
        at org.apache.hadoop.http.HttpServer2$Builder.build(HttpServer2.java:421)
        at org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:333)
        at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:424)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:1189)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1299)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1495)
{code}

Second instance (SpnegoFilter applies to /logs, /static/*, /proxy/*) is 
configured at:
{code}
        at org.apache.hadoop.http.HttpServer2.defineFilter(HttpServer2.java:998)
        at org.apache.hadoop.http.HttpServer2.defineFilter(HttpServer2.java:989)
        at org.apache.hadoop.http.HttpServer2.initSpnego(HttpServer2.java:1141)
        at org.apache.hadoop.http.HttpServer2.access$200(HttpServer2.java:117)
        at org.apache.hadoop.http.HttpServer2$Builder.build(HttpServer2.java:424)
        at org.apache.hadoop.yarn.webapp.WebApps$Builder.build(WebApps.java:333)
        at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:424)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:1189)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1299)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
        at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1495)
{code}

Line numbers might be slightly off because I added some debug statements to 
gather the stack traces.

> RM metrics rest API throws GSSException in kerberized environment
> -----------------------------------------------------------------
>
>                 Key: YARN-8108
>                 URL: https://issues.apache.org/jira/browse/YARN-8108
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: Kshitij Badani
>            Priority: Major
>         Attachments: YARN-8108.001.patch
>
>
> Test is trying to pull up metrics data from SHS after kiniting as 'test_user'
> It is throwing GSSException as follows
> {code:java}
> b2b460b80713|RUNNING: curl --silent -k -X GET -D /hwqe/hadoopqe/artifacts/tmp-94845 --negotiate -u : http://rm_host:8088/proxy/application_1518674952153_0070/metrics/json
> 2018-02-15 07:15:48,757|INFO|MainThread|machine.py:194 - run()||GUID=fc5a3266-28f8-4eed-bae2-b2b460b80713|Exit Code: 0
> 2018-02-15 07:15:48,758|INFO|MainThread|spark.py:1757 - getMetricsJsonData()|metrics:
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>
> <title>Error 403 GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34))</title>
> </head>
> <body><h2>HTTP ERROR 403</h2>
> <p>Problem accessing /proxy/application_1518674952153_0070/metrics/json. Reason:
> <pre> GSSException: Failure unspecified at GSS-API level (Mechanism level: Request is a replay (34))</pre></p>
> </body>
> </html>
> {code}
> Root cause: the proxyserver on the RM cannot be supported in a Kerberos-enabled 
> cluster because AuthenticationFilter is applied twice in Hadoop code (once in 
> HttpServer2 for the RM, and a second instance from AmFilterInitializer for the 
> proxy server). This will require code changes to the hadoop-yarn-server-web-proxy 
> project.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
