[jira] [Resolved] (YARN-6990) AmIpFilter:findRedirectUrl use HAServiceProtocol to getServiceStatus
[ https://issues.apache.org/jira/browse/YARN-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yunjiong zhao resolved YARN-6990.
    Resolution: Duplicate
    Assignee: yunjiong zhao

> AmIpFilter:findRedirectUrl use HAServiceProtocol to getServiceStatus
>
> Key: YARN-6990
> URL: https://issues.apache.org/jira/browse/YARN-6990
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
[jira] [Commented] (YARN-6990) AmIpFilter:findRedirectUrl use HAServiceProtocol to getServiceStatus
[ https://issues.apache.org/jira/browse/YARN-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16122538#comment-16122538 ]

yunjiong zhao commented on YARN-6990:

I just found that YARN-6625 fixed the issue. We use 2.7.

> AmIpFilter:findRedirectUrl use HAServiceProtocol to getServiceStatus
>
> Key: YARN-6990
> URL: https://issues.apache.org/jira/browse/YARN-6990
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
[jira] [Updated] (YARN-6990) AmIpFilter:findRedirectUrl use HAServiceProtocol to getServiceStatus
[ https://issues.apache.org/jira/browse/YARN-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yunjiong zhao updated YARN-6990:
    Affects Version/s: 2.7.0

> AmIpFilter:findRedirectUrl use HAServiceProtocol to getServiceStatus
>
> Key: YARN-6990
> URL: https://issues.apache.org/jira/browse/YARN-6990
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: yunjiong zhao
[jira] [Created] (YARN-6990) AmIpFilter:findRedirectUrl use HAServiceProtocol to getServiceStatus
yunjiong zhao created YARN-6990:

Summary: AmIpFilter:findRedirectUrl use HAServiceProtocol to getServiceStatus
Key: YARN-6990
URL: https://issues.apache.org/jira/browse/YARN-6990
Project: Hadoop YARN
Issue Type: Bug
Reporter: yunjiong zhao

Because our ResourceManager has multiple IP addresses, accessing a proxy URL such as https://*:50030/proxy/application_1502349494018_10877/ fails, because findRedirectUrl uses HAServiceProtocol to find out which RM is active.
{code}
2017-08-10 10:51:42,344 WARN [971256592@qtp-666312528-0] org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server :
org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[KERBEROS]
	at org.apache.hadoop.security.SaslRpcClient.selectSaslClient(SaslRpcClient.java:172)
	at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:396)
	at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:563)
	at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:378)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:732)
	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:728)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:727)
	at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:378)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1492)
	at org.apache.hadoop.ipc.Client.call(Client.java:1402)
	at org.apache.hadoop.ipc.Client.call(Client.java:1363)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy84.getServiceStatus(Unknown Source)
	at org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.getServiceStatus(HAServiceProtocolClientSideTranslatorPB.java:122)
	at org.apache.hadoop.yarn.util.RMHAUtils.getHAState(RMHAUtils.java:68)
	at org.apache.hadoop.yarn.util.RMHAUtils.findActiveRMHAId(RMHAUtils.java:44)
	at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.findRedirectUrl(AmIpFilter.java:174)
	at org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:138)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1243)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}
This can only happen when the RM has multiple IPs. The related code is in the doFilter function of AmIpFilter.java:
{code}
if (!getProxyAddresses().contains(httpReq.getRemoteAddr())) {
  String redirectUrl = findRedirectUrl();
  String target = redirectUrl + httpReq.getRequestURI();
  ProxyUtils.sendRedirect(httpReq, httpResp, target);
  return;
}
{code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
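The check quoted in the YARN-6990 report can be modelled in isolation to see when the redirect branch, and with it findRedirectUrl(), is reached. This is a simplified sketch, not the Hadoop source: the class name, the example addresses, and the premise that only one of the RM's IPs ends up in the proxy address set are assumptions made for illustration. The report does not say how that set is populated.

```java
import java.util.Collections;
import java.util.Set;

/** Simplified model of the redirect decision in doFilter; not the Hadoop source. */
final class ProxyAddressCheck {

    /**
     * Returns true when the caller's address is not a known proxy address,
     * i.e. the branch that goes on to call findRedirectUrl() in the quoted snippet.
     */
    static boolean needsRedirect(Set<String> proxyAddresses, String remoteAddr) {
        return !proxyAddresses.contains(remoteAddr);
    }

    public static void main(String[] args) {
        // Hypothetical addresses: the RM host owns 10.0.0.1 and 10.0.0.2,
        // but only 10.0.0.1 is in the set the filter treats as "the proxy".
        Set<String> knownProxyAddresses = Collections.singleton("10.0.0.1");

        System.out.println(needsRedirect(knownProxyAddresses, "10.0.0.1")); // false
        System.out.println(needsRedirect(knownProxyAddresses, "10.0.0.2")); // true
    }
}
```

Under these assumptions, a request that leaves the RM from its second IP is treated as a non-proxy caller, and the redirect lookup then makes the HA status call that fails in the stack trace.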
[jira] [Commented] (YARN-6339) Improve performance for createAndGetApplicationReport
[ https://issues.apache.org/jira/browse/YARN-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944399#comment-15944399 ]

yunjiong zhao commented on YARN-6339:

Thanks [~wangda] & [~xgong] for your time.

> Improve performance for createAndGetApplicationReport
>
> Key: YARN-6339
> URL: https://issues.apache.org/jira/browse/YARN-6339
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Fix For: 2.8.1, 3.0.0-alpha3
> Attachments: YARN-6339.001.patch, YARN-6339.002.patch, YARN-6339.003.patch

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937278#comment-15937278 ]

yunjiong zhao commented on YARN-6285:

Yes, we'll test both of them. Thanks.

> Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
>
> Key: YARN-6285
> URL: https://issues.apache.org/jira/browse/YARN-6285
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Attachments: YARN-6285.001.patch, YARN-6285.002.patch, YARN-6285.003.patch
>
> When users call ApplicationClientProtocol.getApplications, it returns lots of data and generates lots of garbage on the ResourceManager, which causes long GC pauses.
> For example, on one of our RMs, calling the REST API "http://<rm http address:port>/ws/v1/cluster/apps" can return 150MB of data for 944 applications.
> getApplications has a limit parameter, but some users might not set it, and then the limit will be Long.MAX_VALUE.
[jira] [Updated] (YARN-6339) Improve performance for createAndGetApplicationReport
[ https://issues.apache.org/jira/browse/YARN-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yunjiong zhao updated YARN-6339:
    Attachment: YARN-6339.003.patch

[~wangda], good suggestion. I updated the patch to make logAggregationStatusForAppReport volatile. No changes are needed in createAndGetApplicationReport() any more, since it is safe to update logAggregationStatusForAppReport inside getLogAggregationStatusForAppReport(). Thanks for taking the time to review the patch.

> Improve performance for createAndGetApplicationReport
>
> Key: YARN-6339
> URL: https://issues.apache.org/jira/browse/YARN-6339
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Attachments: YARN-6339.001.patch, YARN-6339.002.patch, YARN-6339.003.patch
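The idea in the 003 patch comment, caching a terminal status in a volatile field so readers can update it without the write lock, can be sketched roughly as follows. This is an illustration only: the class, the enum, and the counter are hypothetical stand-ins, and the real RMAppImpl logic is not shown in this thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an application that caches its terminal log status. */
final class CachedLogStatus {
    enum Status { RUNNING, TIME_OUT }

    // volatile: a value written by one reader thread becomes visible to the
    // others without taking a write lock.
    private volatile Status cached;

    // Counts how often the expensive path runs (for demonstration only).
    final AtomicInteger recomputations = new AtomicInteger();

    private final int logTimeOutCount;

    CachedLogStatus(int logTimeOutCount) {
        this.logTimeOutCount = logTimeOutCount;
    }

    Status getStatus() {
        Status current = cached;            // single volatile read
        if (current != null) {
            return current;                 // terminal status already known
        }
        recomputations.incrementAndGet();   // stands in for scanning per-node reports
        if (logTimeOutCount > 0) {
            cached = Status.TIME_OUT;       // remember the terminal result
            return Status.TIME_OUT;
        }
        return Status.RUNNING;              // not terminal, so nothing is cached
    }

    public static void main(String[] args) {
        CachedLogStatus app = new CachedLogStatus(1);
        app.getStatus();
        app.getStatus();
        System.out.println(app.recomputations.get()); // 1
    }
}
```

Two threads may both take the slow path once before the cached value is visible, but they write the same value, so the race is harmless in this sketch.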
[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933790#comment-15933790 ]

yunjiong zhao commented on YARN-6285:

YARN-6339 is not applied to our cluster yet. When I created YARN-6285, what I wanted was a simple patch that allows us to control the GC ASAP. With YARN-6339, I believe we can set yarn.resourcemanager.max-limit-get-applications to a bigger value, or not need to set a limit any more. I will let you know after YARN-6339 has passed review and been applied in our cluster.

> Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
>
> Key: YARN-6285
> URL: https://issues.apache.org/jira/browse/YARN-6285
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Attachments: YARN-6285.001.patch, YARN-6285.002.patch, YARN-6285.003.patch
[jira] [Commented] (YARN-6339) Improve performance for createAndGetApplicationReport
[ https://issues.apache.org/jira/browse/YARN-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933782#comment-15933782 ]

yunjiong zhao commented on YARN-6339:

{quote}Why changes of createAndGetApplicationReport required? {quote}
The purpose is to avoid calling getLogAggregationStatus() unnecessarily inside getLogAggregationReportsForApp() after the application's LogAggregationStatus has changed to TIME_OUT. I think we should add LogAggregationStatus.TIME_OUT in isLogAggregationFinished(), like LogAggregationStatus.SUCCEEDED and LogAggregationStatus.FAILED.
If we ignore future risks, we could even change logAggregationStatusForAppReport inside getLogAggregationStatusForAppReport() while holding only the readLock.
To avoid confusion, because createAndGetApplicationReport() calls getLogAggregationStatusForAppReport() while holding the readLock, I think updating logAggregationStatusForAppReport inside createAndGetApplicationReport() with the writeLock held is the right thing to do.
{code}
     } else if (logTimeOutCount > 0) {
+      logAggregationStatusForAppReport = LogAggregationStatus.TIME_OUT;
       return LogAggregationStatus.TIME_OUT;
     }
{code}

> Improve performance for createAndGetApplicationReport
>
> Key: YARN-6339
> URL: https://issues.apache.org/jira/browse/YARN-6339
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Attachments: YARN-6339.001.patch, YARN-6339.002.patch
[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15933508#comment-15933508 ]

yunjiong zhao commented on YARN-6285:

[~wangda], I'd appreciate it if you have time to double-check LogAggregationReportPBImpl.getLogAggregationStatus() and take a look at YARN-6339.

> Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
>
> Key: YARN-6285
> URL: https://issues.apache.org/jira/browse/YARN-6285
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Attachments: YARN-6285.001.patch, YARN-6285.002.patch, YARN-6285.003.patch
[jira] [Updated] (YARN-6339) Improve performance for createAndGetApplicationReport
[ https://issues.apache.org/jira/browse/YARN-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

yunjiong zhao updated YARN-6339:
    Attachment: YARN-6339.002.patch

Updated the patch with further improvements.
Changed RMAppImpl.logAggregationStatus from HashMap to ConcurrentHashMap, so that we can safely update logAggregationStatus even while holding only a read lock. It then returns Collections.unmodifiableMap to avoid creating too many HashMaps.

> Improve performance for createAndGetApplicationReport
>
> Key: YARN-6339
> URL: https://issues.apache.org/jira/browse/YARN-6339
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Attachments: YARN-6339.001.patch, YARN-6339.002.patch
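The combination described for the 002 patch, a ConcurrentHashMap exposed through Collections.unmodifiableMap, might look like the sketch below. The class and method names are hypothetical and String values stand in for the real report types. The point is only that the wrapper is a read-only view over the live map, so no new HashMap needs to be allocated per call.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical holder for per-node log aggregation status. */
final class LogStatusRegistry {
    // Safe for concurrent updates without an exclusive lock.
    private final Map<String, String> statusByNode = new ConcurrentHashMap<>();

    // Created once; this is a read-only view over the live map, not a copy.
    private final Map<String, String> readOnlyView =
            Collections.unmodifiableMap(statusByNode);

    void update(String nodeId, String status) {
        statusByNode.put(nodeId, status);
    }

    /** Returns the same view object on every call, so nothing is allocated here. */
    Map<String, String> reports() {
        return readOnlyView;
    }

    public static void main(String[] args) {
        LogStatusRegistry registry = new LogStatusRegistry();
        Map<String, String> view = registry.reports();
        registry.update("node-1", "TIME_OUT");
        System.out.println(view.get("node-1"));         // TIME_OUT
        System.out.println(view == registry.reports()); // true
    }
}
```

Callers see later updates through the view, and any attempt to modify it throws UnsupportedOperationException. A caller that needs a stable snapshot would still have to copy the map.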
[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925292#comment-15925292 ]

yunjiong zhao commented on YARN-6285:

On one of our clusters, GetApplicationsAvgTime looks very bad:
{quote}
"GetApplicationsNumOps" : 243,
"GetApplicationsAvgTime" : 3868.0,
{quote}
On the cluster where we applied this patch and set yarn.resourcemanager.max-limit-get-applications to 400:
{quote}
"GetApplicationsNumOps" : 3370,
"GetApplicationsAvgTime" : 549.0,
{quote}

> Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
>
> Key: YARN-6285
> URL: https://issues.apache.org/jira/browse/YARN-6285
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Attachments: YARN-6285.001.patch, YARN-6285.002.patch, YARN-6285.003.patch
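A server-side cap of the kind YARN-6285 proposes can be pictured as a simple clamp on the caller's limit. This is a sketch under assumptions: the thread does not show how the patch applies yarn.resourcemanager.max-limit-get-applications, and the class and method names here are hypothetical.

```java
/** Hypothetical helper that caps a caller-supplied limit at a configured maximum. */
final class GetApplicationsLimit {

    /**
     * Returns the limit the server would actually honour.
     *
     * @param requestedLimit limit from the request; Long.MAX_VALUE when the caller set none
     * @param configuredMax  server-side maximum, e.g. the value 400 used in the comment above
     */
    static long effectiveLimit(long requestedLimit, long configuredMax) {
        return Math.min(requestedLimit, configuredMax);
    }

    public static void main(String[] args) {
        System.out.println(effectiveLimit(Long.MAX_VALUE, 400)); // 400
        System.out.println(effectiveLimit(50, 400));             // 50
    }
}
```

With a clamp like this, a request that leaves the limit unset no longer asks the RM to serialize every application it knows about.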
[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925255#comment-15925255 ]

yunjiong zhao commented on YARN-6285:

I created another issue, https://issues.apache.org/jira/browse/YARN-6339, to improve performance.

> Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
>
> Key: YARN-6285
> URL: https://issues.apache.org/jira/browse/YARN-6285
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: yunjiong zhao
> Assignee: yunjiong zhao
> Attachments: YARN-6285.001.patch, YARN-6285.002.patch, YARN-6285.003.patch
[jira] [Updated] (YARN-6339) Improve performance for createAndGetApplicationReport
[ https://issues.apache.org/jira/browse/YARN-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yunjiong zhao updated YARN-6339: Attachment: YARN-6339.001.patch This patch has 3 improvements: 1. Use substring instead of replace. 2. Update logAggregationStatusForAppReport to reduce the time spent in getLogAggregationStatusForAppReport. 3. Inside getLogAggregationReportsForApp, move some condition checks out of the for loop, so that for some applications the loop does not run at all. > Improve performance for createAndGetApplicationReport > - > > Key: YARN-6339 > URL: https://issues.apache.org/jira/browse/YARN-6339 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: yunjiong zhao >Assignee: yunjiong zhao > Attachments: YARN-6339.001.patch > > > There are two performance issue when calling createAndGetApplicationReport: > One is inside ProtoUtils.convertFromProtoFormat, replace is too slow for > clusters which have more than 3000 nodes. Use substring is much better: > https://issues.apache.org/jira/browse/YARN-6285?focusedCommentId=15923241&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15923241 > Another one is inside getLogAggregationReportsForApp, if some application's > LogAggregationStatus is TIME_OUT, every time it was called it will create an > HashMap which will produce lots of garbage.
[jira] [Created] (YARN-6339) Improve performance for createAndGetApplicationReport
yunjiong zhao created YARN-6339: --- Summary: Improve performance for createAndGetApplicationReport Key: YARN-6339 URL: https://issues.apache.org/jira/browse/YARN-6339 Project: Hadoop YARN Issue Type: Improvement Reporter: yunjiong zhao Assignee: yunjiong zhao There are two performance issues when calling createAndGetApplicationReport: One is inside ProtoUtils.convertFromProtoFormat: replace is too slow for clusters which have more than 3000 nodes. Using substring is much faster: https://issues.apache.org/jira/browse/YARN-6285?focusedCommentId=15923241&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15923241 The other one is inside getLogAggregationReportsForApp: if some application's LogAggregationStatus is TIME_OUT, every call creates a HashMap, which produces lots of garbage.
[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923364#comment-15923364 ] yunjiong zhao commented on YARN-6285: - {quote} convertFromProtoFormat is called once for every app {quote} This is not true. There are multiple places that hit convertFromProtoFormat. For example: Inside RMAppImpl.getLogAggregationStatusForAppReport(): {code} for (Entry report : reports.entrySet()) { switch (report.getValue().getLogAggregationStatus()) { // will call convertFromProtoFormat {code} Inside RMAppImpl.getLogAggregationReportsForApp {code} for (Entry output : outputs.entrySet()) { if (!output.getValue().getLogAggregationStatus() {code} Our cluster has more than 3000 nodes and sometimes more than 500 running applications, so from the above two places getApplications may call convertFromProtoFormat 3,000,000 times. I'm not saying the change will completely solve the problem, but it can definitely improve the situation. > Add option to set max limit on ResourceManager for > ApplicationClientProtocol.getApplications > > > Key: YARN-6285 > URL: https://issues.apache.org/jira/browse/YARN-6285 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: yunjiong zhao >Assignee: yunjiong zhao > Attachments: YARN-6285.001.patch, YARN-6285.002.patch, > YARN-6285.003.patch > > > When users called ApplicationClientProtocol.getApplications, it will return > lots of data, and generate lots of garbage on ResourceManager which caused > long time GC. > For example, on one of our RM, when called rest API " http:// address:port>/ws/v1/cluster/apps" it can return 150MB data which have 944 > applications. > getApplications have limit parameter, but some user might not set it, and > then the limit will be Long.MAX_VALUE.
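The 3,000,000 figure in the comment above is a simple product. A rough sketch of the arithmetic, assuming one log aggregation report per node for every running application and that each of the two loops visits every report once (the class and method names are made up for illustration and are not part of Hadoop):

```java
final class ConvertCallEstimate {
    /** Rough estimate: every loop visits one report per node for every application. */
    static long estimateCalls(long nodes, long runningApps, long loopsPerApp) {
        // multiplyExact throws on overflow instead of silently wrapping
        return Math.multiplyExact(Math.multiplyExact(nodes, runningApps), loopsPerApp);
    }

    public static void main(String[] args) {
        // 3000 nodes, 500 running applications, two loops per application report
        System.out.println(estimateCalls(3000, 500, 2)); // prints 3000000
    }
}
```

Since the cluster has "more than" 3000 nodes and sometimes "more than" 500 applications, the real count for one getApplications call could be higher than this estimate.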
[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923241#comment-15923241 ] yunjiong zhao commented on YARN-6285: - I used the code below to test replace and substring: replace took 28107ms, and substring took 563ms. I believe the above change can definitely improve the performance. {code} private static void testReplace() { long s = System.currentTimeMillis(); for (int i = 0; i < 1; i++) { "LOG_disable".replace("LOG_", ""); } System.out.println(System.currentTimeMillis() - s); } private static void testSubstring() { long s = System.currentTimeMillis(); for (int i = 0; i < 1; i++) { "LOG_disable".substring(4); } System.out.println(System.currentTimeMillis() - s); } {code} > Add option to set max limit on ResourceManager for > ApplicationClientProtocol.getApplications > > > Key: YARN-6285 > URL: https://issues.apache.org/jira/browse/YARN-6285 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: yunjiong zhao >Assignee: yunjiong zhao > Attachments: YARN-6285.001.patch, YARN-6285.002.patch, > YARN-6285.003.patch > > > When users called ApplicationClientProtocol.getApplications, it will return > lots of data, and generate lots of garbage on ResourceManager which caused > long time GC. > For example, on one of our RM, when called rest API " http:// address:port>/ws/v1/cluster/apps" it can return 150MB data which have 944 > applications. > getApplications have limit parameter, but some user might not set it, and > then the limit will be Long.MAX_VALUE.
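A self-contained variant of the comparison above may be easier to rerun. This is only a sketch: the class name, the iteration count, and the `sink` field are assumptions for illustration, and the timings it prints depend on the JDK and hardware, so they will not reproduce the numbers quoted in the comment. It also checks that both approaches give the same result for a prefixed enum name such as LOG_TIME_OUT.

```java
final class PrefixStripSketch {
    private static final String PREFIX = "LOG_";
    // Results are accumulated here so the JIT is less likely to discard the loop bodies.
    private static long sink;

    /** Removes every occurrence of the prefix text, wherever it appears. */
    static String viaReplace(String name) {
        return name.replace(PREFIX, "");
    }

    /** Drops the first PREFIX.length() characters; assumes the name starts with the prefix. */
    static String viaSubstring(String name) {
        return name.substring(PREFIX.length());
    }

    static long timeMillis(Runnable body, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            body.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int iterations = 10_000_000; // arbitrary; not the count used in the comment above
        String name = "LOG_TIME_OUT";
        // One warm-up pass per variant before measuring.
        timeMillis(() -> sink += viaReplace(name).length(), iterations);
        timeMillis(() -> sink += viaSubstring(name).length(), iterations);
        System.out.println("replace ms:   " + timeMillis(() -> sink += viaReplace(name).length(), iterations));
        System.out.println("substring ms: " + timeMillis(() -> sink += viaSubstring(name).length(), iterations));
    }
}
```

The two methods are interchangeable only when the prefix occurs exactly once, at the start of the name: for a hypothetical name like "LOG_A_LOG_B", replace yields "A_B" while substring yields "A_LOG_B".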
[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923208#comment-15923208 ] yunjiong zhao commented on YARN-6285: - Sorry for the late response. The 2.25 seconds in getApplications doesn't include ResourceRequest. Most of the time was spent in getLogAggregationReportsForApp, as the stack trace shows. I believe the code change below should improve the performance (will test later) {code} diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java index ab283e7..926c757 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/api/records/impl/pb/ProtoUtils.java @@ -296,6 +296,8 @@ public static ReservationRequestInterpreter convertFromProtoFormat( * Log Aggregation Status */ private static final String LOG_AGGREGATION_STATUS_PREFIX = "LOG_"; + private static final int LOG_AGGREGATION_STATUS_PREFIX_LEN = + LOG_AGGREGATION_STATUS_PREFIX.length(); public static LogAggregationStatusProto convertToProtoFormat( LogAggregationStatus e) { return LogAggregationStatusProto.valueOf(LOG_AGGREGATION_STATUS_PREFIX @@ -304,8 +306,8 @@ public static LogAggregationStatusProto convertToProtoFormat( public static LogAggregationStatus convertFromProtoFormat( LogAggregationStatusProto e) { -return LogAggregationStatus.valueOf(e.name().replace( - LOG_AGGREGATION_STATUS_PREFIX, "")); +return LogAggregationStatus.valueOf(e.name().substring( +LOG_AGGREGATION_STATUS_PREFIX_LEN)); } /* diff --git
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java index 9f00b2e..1db66a5 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java @@ -1700,17 +1700,16 @@ public ResourceRequest getAMResourceRequest() { Map outputs = new HashMap(); outputs.putAll(logAggregationStatus); - if (!isLogAggregationFinished()) { + if (!isLogAggregationFinished() && isAppInFinalState(this) && + System.currentTimeMillis() > this.logAggregationStartTime + + this.logAggregationStatusTimeout) { for (Entry output : outputs.entrySet()) { if (!output.getValue().getLogAggregationStatus() .equals(LogAggregationStatus.TIME_OUT) && !output.getValue().getLogAggregationStatus() .equals(LogAggregationStatus.SUCCEEDED) && !output.getValue().getLogAggregationStatus() -.equals(LogAggregationStatus.FAILED) - && isAppInFinalState(this) - && System.currentTimeMillis() > this.logAggregationStartTime - + this.logAggregationStatusTimeout) { +.equals(LogAggregationStatus.FAILED)) { output.getValue().setLogAggregationStatus( LogAggregationStatus.TIME_OUT); } {code} Should I open a new issue for those changes? {quote} 1) Add parameter to indicate if we should include ResourceRequest/getLogAggregationReportsForApp in the response, default is true to make it compatible. (Can be done if above experimental shows it really helps). {quote} This will help if users use those parameters.
> Add option to set max limit on ResourceManager for > ApplicationClientProtocol.getApplications > > > Key: YARN-6285 > URL: https://issues.apache.org/jira/browse/YARN-6285 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: yunjiong zhao >Assignee: yunjiong zhao > Attachments: YARN-6285.001.patch, YARN-6285.002.patch, > YARN-6285.003.patch > > > When users called ApplicationClientProtocol.getApplications, it will return > lots of data, and generate lots of garbage on ResourceManager which caused > long time GC. > For example, on one of our RM, when called rest API " http:// address:port>/ws/v1/cluster/apps" it can return 150MB data which hav
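The RMAppImpl part of the diff above hoists conditions that do not depend on the loop variable (application in final state, timeout elapsed) out of the per-node loop, so the loop is skipped entirely when they do not hold. A simplified, standalone sketch of that idea follows. The types and names here (`TimeoutSweepSketch`, `Status`, `sweep`, string node keys) are stand-ins invented for illustration and are not the Hadoop classes.

```java
import java.util.EnumSet;
import java.util.Map;

final class TimeoutSweepSketch {
    enum Status { RUNNING, SUCCEEDED, FAILED, TIME_OUT }

    // Statuses that must not be overwritten with TIME_OUT.
    private static final EnumSet<Status> SETTLED =
        EnumSet.of(Status.SUCCEEDED, Status.FAILED, Status.TIME_OUT);

    /**
     * Marks unsettled per-node reports as TIME_OUT.
     * Returns how many entries the loop visited (0 when the loop was skipped).
     */
    static int sweep(Map<String, Status> reports, boolean aggregationFinished,
                     boolean appInFinalState, long now, long startTime, long timeout) {
        // These checks are the same for every entry, so evaluate them once, up front.
        if (aggregationFinished || !appInFinalState || now <= startTime + timeout) {
            return 0;
        }
        int visited = 0;
        for (Map.Entry<String, Status> entry : reports.entrySet()) {
            visited++;
            if (!SETTLED.contains(entry.getValue())) {
                // Updating a value through the entry is not a structural modification.
                entry.setValue(Status.TIME_OUT);
            }
        }
        return visited;
    }
}
```

With the checks inside the loop, every call walks all entries even when nothing can change; with the checks outside, a call for an application that is still running or has not yet timed out returns without touching the map.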
[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897949#comment-15897949 ] yunjiong zhao commented on YARN-6285: - [~sunilg], Totally agree. This patch is a short-term measure that gives cluster admins a way to prevent the RM from spending too much time on GC. After we deployed this patch and set the limit to 50, our cluster's GC has been doing well for the last two days. The 10 worst cases are: {quote} 0.4960477 0.4992665 0.5180593 0.5804366 0.5876860 0.5885162 0.5900650 0.6041406 0.6474685 0.8865442 {quote} The total time spent on GC is around 1.5%, which is much better than before. > Add option to set max limit on ResourceManager for > ApplicationClientProtocol.getApplications > > > Key: YARN-6285 > URL: https://issues.apache.org/jira/browse/YARN-6285 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: yunjiong zhao >Assignee: yunjiong zhao > Attachments: YARN-6285.001.patch, YARN-6285.002.patch, > YARN-6285.003.patch > > > When users called ApplicationClientProtocol.getApplications, it will return > lots of data, and generate lots of garbage on ResourceManager which caused > long time GC. > For example, on one of our RM, when called rest API " http:// address:port>/ws/v1/cluster/apps" it can return 150MB data which have 944 > applications. > getApplications have limit parameter, but some user might not set it, and > then the limit will be Long.MAX_VALUE.
[jira] [Commented] (YARN-6280) Add a query parameter in ResourceManager Cluster Applications REST API to control whether or not returns ResourceRequest
[ https://issues.apache.org/jira/browse/YARN-6280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895840#comment-15895840 ] yunjiong zhao commented on YARN-6280: - How about changing the default behavior to hide ResourceRequest? > Add a query parameter in ResourceManager Cluster Applications REST API to > control whether or not returns ResourceRequest > > > Key: YARN-6280 > URL: https://issues.apache.org/jira/browse/YARN-6280 > Project: Hadoop YARN > Issue Type: Improvement > Components: resourcemanager, restapi >Affects Versions: 2.7.3 >Reporter: Lantao Jin > Attachments: YARN-6280.001.patch, YARN-6280.002.patch > > > Begin from v2.7, the ResourceManager Cluster Applications REST API returns > ResourceRequest list. It's a very large construction in AppInfo. > As a test, we use below URI to query only 2 results: > http:// address:port>/ws/v1/cluster/apps?states=running,accepted&limit=2 > The results are very different: > ||Hadoop version|Total Character|Total Word|Total Lines|Size|| > |2.4.1|1192| 42| 42| 1.2 KB| > |2.7.1|1222179| 48740| 48735| 1.21 MB| > Most RESTful API requesters don't know about this after upgraded and their > old queries may cause ResourceManager more GC consuming and slower. Even if > they know this but have no idea to reduce the impact of ResourceManager > except slow down their query frequency. > The patch adding a query parameter "showResourceRequests" to help requesters > who don't need this information to reduce the overhead. In consideration of > compatibility of interface, the default value is true if they don't set the > parameter, so the behaviour is the same as now.
[jira] [Commented] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895837#comment-15895837 ] yunjiong zhao commented on YARN-6285: - I checked the result returned by the REST API; most of its size was ResourceRequest. Changing the default behavior to not show ResourceRequest, as in https://issues.apache.org/jira/browse/YARN-6280, will help. However, it's not enough, for 2 reasons: 1. Slowness in getApplications: the stack trace files below show that it spent at least 2.25 seconds in getApplications. {code} grep -A20 " #7876 daemon " 829 "363440407@qtp-1966670937-117" #7876 daemon prio=5 os_prio=0 tid=0x7f12093a2800 nid=0x1c46 runnable [0x7f05344b8000] java.lang.Thread.State: RUNNABLE at java.util.regex.Matcher.search(Matcher.java:1248) at java.util.regex.Matcher.find(Matcher.java:637) at java.util.regex.Matcher.replaceAll(Matcher.java:951) at java.lang.String.replace(String.java:2240) at org.apache.hadoop.yarn.api.records.impl.pb.ProtoUtils.convertFromProtoFormat(ProtoUtils.java:270) at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.LogAggregationReportPBImpl.convertFromProtoFormat(LogAggregationReportPBImpl.java:158) at org.apache.hadoop.yarn.server.api.protocolrecords.impl.pb.LogAggregationReportPBImpl.getLogAggregationStatus(LogAggregationReportPBImpl.java:142) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getLogAggregationStatusForAppReport(RMAppImpl.java:1559) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:631) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:814) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:681) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:89) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:86) at
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) at org.apache.hadoop.yarn.server.webapp.AppsBlock.fetchData(AppsBlock.java:84) at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) {code} {code} grep -A20 " #7876 daemon " 838 "363440407@qtp-1966670937-117" #7876 daemon prio=5 os_prio=0 tid=0x7f12093a2800 nid=0x1c46 runnable [0x7f05344b8000] java.lang.Thread.State: RUNNABLE at java.util.HashMap.hash(HashMap.java:338) at java.util.HashMap.putMapEntries(HashMap.java:514) at java.util.HashMap.putAll(HashMap.java:784) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getLogAggregationReportsForApp(RMAppImpl.java:1466) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.getLogAggregationStatusForAppReport(RMAppImpl.java:1549) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:631) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:814) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplications(ClientRMService.java:681) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:89) at org.apache.hadoop.yarn.server.webapp.AppsBlock$1.run(AppsBlock.java:86) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) at org.apache.hadoop.yarn.server.webapp.AppsBlock.fetchData(AppsBlock.java:84) at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:101) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:69) at 
org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:79) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:43) {code} {code} grep -A20 " #7876 daemon " 839 "363440407@qtp-1966670937-117" #7876 daemon prio=5 os_prio=0 tid=0x7f12093a2800 nid=0x1c46 runnable [0x7f05344b8000] java.lang.Thread.State: RUNNABLE at java.util.AbstractCollection.addAll(AbstractCollection.java:343) at java.util.LinkedHashSet.(LinkedHashSet.java:169) at org.apache.hadoop.security.UserGroupInformation.getGroupNames(UserGroupInformation.java:1569) - locked <0x7f06bb34c0b8> (a o
[jira] [Updated] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yunjiong zhao updated YARN-6285: Attachment: YARN-6285.003.patch Updated the patch according to [~benoyantony]'s comments. The default value is set to Long.MAX_VALUE, so by default it changes nothing. Thanks [~benoyantony] for your time. > Add option to set max limit on ResourceManager for > ApplicationClientProtocol.getApplications > > > Key: YARN-6285 > URL: https://issues.apache.org/jira/browse/YARN-6285 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: yunjiong zhao >Assignee: yunjiong zhao > Attachments: YARN-6285.001.patch, YARN-6285.002.patch, > YARN-6285.003.patch > > > When users called ApplicationClientProtocol.getApplications, it will return > lots of data, and generate lots of garbage on ResourceManager which caused > long time GC. > For example, on one of our RM, when called rest API " http:// address:port>/ws/v1/cluster/apps" it can return 150MB data which have 944 > applications. > getApplications have limit parameter, but some user might not set it, and > then the limit will be Long.MAX_VALUE.
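The behavior described in this update (a server-side cap whose default of Long.MAX_VALUE leaves requests unchanged) can be sketched as a simple clamp. This is not the code from the attached patches, which are not shown in this thread; the class and method names below are hypothetical, and `configuredMax` stands in for the value of yarn.resourcemanager.max-limit-get-applications.

```java
final class GetApplicationsLimitSketch {
    /**
     * Returns the limit the server would actually apply.
     * With configuredMax == Long.MAX_VALUE the requested limit passes through unchanged.
     */
    static long effectiveLimit(long requestedLimit, long configuredMax) {
        return Math.min(requestedLimit, configuredMax);
    }

    /** True when the caller asked for more than the cap, the case worth logging for admins. */
    static boolean exceedsMax(long requestedLimit, long configuredMax) {
        return requestedLimit > configuredMax;
    }

    public static void main(String[] args) {
        long requested = Long.MAX_VALUE; // what a client gets when it does not set a limit
        long configuredMax = 400;        // example cap
        if (exceedsMax(requested, configuredMax)) {
            System.out.println("caller requested limit=" + requested
                + ", capped to " + effectiveLimit(requested, configuredMax));
        }
    }
}
```

A client that passes an explicit limit below the cap, say 50 with a cap of 400, is unaffected, which is why the cap is safe to enable without changing well-behaved callers.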
[jira] [Comment Edited] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895190#comment-15895190 ] yunjiong zhao edited comment on YARN-6285 at 3/3/17 10:54 PM: -- Fixed checkstyle. The failed unit test in TestRMRestart is not related. was (Author: zhaoyunjiong): Fix checkstyle. Failure unit test is not related. > Add option to set max limit on ResourceManager for > ApplicationClientProtocol.getApplications > > > Key: YARN-6285 > URL: https://issues.apache.org/jira/browse/YARN-6285 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: yunjiong zhao >Assignee: yunjiong zhao > Attachments: YARN-6285.001.patch, YARN-6285.002.patch > > > When users called ApplicationClientProtocol.getApplications, it will return > lots of data, and generate lots of garbage on ResourceManager which caused > long time GC. > For example, on one of our RM, when called rest API " http:// address:port>/ws/v1/cluster/apps" it can return 150MB data which have 944 > applications. > getApplications have limit parameter, but some user might not set it, and > then the limit will be Long.MAX_VALUE.
[jira] [Updated] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yunjiong zhao updated YARN-6285: Attachment: YARN-6285.002.patch Fixed checkstyle. The unit test failure is not related. > Add option to set max limit on ResourceManager for > ApplicationClientProtocol.getApplications > > > Key: YARN-6285 > URL: https://issues.apache.org/jira/browse/YARN-6285 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: yunjiong zhao >Assignee: yunjiong zhao > Attachments: YARN-6285.001.patch, YARN-6285.002.patch > > > When users called ApplicationClientProtocol.getApplications, it will return > lots of data, and generate lots of garbage on ResourceManager which caused > long time GC. > For example, on one of our RM, when called rest API " http:// address:port>/ws/v1/cluster/apps" it can return 150MB data which have 944 > applications. > getApplications have limit parameter, but some user might not set it, and > then the limit will be Long.MAX_VALUE.
[jira] [Comment Edited] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894960#comment-15894960 ] yunjiong zhao edited comment on YARN-6285 at 3/3/17 9:22 PM: - This patch allows setting a max limit on the RM for ApplicationClientProtocol.getApplications. Also, the log will tell the cluster admin which user called getApplications with a limit bigger than the max limit, like below {quote} INFO [main] resourcemanager.ClientRMService (ClientRMService.java:getApplications(878)) - User yunjzhao called getApplications with limit=9223372036854775807 {quote} was (Author: zhaoyunjiong): This patch allowed set a max limit on RM for ApplicationClientProtocol.getApplications. Also in the log, it will tell cluster admin which user called the getApplications with bigger limit than the max limit like below {quote} INFO [main] resourcemanager.ClientRMService (ClientRMService.java:getApplications(878)) - User yunjzhao called getApplications with limit=9223372036854775807 {quote} > Add option to set max limit on ResourceManager for > ApplicationClientProtocol.getApplications > > > Key: YARN-6285 > URL: https://issues.apache.org/jira/browse/YARN-6285 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: yunjiong zhao >Assignee: yunjiong zhao > Attachments: YARN-6285.001.patch > > > When users called ApplicationClientProtocol.getApplications, it will return > lots of data, and generate lots of garbage on ResourceManager which caused > long time GC. > For example, on one of our RM, when called rest API " http:// address:port>/ws/v1/cluster/apps" it can return 150MB data which have 944 > applications. > getApplications have limit parameter, but some user might not set it, and > then the limit will be Long.MAX_VALUE.
[jira] [Updated] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
[ https://issues.apache.org/jira/browse/YARN-6285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yunjiong zhao updated YARN-6285: Attachment: YARN-6285.001.patch This patch allows setting a max limit on the RM for ApplicationClientProtocol.getApplications. Also, the log will tell the cluster admin which user called getApplications with a limit bigger than the max limit, like below {quote} INFO [main] resourcemanager.ClientRMService (ClientRMService.java:getApplications(878)) - User yunjzhao called getApplications with limit=9223372036854775807 {quote} > Add option to set max limit on ResourceManager for > ApplicationClientProtocol.getApplications > > > Key: YARN-6285 > URL: https://issues.apache.org/jira/browse/YARN-6285 > Project: Hadoop YARN > Issue Type: Improvement >Reporter: yunjiong zhao >Assignee: yunjiong zhao > Attachments: YARN-6285.001.patch > > > When users called ApplicationClientProtocol.getApplications, it will return > lots of data, and generate lots of garbage on ResourceManager which caused > long time GC. > For example, on one of our RM, when called rest API " http:// address:port>/ws/v1/cluster/apps" it can return 150MB data which have 944 > applications. > getApplications have limit parameter, but some user might not set it, and > then the limit will be Long.MAX_VALUE.
[jira] [Created] (YARN-6285) Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications
yunjiong zhao created YARN-6285: --- Summary: Add option to set max limit on ResourceManager for ApplicationClientProtocol.getApplications Key: YARN-6285 URL: https://issues.apache.org/jira/browse/YARN-6285 Project: Hadoop YARN Issue Type: Improvement Reporter: yunjiong zhao Assignee: yunjiong zhao When users call ApplicationClientProtocol.getApplications, it can return a lot of data and generate a lot of garbage on the ResourceManager, which causes long GC pauses. For example, on one of our RMs, a call to the REST API " http:///ws/v1/cluster/apps" can return 150MB of data covering 944 applications. getApplications has a limit parameter, but some users might not set it, and then the limit will be Long.MAX_VALUE.
[jira] [Commented] (YARN-6254) Provide a mechanism to whitelist the RM REST API clients
[ https://issues.apache.org/jira/browse/YARN-6254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889706#comment-15889706 ] yunjiong zhao commented on YARN-6254: - Reducing yarn.resourcemanager.max-completed-applications from its default value of 10000 to a smaller value like 500 should solve the problem. > Provide a mechanism to whitelist the RM REST API clients > > > Key: YARN-6254 > URL: https://issues.apache.org/jira/browse/YARN-6254 > Project: Hadoop YARN > Issue Type: New Feature > Components: resourcemanager >Affects Versions: 2.7.1 >Reporter: Aroop Maliakkal > > Currently RM REST APIs are open to everyone. Can we provide a whitelist > feature so that we can control what frequency and what hosts can hit the RM > REST APIs ? > Thanks, > /Aroop