[jira] [Created] (YARN-4013) Publisher V2 should write the unmanaged AM flag too
Zhijie Shen created YARN-4013:

Summary: Publisher V2 should write the unmanaged AM flag too
Key: YARN-4013
URL: https://issues.apache.org/jira/browse/YARN-4013
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen

Upon rebasing the branch, I found that we need to redo similar work for the V2 publisher: https://issues.apache.org/jira/browse/YARN-3543

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
Zhijie Shen created YARN-3992:

Summary: TestApplicationPriority.testApplicationPriorityAllocation fails intermittently
Key: YARN-3992
URL: https://issues.apache.org/jira/browse/YARN-3992
Project: Hadoop YARN
Issue Type: Test
Reporter: Zhijie Shen

{code}
java.lang.AssertionError: expected:<7> but was:<5>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:555)
	at org.junit.Assert.assertEquals(Assert.java:542)
	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182)
{code}
[jira] [Created] (YARN-3993) Change to use the AM flag in ContainerContext determine AM container
Zhijie Shen created YARN-3993:

Summary: Change to use the AM flag in ContainerContext determine AM container
Key: YARN-3993
URL: https://issues.apache.org/jira/browse/YARN-3993
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen

After YARN-3116, ContainerContext will have a flag that tells an aux service whether the container is an AM container. We need to change the code accordingly to use this flag instead of relying on the container ID.
[jira] [Created] (YARN-3984) Rethink event column key issue
Zhijie Shen created YARN-3984:

Summary: Rethink event column key issue
Key: YARN-3984
URL: https://issues.apache.org/jira/browse/YARN-3984
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Fix For: YARN-2928

Currently, the event column key is event_id?info_key?timestamp, which makes it awkward to fetch all the events of an entity and sort them in chronological order. IMHO, timestamp?event_id?info_key may be a better key schema. I am opening this JIRA to continue the discussion that started in the comments on YARN-3908.
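To illustrate the ordering problem in YARN-3984, here is a minimal sketch that treats column qualifiers as plain strings kept in a sorted set, mirroring the byte-order sorting of qualifiers within an HBase row. The `?` separator comes from the issue text; the zero-padded decimal timestamp and the event ids `START`/`FINISH` are assumptions of this sketch (a real schema would more likely encode timestamps as fixed-width binary longs).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Sketch comparing the two event column key layouts discussed in YARN-3984. */
class EventColumnKeys {
  static final String SEP = "?";

  // Zero-padded so that string order equals numeric order (an assumption of this sketch).
  static String ts(long timestamp) {
    return String.format(Locale.ROOT, "%019d", timestamp);
  }

  /** Current layout: event_id?info_key?timestamp */
  static String currentKey(String eventId, String infoKey, long timestamp) {
    return eventId + SEP + infoKey + SEP + ts(timestamp);
  }

  /** Proposed layout: timestamp?event_id?info_key */
  static String proposedKey(String eventId, String infoKey, long timestamp) {
    return ts(timestamp) + SEP + eventId + SEP + infoKey;
  }

  /** Event ids in the order a sorted scan over the qualifiers would return them. */
  static List<String> scanOrder(TreeSet<String> qualifiers, boolean timestampFirst) {
    List<String> ids = new ArrayList<>();
    for (String q : qualifiers) {
      String[] parts = q.split("\\?");
      ids.add(timestampFirst ? parts[1] : parts[0]);
    }
    return ids;
  }

  public static void main(String[] args) {
    TreeSet<String> current = new TreeSet<>();
    TreeSet<String> proposed = new TreeSet<>();
    // START happens at t=100, FINISH at t=200.
    current.add(currentKey("START", "diag", 100L));
    current.add(currentKey("FINISH", "diag", 200L));
    proposed.add(proposedKey("START", "diag", 100L));
    proposed.add(proposedKey("FINISH", "diag", 200L));
    System.out.println(scanOrder(current, false)); // [FINISH, START]: alphabetical by event id
    System.out.println(scanOrder(proposed, true)); // [START, FINISH]: chronological
  }
}
```

With the timestamp first, a single ordered scan already yields the events chronologically; if newest-first order were preferred, a common HBase technique is to store `Long.MAX_VALUE - timestamp` instead.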
[jira] [Resolved] (YARN-3908) Bugs in HBaseTimelineWriterImpl
[ https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-3908.
Resolution: Fixed
Hadoop Flags: Reviewed
Fix Version/s: YARN-2928

Committed the patch to branch YARN-2928. Thanks Vrushali and Sangjin for the patch, and thanks to the other folks who contributed their thoughts.

Bugs in HBaseTimelineWriterImpl
Key: YARN-3908
URL: https://issues.apache.org/jira/browse/YARN-3908
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C
Fix For: YARN-2928
Attachments: YARN-3908-YARN-2928.001.patch, YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.005.patch

1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all.
2. event#timestamp is also not persisted.
[jira] [Created] (YARN-3914) Entity created time should be part of the row key of entity table
Zhijie Shen created YARN-3914:

Summary: Entity created time should be part of the row key of entity table
Key: YARN-3914
URL: https://issues.apache.org/jira/browse/YARN-3914
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

The entity created time should be part of the row key of the entity table, between the entity type and the entity ID. The purpose is to index the entities: although we cannot index entities on every kind of information, indexing them by created time is essential. Without it, every query for the latest entities of a given application and type has to scan through all the entities of that application and type, for example, when listing the 100 most recently started containers of a YARN app.
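The benefit described in YARN-3914 can be sketched with a `TreeMap` standing in for HBase's lexicographically sorted rows. The issue only fixes the position of the created time (between entity type and entity ID); the `!` separator, the app-id prefix, the inverted zero-padded timestamp (so that newer entities sort first), and the `CONTAINER`/`ATTEMPT` type names are all assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of an entity row key with the created time between type and id. */
class EntityRowKeys {
  static final String SEP = "!"; // hypothetical separator

  static String rowKey(String appId, String type, long createdTime, String entityId) {
    // Invert and zero-pad the created time so that newer entities sort first (assumption).
    String inverted = String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - createdTime);
    return appId + SEP + type + SEP + inverted + SEP + entityId;
  }

  /** Prefix scan returning at most {@code limit} entity ids, newest first. */
  static List<String> latest(TreeMap<String, String> table, String appId, String type, int limit) {
    String prefix = appId + SEP + type + SEP;
    List<String> result = new ArrayList<>();
    // tailMap starts at the first row >= prefix; stop once the prefix no longer matches.
    for (Map.Entry<String, String> e : table.tailMap(prefix, true).entrySet()) {
      if (!e.getKey().startsWith(prefix) || result.size() == limit) {
        break;
      }
      result.add(e.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    TreeMap<String, String> table = new TreeMap<>();
    String app = "application_1432863609211_0001";
    table.put(rowKey(app, "CONTAINER", 1000L, "c1"), "c1");
    table.put(rowKey(app, "CONTAINER", 3000L, "c3"), "c3");
    table.put(rowKey(app, "CONTAINER", 2000L, "c2"), "c2");
    table.put(rowKey(app, "ATTEMPT", 500L, "a1"), "a1");
    System.out.println(latest(table, app, "CONTAINER", 2)); // [c3, c2]
  }
}
```

Because the rows for one app and type are contiguous and already ordered by created time, "the 100 latest containers" becomes a scan that stops after 100 rows instead of reading every container of the app.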
[jira] [Created] (YARN-3908) Bugs in HBaseTimelineWriterImpl
Zhijie Shen created YARN-3908:

Summary: Bugs in HBaseTimelineWriterImpl
Key: YARN-3908
URL: https://issues.apache.org/jira/browse/YARN-3908
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C

1. In HBaseTimelineWriterImpl, the info column family contains the basic fields of a timeline entity plus events. However, the entity#info map is not stored at all.
2. event#timestamp is also not persisted.
[jira] [Created] (YARN-3880) Writing more RM side app-level metrics
Zhijie Shen created YARN-3880:

Summary: Writing more RM side app-level metrics
Key: YARN-3880
URL: https://issues.apache.org/jira/browse/YARN-3880
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

In YARN-3044, we implemented an analog of the ATS v1 metrics publisher. While it writes app/attempt/container lifecycle events, it doesn't write many of the app-level system metrics that the RM now has. Here are the metrics I found missing:
* runningContainers
* memorySeconds
* vcoreSeconds
* preemptedResourceMB
* preemptedResourceVCores
* numNonAMContainerPreempted
* numAMContainerPreempted

Please feel free to add more to the list if you find anything that isn't covered.
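Two of the metrics listed in YARN-3880, memorySeconds and vcoreSeconds, are aggregates: each container contributes its allocated memory (MB) or vcores multiplied by the number of seconds it held the allocation. A minimal sketch of that arithmetic, assuming a hypothetical `ContainerUsage` record; it does not reproduce how the RM actually tracks usage incrementally.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: aggregate per-container allocations into app-level memorySeconds/vcoreSeconds. */
class AppResourceUsage {
  /** Hypothetical per-container record: allocation size and lifetime in milliseconds. */
  static final class ContainerUsage {
    final long memoryMb;
    final int vcores;
    final long startMs;
    final long finishMs;

    ContainerUsage(long memoryMb, int vcores, long startMs, long finishMs) {
      this.memoryMb = memoryMb;
      this.vcores = vcores;
      this.startMs = startMs;
      this.finishMs = finishMs;
    }
  }

  /** Sum over containers of allocated MB times seconds held (truncated per container). */
  static long memorySeconds(List<ContainerUsage> containers) {
    long total = 0;
    for (ContainerUsage c : containers) {
      total += c.memoryMb * (c.finishMs - c.startMs) / 1000;
    }
    return total;
  }

  /** Sum over containers of allocated vcores times seconds held (truncated per container). */
  static long vcoreSeconds(List<ContainerUsage> containers) {
    long total = 0;
    for (ContainerUsage c : containers) {
      total += c.vcores * (c.finishMs - c.startMs) / 1000;
    }
    return total;
  }

  public static void main(String[] args) {
    List<ContainerUsage> app = Arrays.asList(
        new ContainerUsage(2048, 1, 0, 10_000),      // 2048 MB, 1 vcore, 10 s
        new ContainerUsage(1024, 2, 5_000, 8_000));  // 1024 MB, 2 vcores, 3 s
    System.out.println(memorySeconds(app)); // 2048*10 + 1024*3 = 23552
    System.out.println(vcoreSeconds(app));  // 1*10 + 2*3 = 16
  }
}
```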
[jira] [Created] (YARN-3881) Writing RM cluster-level metrics
Zhijie Shen created YARN-3881:

Summary: Writing RM cluster-level metrics
Key: YARN-3881
URL: https://issues.apache.org/jira/browse/YARN-3881
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

The RM has a bunch of metrics that we may want to write into the timeline backend too. I've attached the metrics.json that I crawled via {{http://localhost:8088/jmx?qry=Hadoop:*}}. IMHO, we need to pay attention to three groups of metrics:
1. QueueMetrics
2. JvmMetrics
3. ClusterMetrics

The problem is that, unlike the other metrics, which belong to a single application, these belong to the RM or are cluster-wide. Therefore, the current write path won't work for these metrics, because they don't have the associated user/flow/app context info. We need to rethink how to model cross-app metrics and the API to handle them.
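The three groups named in YARN-3881 can be picked out of the JMX dump by bean name. A minimal sketch using `javax.management.ObjectName` to read the `name` key property; the bean names below are illustrative examples in the `Hadoop:service=ResourceManager,name=...` form, not a copy of the attached metrics.json.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Sketch: bucket RM JMX bean names into the three metric groups of interest. */
class RmMetricGroups {
  static final List<String> GROUPS = Arrays.asList("QueueMetrics", "JvmMetrics", "ClusterMetrics");

  static Map<String, List<String>> group(List<String> beanNames) {
    Map<String, List<String>> result = new TreeMap<>();
    for (String beanName : beanNames) {
      String source;
      try {
        // The "name" key property identifies the metrics source, e.g. QueueMetrics.
        source = new ObjectName(beanName).getKeyProperty("name");
      } catch (MalformedObjectNameException e) {
        continue; // skip anything that is not a valid JMX object name
      }
      if (source != null && GROUPS.contains(source)) {
        result.computeIfAbsent(source, k -> new ArrayList<>()).add(beanName);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList(
        "Hadoop:service=ResourceManager,name=QueueMetrics,q0=root",
        "Hadoop:service=ResourceManager,name=QueueMetrics,q0=root,q1=default",
        "Hadoop:service=ResourceManager,name=JvmMetrics",
        "Hadoop:service=ResourceManager,name=ClusterMetrics",
        "Hadoop:service=ResourceManager,name=RpcActivityForPort8032");
    System.out.println(group(names).keySet()); // [ClusterMetrics, JvmMetrics, QueueMetrics]
  }
}
```

Note that the only context these names carry is the service and, for QueueMetrics, the queue path; there is no user, flow, or application id to attach them to, which is exactly the modeling gap the issue describes.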
[jira] [Resolved] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM
[ https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-3116.
Resolution: Duplicate

Closing this JIRA as a duplicate of YARN-3828, where the contributor has already started working on the issue.

[Collector wireup] We need an assured way to determine if a container is an AM container on NM
Key: YARN-3116
URL: https://issues.apache.org/jira/browse/YARN-3116
Project: Hadoop YARN
Issue Type: Sub-task
Components: nodemanager, timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

In YARN-3030, to start the per-app aggregator only for a started AM container, we need to determine from the context in the NM whether a container is an AM container (we can already do this on the RM). This information is missing on the NM, so as a workaround we consider the container with ID _01 to be the AM container. Unfortunately, this is neither a necessary nor a sufficient condition. We need a way to determine on the NM whether a container is an AM container. We could add a flag to the container object or create an API to make that determination. The distributed AM information may also be useful for YARN-2877.
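The difference between the ID-based workaround and the explicit flag proposed in YARN-3116 can be sketched as follows. The parsing assumes container IDs shaped like `container_1432863609211_0001_01_000001` (optionally with an `e<epoch>` segment after `container_`), whose last segment is the container sequence number; `LaunchInfo` is a hypothetical holder for a flag set by the RM, not an existing YARN class.

```java
/** Sketch contrasting the container-ID heuristic with an explicit AM flag. */
class AmContainerCheck {
  /** Workaround: treat the container whose sequence number is 1 as the AM container. */
  static boolean looksLikeAmById(String containerId) {
    String[] parts = containerId.split("_");
    // The last segment is the per-attempt container sequence number, e.g. "000001".
    return Long.parseLong(parts[parts.length - 1]) == 1L;
  }

  /** Hypothetical launch context in which the RM states explicitly whether this is the AM. */
  static final class LaunchInfo {
    final String containerId;
    final boolean amContainer;

    LaunchInfo(String containerId, boolean amContainer) {
      this.containerId = containerId;
      this.amContainer = amContainer;
    }
  }

  /** The "assured way": trust the explicit flag rather than the naming convention. */
  static boolean isAmContainer(LaunchInfo info) {
    return info.amContainer;
  }

  public static void main(String[] args) {
    String id = "container_1432863609211_0001_01_000001";
    System.out.println(looksLikeAmById(id));                       // true
    System.out.println(isAmContainer(new LaunchInfo(id, false))); // false: the flag decides
  }
}
```

The heuristic depends only on a naming convention, whereas the flag is set by whoever actually allocated the container, which is why the issue asks for the latter.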
[jira] [Created] (YARN-3822) Scalability validation of RM writing app/attempt/container lifecycle events
Zhijie Shen created YARN-3822:

Summary: Scalability validation of RM writing app/attempt/container lifecycle events
Key: YARN-3822
URL: https://issues.apache.org/jira/browse/YARN-3822
Project: Hadoop YARN
Issue Type: Sub-task
Components: resourcemanager, timelineserver
Reporter: Zhijie Shen
Assignee: Naganarasimha G R

We need to test how well the RM metrics publisher scales.
[jira] [Created] (YARN-3761) Set delegation token service address at the server side
Zhijie Shen created YARN-3761:

Summary: Set delegation token service address at the server side
Key: YARN-3761
URL: https://issues.apache.org/jira/browse/YARN-3761
Project: Hadoop YARN
Issue Type: Improvement
Components: security
Reporter: Zhijie Shen

Currently, YARN components generate delegation tokens without setting the service address, leaving it to the client to set. With our Java client library, this is usually fine. However, it becomes a problem for users of the REST API: the delegation token is returned as a URL string, and it is very unfriendly for a thin client to deserialize that string, set the token service address, and serialize it again for further use. If we move the task of setting the service address to the server side, the client is spared this trouble.
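To show what YARN-3761 asks a thin client to do today, here is a simplified stand-in for the decode/modify/re-encode round trip. It is loosely modeled on a token made of four length-prefixed fields (identifier, password, kind, service) and Base64url-encoded. ASSUMPTION: every field is shorter than 128 bytes so each length fits in one byte; Hadoop's real Writable/variable-length-integer format is not reproduced here, and Java clients should use Hadoop's own `Token` class instead. The identifier, password, and host values are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

/** Simplified stand-in for a thin client rewriting the service field of a URL-string token. */
class TokenServiceRewrite {
  static final int SERVICE = 3; // field order: identifier, password, kind, service

  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  static byte[][] decode(String urlString) {
    byte[] raw = Base64.getUrlDecoder().decode(urlString);
    byte[][] fields = new byte[4][];
    int pos = 0;
    for (int i = 0; i < fields.length; i++) {
      int len = raw[pos++]; // one-byte length prefix (assumption of this sketch)
      fields[i] = Arrays.copyOfRange(raw, pos, pos + len);
      pos += len;
    }
    return fields;
  }

  static String encode(byte[][] fields) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] f : fields) {
      if (f.length > 127) {
        throw new IllegalArgumentException("field too long for this sketch");
      }
      out.write(f.length);
      out.write(f, 0, f.length);
    }
    return Base64.getUrlEncoder().withoutPadding().encodeToString(out.toByteArray());
  }

  /** Deserialize, set the service address, and serialize again. */
  static String withService(String urlString, String service) {
    byte[][] fields = decode(urlString);
    fields[SERVICE] = utf8(service);
    return encode(fields);
  }

  public static void main(String[] args) {
    // A token as a server might return it today: the service field is empty.
    String fromServer = encode(new byte[][] {
        utf8("opaque-identifier"), utf8("secret"), utf8("TIMELINE_DELEGATION_TOKEN"), utf8("")});
    String fixed = withService(fromServer, "timeline.example.com:8188");
    System.out.println(new String(decode(fixed)[SERVICE], StandardCharsets.UTF_8));
  }
}
```

Every REST client would have to carry logic like this (and get the real binary format right), which is the argument for having the server fill in the service address before returning the token.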
[jira] [Created] (YARN-3751) TestAHSWebServices fails after YARN-3467
Zhijie Shen created YARN-3751:

Summary: TestAHSWebServices fails after YARN-3467
Key: YARN-3751
URL: https://issues.apache.org/jira/browse/YARN-3751
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen

YARN-3467 changed AppInfo and assumed that the used resource is not null. That assumption does not hold here, because this information is not published to the timeline server.
[jira] [Created] (YARN-3746) NotFoundException(404) will java.lang.IllegalStateException: STREAM when accepting XML as the content
Zhijie Shen created YARN-3746:

Summary: NotFoundException(404) will java.lang.IllegalStateException: STREAM when accepting XML as the content
Key: YARN-3746
URL: https://issues.apache.org/jira/browse/YARN-3746
Project: Hadoop YARN
Issue Type: Bug
Components: webapp
Reporter: Zhijie Shen
Assignee: Zhijie Shen

Both the RM and ATS REST APIs are affected. Oddly, it only happens with 404, not with other error codes, and only with XML, not JSON.

{code}
zshens-mbp:Deployment zshen$ curl -H "Accept: application/xml" -H "Content-Type:application/xml" http://localhost:8188/ws/v1/applicationhistory/apps/application_1432863609211_0001
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 STREAM</title>
</head>
<body><h2>HTTP ERROR 500</h2>
<p>Problem accessing /ws/v1/applicationhistory/apps/application_1432863609211_0001. Reason:
<pre>STREAM</pre></p><h3>Caused by:</h3><pre>java.lang.IllegalStateException: STREAM
	at org.mortbay.jetty.Response.getWriter(Response.java:616)
	at org.apache.hadoop.yarn.webapp.View.writer(View.java:141)
	at org.apache.hadoop.yarn.webapp.view.TextView.writer(TextView.java:39)
	at org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:60)
	at org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:80)
	at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:81)
	at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:197)
	at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:145)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
	at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
	at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
	at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
	at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
	at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
	at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
	at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
	at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
	at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:602)
	at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:277)
	at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:554)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1211)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at
[jira] [Created] (YARN-3723) Need to clearly document primaryFilter and otherInfo value type
Zhijie Shen created YARN-3723:

Summary: Need to clearly document primaryFilter and otherInfo value type
Key: YARN-3723
URL: https://issues.apache.org/jira/browse/YARN-3723
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical
[jira] [Created] (YARN-3725) App submission via REST API is broken in secure mode due to Timeline DT service address is empty
Zhijie Shen created YARN-3725:

Summary: App submission via REST API is broken in secure mode due to Timeline DT service address is empty
Key: YARN-3725
URL: https://issues.apache.org/jira/browse/YARN-3725
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager, timelineserver
Affects Versions: 2.7.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker

YARN-2971 changed TimelineClient to renew the DT using the service address stored in the timeline DT instead of the configured address. This breaks the procedure of submitting a YARN app via the REST API in secure mode. The problem is that, in the Java code, the service address is set by the client instead of the server. The REST API response is an encoded token string, so it is very inconvenient to deserialize it, set the service address, and serialize it again.
[jira] [Created] (YARN-3701) Isolating the erro of generating a single app report when getting all apps from generic history service
Zhijie Shen created YARN-3701:

Summary: Isolating the erro of generating a single app report when getting all apps from generic history service
Key: YARN-3701
URL: https://issues.apache.org/jira/browse/YARN-3701
Project: Hadoop YARN
Issue Type: Bug
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker

Currently, if an error occurs while generating a single app report when getting the application list from the generic history service, the exception is thrown to the caller. Therefore, even if only 1 out of 100 apps has a problem, the whole app list fails. The worst impact is that the default page (the app list) of the GHS web UI crashes, and the REST API /applicationhistory/apps breaks as well.
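The isolation requested in YARN-3701 amounts to catching failures per app rather than around the whole loop. A minimal sketch, assuming a hypothetical `ReportGenerator` callback in place of the real history-service report code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

/** Sketch: build the app list while isolating failures of individual app reports. */
class AppListBuilder {
  private static final Logger LOG = Logger.getLogger(AppListBuilder.class.getName());

  /** Hypothetical per-app report generator that may fail for a single app. */
  interface ReportGenerator<R> {
    R generate(String appId) throws Exception;
  }

  static <R> List<R> buildReports(List<String> appIds, ReportGenerator<R> generator) {
    List<R> reports = new ArrayList<>();
    for (String appId : appIds) {
      try {
        reports.add(generator.generate(appId));
      } catch (Exception e) {
        // Drop only the broken app and log it, instead of failing the whole list.
        LOG.warning("Skipping " + appId + ": " + e);
      }
    }
    return reports;
  }

  public static void main(String[] args) {
    List<String> ids = Arrays.asList("app_1", "app_2", "app_3");
    List<String> reports = buildReports(ids, id -> {
      if (id.equals("app_2")) {
        throw new IllegalStateException("corrupt history data");
      }
      return "report for " + id;
    });
    System.out.println(reports); // [report for app_1, report for app_3]
  }
}
```

The trade-off is that the list becomes silently partial, so logging the skipped app (as above) matters for later debugging.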
[jira] [Created] (YARN-3622) Enable application client to communicate with new timeline service
Zhijie Shen created YARN-3622:

Summary: Enable application client to communicate with new timeline service
Key: YARN-3622
URL: https://issues.apache.org/jira/browse/YARN-3622
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

A YARN application has a client and an AM. We have a story for making TimelineClient work inside the AM for v2, but not inside the client. TimelineClient inside the application client needs to be taken care of too.
[jira] [Created] (YARN-3623) Having the config to indicate the timeline service version
Zhijie Shen created YARN-3623:

Summary: Having the config to indicate the timeline service version
Key: YARN-3623
URL: https://issues.apache.org/jira/browse/YARN-3623
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

So far, the RM, MR AM, and DA AM have each added or changed configs to enable writing timeline data to the v2 server. It would be good to have a YARN timeline-service.version config, similar to timeline-service.enable, to indicate the version of the timeline service running with a given YARN cluster. This would let users move from v1 to v2 more smoothly: instead of changing their existing configs, they would just switch this config from v1 to v2. And each framework would no longer need its own v1/v2 config.
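The single cluster-wide switch proposed in YARN-3623 could be consumed by every framework in the same way. A minimal sketch using `java.util.Properties` as a stand-in for Hadoop's configuration; `yarn.timeline-service.enabled` is the existing enable flag, while `yarn.timeline-service.version` is the key name this sketch assumes for the proposed config.

```java
import java.util.Properties;

/** Sketch: every framework reads one cluster-wide setting instead of its own v1/v2 flag. */
class TimelineVersionConfig {
  static final String ENABLED_KEY = "yarn.timeline-service.enabled";
  static final String VERSION_KEY = "yarn.timeline-service.version"; // assumed name

  enum Writer { NONE, V1, V2 }

  static Writer chooseWriter(Properties conf) {
    if (!Boolean.parseBoolean(conf.getProperty(ENABLED_KEY, "false"))) {
      return Writer.NONE;
    }
    // Default to v1 so existing clusters keep working without any config change.
    float version = Float.parseFloat(conf.getProperty(VERSION_KEY, "1.0"));
    return version >= 2.0f ? Writer.V2 : Writer.V1;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(ENABLED_KEY, "true");
    System.out.println(chooseWriter(conf)); // V1
    conf.setProperty(VERSION_KEY, "2.0");
    System.out.println(chooseWriter(conf)); // V2
  }
}
```

Moving a cluster from v1 to v2 is then a one-line change, and no framework needs to introduce a flag of its own.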
[jira] [Created] (YARN-3588) Timeline entity uniqueness
Zhijie Shen created YARN-3588:

Summary: Timeline entity uniqueness
Key: YARN-3588
URL: https://issues.apache.org/jira/browse/YARN-3588
Project: Hadoop YARN
Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen

In YARN-3051, we discussed how to uniquely identify an entity. Sangjin and some other folks propose that (type, id) identify an entity uniquely only within the scope of a single app. This differs from entity uniqueness in ATSv1, where (type, id) identifies an entity globally. This will affect how a single entity is fetched and raises a compatibility issue. Let's continue our discussion here to unblock YARN-3051.
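The two uniqueness scopes discussed in YARN-3588 can be compared by using the identifying fields as a map or set key. In this minimal sketch, the app ids and the entity type `TASK` with id `task_0` are hypothetical; `List<String>` keys are used because they have value-based equality.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Sketch of the two uniqueness scopes for timeline entities. */
class EntityIdentity {
  /** ATSv1 style: (type, id) identifies an entity globally; the app is ignored. */
  static List<String> globalKey(String appId, String type, String id) {
    return Arrays.asList(type, id);
  }

  /** Proposed v2 style: (type, id) is unique only within one application. */
  static List<String> appScopedKey(String appId, String type, String id) {
    return Arrays.asList(appId, type, id);
  }

  /** Counts distinct entities among posted {appId, type, id} triples under either scheme. */
  static int distinct(List<String[]> posts, boolean appScoped) {
    Set<List<String>> keys = new HashSet<>();
    for (String[] p : posts) {
      keys.add(appScoped ? appScopedKey(p[0], p[1], p[2]) : globalKey(p[0], p[1], p[2]));
    }
    return keys.size();
  }

  public static void main(String[] args) {
    // Two applications each post an entity with the same type and id.
    List<String[]> posts = Arrays.asList(
        new String[] {"application_1_0001", "TASK", "task_0"},
        new String[] {"application_1_0002", "TASK", "task_0"});
    System.out.println(distinct(posts, false)); // 1: they collide under global uniqueness
    System.out.println(distinct(posts, true));  // 2: distinct under app scope
  }
}
```

Under the app-scoped scheme, fetching a single entity needs the application as well as (type, id), which is where the compatibility concern with the v1 fetch API comes from.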
[jira] [Resolved] (YARN-2289) ApplicationHistoryStore should be versioned
[ https://issues.apache.org/jira/browse/YARN-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-2289.
Resolution: Won't Fix

We won't make improvements to GHS.

ApplicationHistoryStore should be versioned
Key: YARN-2289
URL: https://issues.apache.org/jira/browse/YARN-2289
Project: Hadoop YARN
Issue Type: Sub-task
Components: applications
Reporter: Junping Du
Assignee: Junping Du
[jira] [Resolved] (YARN-1688) Rethinking about POJO Classes
[ https://issues.apache.org/jira/browse/YARN-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-1688.
Resolution: Won't Fix

Rethinking about POJO Classes
Key: YARN-1688
URL: https://issues.apache.org/jira/browse/YARN-1688
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

We need to think about how the POJO classes will evolve. Should we back them with proto or something else?
[jira] [Resolved] (YARN-1688) Rethinking about POJO Classes
[ https://issues.apache.org/jira/browse/YARN-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-1688.
Resolution: Fixed

YARN-3539 will declare the timeline v1 APIs stable, so we won't change the v1 POJO classes.

Rethinking about POJO Classes
Key: YARN-1688
URL: https://issues.apache.org/jira/browse/YARN-1688
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

We need to think about how the POJO classes will evolve. Should we back them with proto or something else?
[jira] [Resolved] (YARN-1638) Add an integration test validating post, storage and retrival of entites+events
[ https://issues.apache.org/jira/browse/YARN-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-1638.
Resolution: Fixed

We already have integration tests of this kind, for example in TestDistributedShell.

Add an integration test validating post, storage and retrival of entites+events
Key: YARN-1638
URL: https://issues.apache.org/jira/browse/YARN-1638
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli
[jira] [Resolved] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data
[ https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-1530.
Resolution: Fixed

Timeline service v1 is almost done. Its functionality has been committed across multiple releases, most of it before 2.6. A few outstanding issues remain and are kept open for further discussion.

[Umbrella] Store, manage and serve per-framework application-timeline data
Key: YARN-1530
URL: https://issues.apache.org/jira/browse/YARN-1530
Project: Hadoop YARN
Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, application timeline design-20140116.pdf, application timeline design-20140130.pdf, application timeline design-20140210.pdf

This is a sibling JIRA for YARN-321. Today, each application/framework has to store and serve its per-framework data all by itself, as YARN doesn't have a common solution. This JIRA attempts to solve the storage, management, and serving of per-framework data from various applications, both running and finished. The aim is to change YARN to collect and store data in a generic manner, with plugin points for frameworks to do their own thing w.r.t. interpretation and serving.
[jira] [Resolved] (YARN-2307) Capacity scheduler user only ADMINISTER_QUEUE also can submit app
[ https://issues.apache.org/jira/browse/YARN-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-2307.
Resolution: Invalid

You probably missed setting {{yarn.acl.enable=true}} in yarn-site.xml. Closing it for now; feel free to reopen if that's not your case.

Capacity scheduler user only ADMINISTER_QUEUE also can submit app
Key: YARN-2307
URL: https://issues.apache.org/jira/browse/YARN-2307
Project: Hadoop YARN
Issue Type: Bug
Components: capacityscheduler
Affects Versions: 2.3.0
Environment: hadoop 2.3.0 centos6.5 jdk1.7
Reporter: tangjunjie
Priority: Minor

Queue acls for user : root

Queue Operations
=====================
root
default
china ADMINISTER_QUEUE
unfunded

User root only has ADMINISTER_QUEUE, but user root can submit apps to the china queue.
[jira] [Resolved] (YARN-2060) Add an admin module for the timeline server
[ https://issues.apache.org/jira/browse/YARN-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-2060.
Resolution: Won't Fix

We won't add new features to ATS v1.

Add an admin module for the timeline server
Key: YARN-2060
URL: https://issues.apache.org/jira/browse/YARN-2060
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

As with the job history server, it would be good to have an admin module for the timeline server that allows the admin to manage the server on the fly.
[jira] [Resolved] (YARN-2626) Document of timeline server needs to be updated
[ https://issues.apache.org/jira/browse/YARN-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-2626.
Resolution: Duplicate

YARN-3539 is updating it, so closing this one.

Document of timeline server needs to be updated
Key: YARN-2626
URL: https://issues.apache.org/jira/browse/YARN-2626
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Affects Versions: 2.6.0
Reporter: Zhijie Shen

Since YARN-2033, the document is no longer accurate.
[jira] [Resolved] (YARN-2286) RM HA and failover test cases failed ocasionally
[ https://issues.apache.org/jira/browse/YARN-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-2286.
Resolution: Cannot Reproduce

Closing it, as I can no longer reproduce these failures locally.

RM HA and failover test cases failed ocasionally
Key: YARN-2286
URL: https://issues.apache.org/jira/browse/YARN-2286
Project: Hadoop YARN
Issue Type: Test
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Labels: test, test-fail

* TestApplicationClientProtocolOnHA.testCancelDelegationTokenOnHA
** See https://builds.apache.org/job/PreCommit-YARN-Build/4271//testReport/
* TestRMFailover.testAutomaticFailover
** See https://builds.apache.org/job/PreCommit-YARN-Build/4277//testReport/
[jira] [Resolved] (YARN-2294) Update sample program and documentations for writing YARN Application
[ https://issues.apache.org/jira/browse/YARN-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-2294.
Resolution: Fixed
Fix Version/s: 2.6.0

Update sample program and documentations for writing YARN Application
Key: YARN-2294
URL: https://issues.apache.org/jira/browse/YARN-2294
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Li Lu
Fix For: 2.6.0

Many APIs for writing YARN applications have been stabilized. However, some of them have changed since the sample YARN programs (such as distributed shell) and the documentation were last updated. There are ongoing discussions on the user mailing list about updating the outdated "Writing YARN Applications" documentation. Updating sample programs such as distributed shell is also needed, since they are likely the very first demonstration of YARN applications that newcomers see.
[jira] [Resolved] (YARN-2043) Rename internal names to being Timeline Service instead of application history
[ https://issues.apache.org/jira/browse/YARN-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-2043.
Resolution: Won't Fix

We won't refactor ATS v1 any more.

Rename internal names to being Timeline Service instead of application history
Key: YARN-2043
URL: https://issues.apache.org/jira/browse/YARN-2043
Project: Hadoop YARN
Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Naganarasimha G R

For example, package and class names, in line with YARN-2033, YARN-1982, etc.
[jira] [Resolved] (YARN-2309) NPE during RM-Restart test scenario
[ https://issues.apache.org/jira/browse/YARN-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-2309.
Resolution: Duplicate

NPE during RM-Restart test scenario
Key: YARN-2309
URL: https://issues.apache.org/jira/browse/YARN-2309
Project: Hadoop YARN
Issue Type: Bug
Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Priority: Minor

During RM restart test scenarios, we hit the exception below. Note that ZooKeeper was also unstable during this testing; we saw many ZooKeeper exceptions before getting this NPE.

{code}
2014-07-10 10:49:46,817 WARN org.apache.hadoop.service.AbstractService: When stopping the service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService : java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceStop(EmbeddedElectorService.java:108)
	at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
	at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
	at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:125)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:232)
	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1039)
{code}

ZooKeeper exception:
{code}
2014-07-10 10:49:46,816 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService failed in state INITED; cause: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
	at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1046)
	at org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1017)
	at org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:632)
	at org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:766)
{code}
[jira] [Resolved] (YARN-321) Generic application history service
[ https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhijie Shen resolved YARN-321.
Resolution: Fixed

Closing this umbrella JIRA with a few subtasks still open. The generic history service has been implemented and runs on the timeline server. YARN-2271 is left open to track one possible performance issue when fetching all the applications stored in the timeline store.

Generic application history service
Key: YARN-321
URL: https://issues.apache.org/jira/browse/YARN-321
Project: Hadoop YARN
Issue Type: Improvement
Reporter: Luke Lu
Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java

The MapReduce job history server currently needs to be deployed as a trusted server in sync with the MapReduce runtime. Every new application would need a similar application history server. Having to deploy O(T*V) trusted servers (where T is the number of application types and V is the number of application versions) is clearly not scalable.

Job history storage handling itself is pretty generic: move the logs and history data into a particular directory for later serving. Job history data is already stored as JSON (or binary Avro).

I propose that we create only one trusted application history server, which can also have a generic UI (displaying JSON as a tree of strings). Specific applications/versions can deploy untrusted webapps (a la AMs) to query the application history server and interpret the JSON for their specific UI and/or analytics.
[jira] [Resolved] (YARN-2021) Allow AM to set failed final status
[ https://issues.apache.org/jira/browse/YARN-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2021. --- Resolution: Invalid Allow AM to set failed final status --- Key: YARN-2021 URL: https://issues.apache.org/jira/browse/YARN-2021 Project: Hadoop YARN Issue Type: Improvement Reporter: Jakob Homan Background: SAMZA-117. It would be good if an AM were able to signal via its final status that the job itself has failed, even if the AM itself has finished up in a tidy fashion. Either (a) the AM could signal a final status of failed and exit cleanly, or (b) we could have another status, say Application Failed, to indicate that the AM itself gave up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2239) Rename ClusterMetrics#getUnhealthyNMs() to getNumUnhealthyNMs()
[ https://issues.apache.org/jira/browse/YARN-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2239. --- Resolution: Invalid The change in ClusterMetricsInfo would be incompatible. The name has been used since 2.4, so let's keep it. Feel free to reopen it if you have different thoughts. Rename ClusterMetrics#getUnhealthyNMs() to getNumUnhealthyNMs() --- Key: YARN-2239 URL: https://issues.apache.org/jira/browse/YARN-2239 Project: Hadoop YARN Issue Type: Improvement Components: resourcemanager Affects Versions: 2.4.0 Reporter: Kenji Kikushima Assignee: Kenji Kikushima Priority: Trivial Attachments: YARN-2239.patch In ClusterMetrics, the other get*NMs() methods have the Num prefix (e.g. getNumLostNMs()/getNumRebootedNMs()). For naming consistency, we should rename getUnhealthyNMs() to getNumUnhealthyNMs(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2225) Turn the virtual memory check to be off by default
[ https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2225. --- Resolution: Invalid Closing the jira according to the comments so far. Feel free to reopen it if someone has other thoughts. Turn the virtual memory check to be off by default -- Key: YARN-2225 URL: https://issues.apache.org/jira/browse/YARN-2225 Project: Hadoop YARN Issue Type: Bug Reporter: Anubhav Dhoot Assignee: Anubhav Dhoot Attachments: YARN-2225.patch The virtual memory check may not be the best way to isolate applications. Virtual memory is not the constrained resource. It would be better to limit the swapping of the task using swappiness instead. This patch turns DEFAULT_NM_VMEM_CHECK_ENABLED off by default and lets users turn it on if they need to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2218) TestSubmitApplicationWithRMHA fails intermittently in trunk
[ https://issues.apache.org/jira/browse/YARN-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2218. --- Resolution: Cannot Reproduce Ran the test case locally, and it didn't fail. Close it now. Feel free to reopen it if it happens again. TestSubmitApplicationWithRMHA fails intermittently in trunk --- Key: YARN-2218 URL: https://issues.apache.org/jira/browse/YARN-2218 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Ashwin Shankar org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA testGetApplicationReportIdempotent(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA) Time elapsed: 2.536 sec FAILURE! java.lang.AssertionError: expected:ACCEPTED but was:SUBMITTED at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:144) at org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testGetApplicationReportIdempotent(TestSubmitApplicationWithRMHA.java:211) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2101) Document the system filters of the timeline entity
[ https://issues.apache.org/jira/browse/YARN-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2101. --- Resolution: Invalid After changing to domain access control, this system filter is no longer necessary. Document the system filters of the timeline entity -- Key: YARN-2101 URL: https://issues.apache.org/jira/browse/YARN-2101 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen In YARN-1937, to support ACLs, we reserved a filter name for the timeline server's own use, which should not be used by users. We need to document this system filter explicitly to tell users not to use it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1935) Security for timeline server
[ https://issues.apache.org/jira/browse/YARN-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1935. --- Resolution: Fixed Closing the umbrella jira. The only remaining issue is to put generic history data in a non-default domain in the secure scenario. Since we are not developing new features for ATS v1, we can leave that jira (YARN-2622) open and see whether there is a real requirement for it. Security for timeline server Key: YARN-1935 URL: https://issues.apache.org/jira/browse/YARN-1935 Project: Hadoop YARN Issue Type: New Feature Reporter: Arun C Murthy Assignee: Zhijie Shen Attachments: Timeline Security Diagram.pdf, Timeline_Kerberos_DT_ACLs.2.patch, Timeline_Kerberos_DT_ACLs.patch Jira to track work to secure the ATS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1794) Yarn CLI only shows running containers for Running Applications
[ https://issues.apache.org/jira/browse/YARN-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1794. --- Resolution: Fixed Won't implement new features for the generic history service now. Yarn CLI only shows running containers for Running Applications --- Key: YARN-1794 URL: https://issues.apache.org/jira/browse/YARN-1794 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1794) Yarn CLI only shows running containers for Running Applications
[ https://issues.apache.org/jira/browse/YARN-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1794. --- Resolution: Won't Fix Yarn CLI only shows running containers for Running Applications --- Key: YARN-1794 URL: https://issues.apache.org/jira/browse/YARN-1794 Project: Hadoop YARN Issue Type: Sub-task Reporter: Mayank Bansal Assignee: Mayank Bansal -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1744) Renaming applicationhistoryservice module
[ https://issues.apache.org/jira/browse/YARN-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1744. --- Resolution: Won't Fix ATS v2 starts from a fresh submodule, so we won't fix the current naming. Renaming applicationhistoryservice module - Key: YARN-1744 URL: https://issues.apache.org/jira/browse/YARN-1744 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen When we started with the feature, the module only contained the source code of the generic application history service; therefore, it was named applicationhistoryservice. However, as time went on, we moved on to per-framework historic data (see YARN-1530). The code base of this module has already gone beyond the generic application history service and includes the timeline service as well. It would be good to come up with a more accurate name for the project as soon as possible, to keep people from being confused by the module name about what service it provides. We probably need to refactor the AHS-related classes as well for clarity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2522) AHSClient may be not necessary
[ https://issues.apache.org/jira/browse/YARN-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2522. --- Resolution: Won't Fix Won't do the refactoring work. AHSClient may be not necessary -- Key: YARN-2522 URL: https://issues.apache.org/jira/browse/YARN-2522 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Per discussion in [YARN-2033|https://issues.apache.org/jira/browse/YARN-2033?focusedCommentId=14126073page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14126073], it may not be necessary to have a separate AHSClient. The methods can be incorporated into TimelineClient. APPLICATION_HISTORY_ENABLED would then be useless too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1834) YarnClient will not be redirected to the history server when RM is done
[ https://issues.apache.org/jira/browse/YARN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1834. --- Resolution: Won't Fix We won't improve the generic history service now. YarnClient will not be redirected to the history server when RM is done --- Key: YARN-1834 URL: https://issues.apache.org/jira/browse/YARN-1834 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen When the RM is not available, the client keeps retrying the RM, so it never reaches the history server to get the app/attempt/container info. Therefore, during an RM restart, such a request will be blocked, even though it could move on to the history server when the history service is enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1524) Make aggregated logs of completed containers available via REST API
[ https://issues.apache.org/jira/browse/YARN-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1524. --- Resolution: Won't Fix We won't add features for GHS now. Make aggregated logs of completed containers available via REST API --- Key: YARN-1524 URL: https://issues.apache.org/jira/browse/YARN-1524 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Robert Kanter -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3563) Completed app shows -1 running containers on RM web UI
Zhijie Shen created YARN-3563: - Summary: Completed app shows -1 running containers on RM web UI Key: YARN-3563 URL: https://issues.apache.org/jira/browse/YARN-3563 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen See the attached screenshot. I saw this issue with trunk. Not sure if it exists in branch-2.7 too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3563) Completed app shows -1 running containers on RM web UI
[ https://issues.apache.org/jira/browse/YARN-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3563. --- Resolution: Duplicate Didn't notice YARN-3563. Close this one as a duplicate Completed app shows -1 running containers on RM web UI -- Key: YARN-3563 URL: https://issues.apache.org/jira/browse/YARN-3563 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager, webapp Reporter: Zhijie Shen Attachments: Screen Shot 2015-04-29 at 2.11.19 PM.png See the attached screenshot. I saw this issue with trunk. Not sure if it exists in branch-2.7 too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3551) Consolidate data model change according the backend implementation
Zhijie Shen created YARN-3551: - Summary: Consolidate data model change according the backend implementation Key: YARN-3551 URL: https://issues.apache.org/jira/browse/YARN-3551 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Based on the comments on [YARN-3134|https://issues.apache.org/jira/browse/YARN-3134?focusedCommentId=14512080page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512080] and [YARN-3411|https://issues.apache.org/jira/browse/YARN-3411?focusedCommentId=14512098page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512098], we need to change the data model to restrict the data types of the info/config/metric sections. 1. Info: the value can be any kind of object that Jackson can serialize/deserialize. 2. Config: the value will always be assumed to be a String. 3. Metric: single values or time-series values have to be numbers so they can be aggregated. Other than that, the info/start time/finish time of a metric do not seem to be necessary for storage. They should be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
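The config and metric rules listed above can be expressed as simple type checks. This is a minimal sketch that assumes each section is a plain Java map; SectionTypeRules is a hypothetical helper, not part of the timeline service data model, and the Info rule is not checked here because it depends on what Jackson can serialize.

```java
import java.util.Map;

// Hypothetical helper mirroring the YARN-3551 proposal; not the real
// timeline service v2 entity API.
final class SectionTypeRules {
    private SectionTypeRules() {}

    // Config section: every value must be a String (null is rejected too).
    static boolean isValidConfig(Map<String, ?> configs) {
        for (Object value : configs.values()) {
            if (!(value instanceof String)) {
                return false;
            }
        }
        return true;
    }

    // Metric section: a single value or a time series (timestamp -> value);
    // every value must be a Number so the backend can aggregate it.
    static boolean isValidMetric(Map<Long, ?> series) {
        for (Object value : series.values()) {
            if (!(value instanceof Number)) {
                return false;
            }
        }
        return true;
    }

    // An aggregation that the Number-only rule makes possible.
    static double sum(Map<Long, ? extends Number> series) {
        double total = 0;
        for (Number n : series.values()) {
            total += n.doubleValue();
        }
        return total;
    }
}
```

Restricting metric values to Number is what lets an aggregation such as sum work without per-type handling.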
[jira] [Resolved] (YARN-2032) Implement a scalable, available TimelineStore using HBase
[ https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2032. --- Resolution: Won't Fix It will be covered in YARN-2928 Implement a scalable, available TimelineStore using HBase - Key: YARN-2032 URL: https://issues.apache.org/jira/browse/YARN-2032 Project: Hadoop YARN Issue Type: Sub-task Reporter: Vinod Kumar Vavilapalli Assignee: Li Lu Attachments: YARN-2032-091114.patch, YARN-2032-branch-2-1.patch, YARN-2032-branch2-2.patch As discussed on YARN-1530, we should pursue implementing a scalable, available Timeline store using HBase. One goal is to reuse most of the code from the LevelDB-based store (YARN-1635). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3541) Add version info on timeline service / generic history web UI and RES API
Zhijie Shen created YARN-3541: - Summary: Add version info on timeline service / generic history web UI and RES API Key: YARN-3541 URL: https://issues.apache.org/jira/browse/YARN-3541 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3522) DistributedShell uses the wrong user to put timeline data
Zhijie Shen created YARN-3522: - Summary: DistributedShell uses the wrong user to put timeline data Key: YARN-3522 URL: https://issues.apache.org/jira/browse/YARN-3522 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker YARN-3287 breaks the timeline access control of distributed shell. In distributed shell AM: {code} if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED, YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) { // Creating the Timeline Client timelineClient = TimelineClient.createTimelineClient(); timelineClient.init(conf); timelineClient.start(); } else { timelineClient = null; LOG.warn("Timeline service is not enabled"); } {code} {code} ugi.doAs(new PrivilegedExceptionAction<TimelinePutResponse>() { @Override public TimelinePutResponse run() throws Exception { return timelineClient.putEntities(entity); } }); {code} YARN-3287 changes the timeline client to pick up the right ugi in serviceInit, but the DS AM still doesn't init the timeline client under the submitter's ugi; it only uses that ugi around each putEntities call. As a result, the put requests are sent as the wrong user. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
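The ordering problem described above can be modeled in plain Java. This is a minimal sketch that assumes a thread-local string in place of UserGroupInformation; CurrentUser, SketchTimelineClient and the user names are hypothetical stand-ins, not Hadoop classes. The only behavior borrowed from the description is that the client captures the caller's identity once during init and reuses it for every put.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for UserGroupInformation: a thread-local "current user".
final class CurrentUser {
    private static final ThreadLocal<String> USER =
            ThreadLocal.withInitial(() -> "am-login-user");

    private CurrentUser() {}

    static String get() {
        return USER.get();
    }

    // Runs the action as the given user, then restores the previous user.
    static <T> T doAs(String user, Supplier<T> action) {
        String previous = USER.get();
        USER.set(user);
        try {
            return action.get();
        } finally {
            USER.set(previous);
        }
    }
}

// Hypothetical client that, like the timeline client described after
// YARN-3287, remembers the user that was current at init time.
final class SketchTimelineClient {
    private String initUser;

    void init() {
        initUser = CurrentUser.get();
    }

    // Returns the user this put would be sent as; a doAs around the put is ignored.
    String putEntities(String entity) {
        return initUser;
    }
}
```

In this model, wrapping only putEntities in doAs("submitter", ...) still reports am-login-user, whereas creating and initializing the client inside doAs makes every later put go out as the submitter. That second ordering is one way to read the fix the description implies for the DS AM.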
[jira] [Created] (YARN-3509) CollectorNodemanagerProtocol's authorization doesn't work
Zhijie Shen created YARN-3509: - Summary: CollectorNodemanagerProtocol's authorization doesn't work Key: YARN-3509 URL: https://issues.apache.org/jira/browse/YARN-3509 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager, security, timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3471) Fix timeline client retry
Zhijie Shen created YARN-3471: - Summary: Fix timeline client retry Key: YARN-3471 URL: https://issues.apache.org/jira/browse/YARN-3471 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen I found that the client retry has some problems: 1. The new put methods retry on all exceptions, but they should only do so upon ConnectException. 2. We can reuse TimelineClientConnectionRetry to simplify the retry logic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
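Point 1 above can be sketched as a small retry loop that retries only on java.net.ConnectException and rethrows every other IOException at once. ConnectRetry and IoCall are hypothetical names used for illustration; this is not the API of TimelineClientConnectionRetry.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical retry helper; not TimelineClientConnectionRetry's real API.
final class ConnectRetry {
    @FunctionalInterface
    interface IoCall<T> {
        T call() throws IOException;
    }

    private ConnectRetry() {}

    // Retries only when the server could not be reached (ConnectException).
    // Any other IOException propagates on the first failure.
    static <T> T callWithRetry(IoCall<T> call, int maxRetries, long sleepMillis)
            throws IOException, InterruptedException {
        int retries = 0;
        while (true) {
            try {
                return call.call();
            } catch (ConnectException e) {
                if (retries >= maxRetries) {
                    throw e; // give up after maxRetries retries
                }
                retries++;
                Thread.sleep(sleepMillis);
            }
        }
    }
}
```

Because only ConnectException is caught, a failure such as a server-side error surfaced as a plain IOException is reported immediately instead of being retried.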
[jira] [Created] (YARN-3461) Consolidate flow name/version/run defaults
Zhijie Shen created YARN-3461: - Summary: Consolidate flow name/version/run defaults Key: YARN-3461 URL: https://issues.apache.org/jira/browse/YARN-3461 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen In YARN-3391, it was not resolved what the defaults for flow name/version/run should be. Let's continue the discussion here and unblock YARN-3391 so it can move forward. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI
[ https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3430. --- Resolution: Fixed After pulling YARN-3273 into branch-2.7, committed this patch to branch-2.7 again. RMAppAttempt headroom data is missing in RM Web UI -- Key: YARN-3430 URL: https://issues.apache.org/jira/browse/YARN-3430 Project: Hadoop YARN Issue Type: Sub-task Components: resourcemanager, webapp, yarn Reporter: Xuan Gong Assignee: Xuan Gong Priority: Blocker Fix For: 2.7.0 Attachments: YARN-3430.1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3334) [Event Producers] NM TimelineClient container metrics posting to new timeline service.
[ https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3334. --- Resolution: Fixed Fix Version/s: YARN-2928 Hadoop Flags: Reviewed Committed the patch to branch YARN-2928. Thanks for the patch, Junping! Thanks for review, Sangjin and Li! [Event Producers] NM TimelineClient container metrics posting to new timeline service. -- Key: YARN-3334 URL: https://issues.apache.org/jira/browse/YARN-3334 Project: Hadoop YARN Issue Type: Sub-task Components: nodemanager Affects Versions: YARN-2928 Reporter: Junping Du Assignee: Junping Du Fix For: YARN-2928 Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334-v8.patch, YARN-3334.7.patch After YARN-3039, we have service discovery mechanism to pass app-collector service address among collectors, NMs and RM. In this JIRA, we will handle service address setting for TimelineClients in NodeManager, and put container metrics to the backend storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.
Zhijie Shen created YARN-3431: - Summary: Sub resources of timeline entity needs to be passed to a separate endpoint. Key: YARN-3431 URL: https://issues.apache.org/jira/browse/YARN-3431 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3399) Default cluster ID for RM HA
Zhijie Shen created YARN-3399: - Summary: Default cluster ID for RM HA Key: YARN-3399 URL: https://issues.apache.org/jira/browse/YARN-3399 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen In YARN-3040, the timeline service sets a default cluster ID if users don't provide one. RM HA currently behaves differently when users don't provide a cluster ID: an IllegalArgumentException is thrown instead. Let's continue the discussion here on whether RM HA needs a default cluster ID, and if so, what the proper default should be. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt
Zhijie Shen created YARN-3393: - Summary: Getting application(s) goes wrong when app finishes before starting the attempt Key: YARN-3393 URL: https://issues.apache.org/jira/browse/YARN-3393 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical When generating the app report in ApplicationHistoryManagerOnTimelineStore, it checks whether appAttempt == null. {code} ApplicationAttemptReport appAttempt = getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId()); if (appAttempt != null) { app.appReport.setHost(appAttempt.getHost()); app.appReport.setRpcPort(appAttempt.getRpcPort()); app.appReport.setTrackingUrl(appAttempt.getTrackingUrl()); app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl()); } {code} However, {{getApplicationAttempt}} doesn't return null but throws ApplicationAttemptNotFoundException: {code} if (entity == null) { throw new ApplicationAttemptNotFoundException("The entity for application attempt " + appAttemptId + " doesn't exist in the timeline store"); } else { return convertToApplicationAttemptReport(entity); } {code} The two pieces of code don't fit together: the null check never applies, so when the app finishes before its attempt starts, the exception escapes instead of the attempt fields simply being skipped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
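One possible way to make the two pieces agree is for the caller to catch the not-found exception rather than test for null. The sketch below models that in plain Java; AttemptNotFoundException, AppReportSketch and HistoryLookupSketch are hypothetical stand-ins for ApplicationAttemptNotFoundException and the report classes, and the attempt IDs are made up. Having getApplicationAttempt return null instead would be the other way to make the existing null check meaningful.

```java
import java.util.Map;

// Hypothetical stand-in for ApplicationAttemptNotFoundException.
class AttemptNotFoundException extends Exception {
    AttemptNotFoundException(String message) {
        super(message);
    }
}

// Hypothetical app report; host stays null when no attempt was recorded.
final class AppReportSketch {
    String host;
}

final class HistoryLookupSketch {
    private final Map<String, String> attemptHosts;

    HistoryLookupSketch(Map<String, String> attemptHosts) {
        this.attemptHosts = attemptHosts;
    }

    // Like getApplicationAttempt in the issue: throws instead of returning null.
    String getAttemptHost(String attemptId) throws AttemptNotFoundException {
        String host = attemptHosts.get(attemptId);
        if (host == null) {
            throw new AttemptNotFoundException("The entity for application attempt "
                    + attemptId + " doesn't exist in the timeline store");
        }
        return host;
    }

    // The caller handles the exception instead of checking for null, so an
    // app that finished before its attempt started still yields a report.
    AppReportSketch buildReport(String currentAttemptId) {
        AppReportSketch report = new AppReportSketch();
        try {
            report.host = getAttemptHost(currentAttemptId);
        } catch (AttemptNotFoundException e) {
            // No attempt was recorded: leave the attempt-derived fields unset.
        }
        return report;
    }
}
```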
[jira] [Created] (YARN-3390) RMTimelineCollector should have the context info of each app
Zhijie Shen created YARN-3390: - Summary: RMTimelineCollector should have the context info of each app Key: YARN-3390 URL: https://issues.apache.org/jira/browse/YARN-3390 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen RMTimelineCollector should have the context info of each app whose entity has been put -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage
Zhijie Shen created YARN-3391: - Summary: Clearly define flow ID/ flow run / flow version in API and storage Key: YARN-3391 URL: https://issues.apache.org/jira/browse/YARN-3391 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen To continue the discussion in YARN-3040, let's figure out the best way to describe the flow. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3377) TestTimelineServiceClientIntegration fails
[ https://issues.apache.org/jira/browse/YARN-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3377. --- Resolution: Fixed Fix Version/s: YARN-2928 Hadoop Flags: Reviewed +1 for the patch. Committed it to branch YARN-2928. Thanks, Sangjin! TestTimelineServiceClientIntegration fails -- Key: YARN-3377 URL: https://issues.apache.org/jira/browse/YARN-3377 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Priority: Minor Fix For: YARN-2928 Attachments: YARN-3377.001.patch TestTimelineServiceClientIntegration fails. It appears we are getting 500 from the timeline collector. This appears to be mostly an issue with the test itself. {noformat} --- Test set: org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration --- Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 33.503 sec FAILURE! - in org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration testPutEntities(org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration) Time elapsed: 32.606 sec ERROR! org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response from the timeline server. 
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:457) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:391) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:368) at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:342) at org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration.testPutEntities(TestTimelineServiceClientIntegration.java:74) {noformat} The relevant piece from the server side: {noformat} Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.PackagesResourceConfig init INFO: Scanning for root resource and provider classes in the packages: org.apache.hadoop.yarn.server.timelineservice.collector org.apache.hadoop.yarn.webapp org.apache.hadoop.yarn.webapp Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig logClasses INFO: Root resource classes found: class org.apache.hadoop.yarn.webapp.MyTestWebService class org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig logClasses INFO: Provider classes found: class org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider class org.apache.hadoop.yarn.webapp.GenericExceptionHandler class org.apache.hadoop.yarn.webapp.MyTestJAXBContextResolver Mar 19, 2015 10:48:30 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM' Mar 19, 2015 10:48:31 AM com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 resolve SEVERE: null java.lang.IllegalAccessException: Class com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can not access a member of class org.apache.hadoop.yarn.webapp.MyTestWebService$MyInfo with modifiers public at 
sun.reflect.Reflection.ensureMemberAccess(Reflection.java:95) at java.lang.Class.newInstance0(Class.java:366) at java.lang.Class.newInstance(Class.java:325) at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467) at com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181) at com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81) at com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518) at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124) at com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104) at com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120) at com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at
[jira] [Created] (YARN-3374) Aggregator's web server should randomly bind an available port
Zhijie Shen created YARN-3374: - Summary: Aggregator's web server should randomly bind an available port Key: YARN-3374 URL: https://issues.apache.org/jira/browse/YARN-3374 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen It's based on the configuration now. This approach won't work if we move to the app-level aggregator container solution: an NM may start multiple such aggregators, which cannot all bind to the same configured port. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
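Binding to port 0 asks the operating system for any free ephemeral port, which is one common way to let several aggregators run on the same NM. A minimal sketch with java.net.ServerSocket follows; EphemeralPortBinder is a hypothetical helper, and the real aggregator web server may bind through a different HTTP server API.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper: let the OS pick a free port instead of a configured one.
final class EphemeralPortBinder {
    private EphemeralPortBinder() {}

    // Port 0 means "any available port"; the actual port is known only after bind.
    static ServerSocket bindAnyFreePort(String host) throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(new InetSocketAddress(host, 0));
            return socket;
        } catch (IOException e) {
            socket.close(); // do not leak the unbound socket
            throw e;
        }
    }
}
```

Since the chosen port is only known after binding (via getLocalPort()), each aggregator would then have to publish its actual address, for example through the service discovery mechanism tracked in YARN-3039.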
[jira] [Resolved] (YARN-3039) [Aggregator wireup] Implement ATS app-appgregator service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3039. --- Resolution: Fixed Fix Version/s: YARN-2928 Hadoop Flags: Reviewed Committed the patch to branch YARN-2928. Thanks for the patch, Junping! Thanks for review, Sangjin! [Aggregator wireup] Implement ATS app-appgregator service discovery --- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Fix For: YARN-2928 Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch, YARN-3039-v6.patch, YARN-3039-v7.patch, YARN-3039-v8.patch, YARN-3039.9.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3338) Exclude jline dependency from YARN
Zhijie Shen created YARN-3338: - Summary: Exclude jline dependency from YARN Key: YARN-3338 URL: https://issues.apache.org/jira/browse/YARN-3338 Project: Hadoop YARN Issue Type: Bug Components: build Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker It was fixed in YARN-2815, but is broken again by YARN-1514. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers
[ https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3031. --- Resolution: Duplicate The patch in YARN-3264 already covers the writer interface code, so let's resolve this one as a duplicate of YARN-3264. [Storage abstraction] Create backing storage write interface for ATS writers Key: YARN-3031 URL: https://issues.apache.org/jira/browse/YARN-3031 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Vrushali C Attachments: Sequence_diagram_write_interaction.2.png, Sequence_diagram_write_interaction.png, YARN-3031.01.patch, YARN-3031.02.patch, YARN-3031.03.patch Per design in YARN-2928, come up with the interface for the ATS writer to write to various backing storages. The interface should be created to capture the right level of abstractions so that it will enable all backing storage implementations to implement it efficiently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3125) [Event producers] Change distributed shell to use new timeline service
[ https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3125. --- Resolution: Fixed Fix Version/s: YARN-2928 Hadoop Flags: Reviewed Committed to branch YARN-2928. Thanks for the patch, Junping and Li! [Event producers] Change distributed shell to use new timeline service -- Key: YARN-3125 URL: https://issues.apache.org/jira/browse/YARN-3125 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Junping Du Fix For: YARN-2928 Attachments: YARN-3125.patch, YARN-3125_UT-022615.patch, YARN-3125_UT-022715.patch, YARN-3125v2.patch, YARN-3125v3.patch We can start with changing distributed shell to use new timeline service once the framework is completed, in which way we can quickly verify the next gen is working fine end-to-end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3240) [Data Mode] Implement client API to put generic entities
Zhijie Shen created YARN-3240: - Summary: [Data Mode] Implement client API to put generic entities Key: YARN-3240 URL: https://issues.apache.org/jira/browse/YARN-3240 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3196) [Compatibility] Make TS next gen be compatible with the current TS
Zhijie Shen created YARN-3196: - Summary: [Compatibility] Make TS next gen be compatible with the current TS Key: YARN-3196 URL: https://issues.apache.org/jira/browse/YARN-3196 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Filing a Jira to make sure we don't forget about compatibility with the current TS, so that we can smoothly move users to the new TS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3043) [Data Model] Create ATS configuration, metadata, etc. as part of entities
[ https://issues.apache.org/jira/browse/YARN-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3043. --- Resolution: Duplicate Let's make the all-inclusive data model definition in YARN-3041. [Data Model] Create ATS configuration, metadata, etc. as part of entities - Key: YARN-3043 URL: https://issues.apache.org/jira/browse/YARN-3043 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Varun Saxena Per design in YARN-2928, create APIs for configuration, metadata, etc. and integrate them into entities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3042) [Data Model] Create ATS metrics API
[ https://issues.apache.org/jira/browse/YARN-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3042. --- Resolution: Duplicate Let's make the all-inclusive data model definition in YARN-3041. [Data Model] Create ATS metrics API --- Key: YARN-3042 URL: https://issues.apache.org/jira/browse/YARN-3042 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Siddharth Wagle Per design in YARN-2928, create the ATS metrics API and integrate it into the entities. The concept may be based on the existing hadoop metrics, but we want to make sure we have something that would satisfy all ATS use cases. It also needs to capture whether a metric should be aggregated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
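YARN-3042 asks for a metrics API that also records whether a metric should be aggregated. A hypothetical sketch of such a value type follows; `TimelineMetricSketch`, its fields, and the "sum the latest values" roll-up rule are illustrative assumptions, not the API that came out of YARN-3041.

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical metric: a named time series plus a flag saying whether it may be rolled up.
final class TimelineMetricSketch {
  final String id;
  final boolean aggregatable;
  // timestamp (ms) -> value, kept sorted by time
  final TreeMap<Long, Long> values = new TreeMap<>();

  TimelineMetricSketch(String id, boolean aggregatable) {
    this.id = id;
    this.aggregatable = aggregatable;
  }

  void put(long timestamp, long value) {
    values.put(timestamp, value);
  }

  long latest() {
    return values.isEmpty() ? 0L : values.lastEntry().getValue();
  }

  // Sums the latest value of every aggregatable metric with the given id,
  // e.g. to roll per-container metrics up to the application level.
  // Metrics flagged as non-aggregatable are skipped.
  static long aggregateLatest(String id, List<TimelineMetricSketch> metrics) {
    long total = 0L;
    for (TimelineMetricSketch m : metrics) {
      if (m.aggregatable && m.id.equals(id)) {
        total += m.latest();
      }
    }
    return total;
  }
}
```

The flag lives on the metric itself so that an aggregator can decide per metric, without a separate registry, whether summing it makes sense.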
[jira] [Created] (YARN-3150) Documenting the timeline service v2
Zhijie Shen created YARN-3150: - Summary: Documenting the timeline service v2 Key: YARN-3150 URL: https://issues.apache.org/jira/browse/YARN-3150 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Let's make sure we will have a document to describe what's new in TS v2, the APIs, the client libs and so on. We should do better around documentation in v2 than v1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3134) Exploiting the option of using Phoenix to access HBase backend
Zhijie Shen created YARN-3134: - Summary: Exploiting the option of using Phoenix to access HBase backend Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Quoting the introduction on the Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. {code} It may simplify our implementation of reading/writing data from/to HBase, and make it easy to build indexes and compose complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3123) Make YARN CLI show a single completed container even if the app is running
Zhijie Shen created YARN-3123: - Summary: Make YARN CLI show a single completed container even if the app is running Key: YARN-3123 URL: https://issues.apache.org/jira/browse/YARN-3123 Project: Hadoop YARN Issue Type: Improvement Components: client Reporter: Zhijie Shen As in YARN-2808, we can make the same improvement for the single-container command too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3125) Change distributed shell to use new timeline service
Zhijie Shen created YARN-3125: - Summary: Change distributed shell to use new timeline service Key: YARN-3125 URL: https://issues.apache.org/jira/browse/YARN-3125 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen We can start by changing the distributed shell to use the new timeline service once the framework is complete; this way we can quickly verify that the next gen works end-to-end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3115) Work-preserving restarting of per-node aggregator
Zhijie Shen created YARN-3115: - Summary: Work-preserving restarting of per-node aggregator Key: YARN-3115 URL: https://issues.apache.org/jira/browse/YARN-3115 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen YARN-3030 makes the per-node aggregator work as an aux service of the NM. It contains the states of the per-app aggregators corresponding to the running AM containers on this NM. When the NM is restarted in work-preserving mode, this per-node aggregator information needs to be carried over across the restart too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle
[ https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3030. --- Resolution: Fixed Fix Version/s: YARN-2928 Committed the patch to branch YARN-2928. Thanks, Sangjin! set up ATS writer with basic request serving structure and lifecycle Key: YARN-3030 URL: https://issues.apache.org/jira/browse/YARN-3030 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Sangjin Lee Fix For: YARN-2928 Attachments: YARN-3030.001.patch, YARN-3030.002.patch, YARN-3030.003.patch, YARN-3030.004.patch Per design in YARN-2928, create an ATS writer as a service, and implement the basic service structure including the lifecycle management. Also, as part of this JIRA, we should come up with the ATS client API for sending requests to this ATS writer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param
[ https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3062. --- Resolution: Invalid Thanks for your confirmation, [~pramachandran]! Closing the Jira. timelineserver gives inconsistent data for otherinfo field based on the filter param Key: YARN-3062 URL: https://issues.apache.org/jira/browse/YARN-3062 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.4.0, 2.5.0, 2.6.0 Reporter: Prakash Ramachandran Attachments: withfilter.json, withoutfilter.json When the otherinfo field gets updated, in some cases the data returned for an entity depends on whether a filter is used. For example, in the attached files, for
- entity: vertex_1421164610335_0020_1_01
- entitytype: TEZ_VERTEX_ID
the otherinfo.numTasks value got updated from 1009 to 253:
- using {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/ {code} gives the updated value: 253
- using {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11&primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code} gives the old value: 1009
For the otherinfo.status field, which also gets updated, both queries show the updated value. TEZ-1942 has more details. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-3063) Bootstrap TimelineServer Next Gen Module
Zhijie Shen created YARN-3063: - Summary: Bootstrap TimelineServer Next Gen Module Key: YARN-3063 URL: https://issues.apache.org/jira/browse/YARN-3063 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Based on the discussion on the umbrella Jira, we need to create a new sub-module for TS next gen. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1364) Limit the number of outstanding tfile writers in FileSystemApplicationHistoryStore
[ https://issues.apache.org/jira/browse/YARN-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1364. --- Resolution: Won't Fix We no longer maintain the FS-based generic history store. Limit the number of outstanding tfile writers in FileSystemApplicationHistoryStore -- Key: YARN-1364 URL: https://issues.apache.org/jira/browse/YARN-1364 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen It seems to be expensive to maintain a large number of outstanding t-file writers; the RM is likely to run out of I/O resources. We'd probably like to limit the number of concurrent outstanding t-file writers and queue the write requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
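The idea in YARN-1364 (cap the number of concurrent t-file writers and queue the remaining write requests) can be sketched with a fair `java.util.concurrent.Semaphore`. `BoundedWriterPool` is a hypothetical name, the `Runnable` stands in for opening, appending to, and closing a real t-file writer, and the issue itself was closed as Won't Fix, so this only illustrates the technique.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical guard allowing at most maxOpenWriters writers at once;
// further callers block (i.e. are queued) until a permit is released.
final class BoundedWriterPool {
  private final Semaphore permits;
  private final AtomicInteger open = new AtomicInteger();
  private final AtomicInteger peak = new AtomicInteger();

  BoundedWriterPool(int maxOpenWriters) {
    // fair = true: waiting write requests are granted permits in FIFO order
    this.permits = new Semaphore(maxOpenWriters, true);
  }

  void withWriter(Runnable writeTask) throws InterruptedException {
    permits.acquire();
    try {
      int now = open.incrementAndGet();
      peak.accumulateAndGet(now, Math::max); // record the highest concurrency observed
      writeTask.run(); // stands in for open writer, append, close writer
    } finally {
      open.decrementAndGet();
      permits.release();
    }
  }

  int peakOpenWriters() {
    return peak.get();
  }
}
```

A fair semaphore keeps the queueing order predictable; an unfair one would give higher throughput but could let a late request overtake earlier ones.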
[jira] [Resolved] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2262. --- Resolution: Won't Fix We no longer maintain the FS-based generic history store. Few fields displaying wrong values in Timeline server after RM restart -- Key: YARN-2262 URL: https://issues.apache.org/jira/browse/YARN-2262 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.0 Reporter: Nishan Shetty Assignee: Naganarasimha G R Attachments: Capture.PNG, Capture1.PNG, yarn-testos-historyserver-HOST-10-18-40-95.log, yarn-testos-resourcemanager-HOST-10-18-40-84.log, yarn-testos-resourcemanager-HOST-10-18-40-95.log A few fields display wrong values in the Timeline server after an RM restart:
State: null
FinalStatus: UNDEFINED
Started: 8-Jul-2014 14:58:08
Elapsed: 2562047397789hrs, 44mins, 47sec
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2330) Jobs are not displaying in timeline server after RM restart
[ https://issues.apache.org/jira/browse/YARN-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2330. --- Resolution: Won't Fix We no longer maintain the FS-based generic history store. Jobs are not displaying in timeline server after RM restart --- Key: YARN-2330 URL: https://issues.apache.org/jira/browse/YARN-2330 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 2.4.1 Environment: Nodemanagers 3 (3*8GB) Queues A = 70% Queues B = 30% Reporter: Nishan Shetty Assignee: Naganarasimha G R
1. Submit jobs to queue a.
2. While a job is running, restart the RM.
3. Observe that those jobs are not displayed in the timeline server.
{code} 2014-07-22 10:11:32,084 ERROR org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore: History information of application application_1406002968974_0003 is not included into the result due to the exception java.io.IOException: Cannot seek to negative offset at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1381) at org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:63) at org.apache.hadoop.io.file.tfile.BCFile$Reader.init(BCFile.java:624) at org.apache.hadoop.io.file.tfile.TFile$Reader.init(TFile.java:804) at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileReader.init(FileSystemApplicationHistoryStore.java:683) at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getHistoryFileReader(FileSystemApplicationHistoryStore.java:661) at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getApplication(FileSystemApplicationHistoryStore.java:146) at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getAllApplications(FileSystemApplicationHistoryStore.java:199) at
org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getAllApplications(ApplicationHistoryManagerImpl.java:103) at org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:75) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66) at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76) at org.apache.hadoop.yarn.webapp.View.render(View.java:235) at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49) at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117) at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845) at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56) at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82) at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:197) at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:156) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263) at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178) at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834) at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795) at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163) at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58) at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118) at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113) at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1192) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at
[jira] [Resolved] (YARN-1835) History client service needs to be more robust
[ https://issues.apache.org/jira/browse/YARN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1835. --- Resolution: Invalid The ApplicationHistoryManager has a new implementation, which doesn't have the aforementioned issues. History client service needs to be more robust -- Key: YARN-1835 URL: https://issues.apache.org/jira/browse/YARN-1835 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen While testing, I've found the following issues so far:
1. The history-file-not-found exception is exposed to the user directly; it would be better to catch it and translate it into ApplicationNotFound.
2. An NPE is exposed as well, since ApplicationHistoryManager doesn't do the necessary null checks.
In addition, TestApplicationHistoryManagerImpl doesn't test most of the ApplicationHistoryManager methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
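The two fixes listed in YARN-1835 (translate the missing-file error into a "not found" error, and null-check instead of leaking an NPE) can be sketched as follows. Everything here is hypothetical: `AppNotFoundSketchException` is a local stand-in for YARN's ApplicationNotFound exception so the block compiles on its own, and `HistoryStoreSketch` / `HistoryManagerSketch` are illustrative, not the real ApplicationHistoryManager.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Local stand-in for YARN's "application not found" exception.
class AppNotFoundSketchException extends Exception {
  AppNotFoundSketchException(String msg, Throwable cause) {
    super(msg, cause);
  }
}

// Hypothetical history store: throws FileNotFoundException when no history file
// exists and may return null for an incomplete record.
interface HistoryStoreSketch {
  String readAppReport(String appId) throws IOException;
}

final class HistoryManagerSketch {
  private final HistoryStoreSketch store;

  HistoryManagerSketch(HistoryStoreSketch store) {
    this.store = store;
  }

  String getApplication(String appId) throws AppNotFoundSketchException, IOException {
    String report;
    try {
      report = store.readAppReport(appId);
    } catch (FileNotFoundException e) {
      // Translate the storage-level detail into an API-level "not found" error.
      throw new AppNotFoundSketchException("Application " + appId + " not found", e);
    }
    if (report == null) {
      // Explicit null check instead of letting an NPE reach the caller.
      throw new AppNotFoundSketchException("No history data for " + appId, null);
    }
    return report;
  }
}
```

Other IOExceptions are still propagated unchanged, so genuine storage failures are not disguised as "not found".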
[jira] [Resolved] (YARN-2412) Augment HistoryStorage Reader Interface to Support Filters When Getting Applications
[ https://issues.apache.org/jira/browse/YARN-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2412. --- Resolution: Invalid The generic history storage layer has been rebuilt, and the reader interface is not useful in the new stack. Augment HistoryStorage Reader Interface to Support Filters When Getting Applications Key: YARN-2412 URL: https://issues.apache.org/jira/browse/YARN-2412 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Shinichi Yamashita https://issues.apache.org/jira/browse/YARN-925?focusedCommentId=13800402&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13800402 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1302) Add AHSDelegationTokenSecretManager for ApplicationHistoryProtocol
[ https://issues.apache.org/jira/browse/YARN-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1302. --- Resolution: Duplicate Add AHSDelegationTokenSecretManager for ApplicationHistoryProtocol -- Key: YARN-1302 URL: https://issues.apache.org/jira/browse/YARN-1302 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Like the ApplicationClientProtocol, ApplicationHistoryProtocol needs its own security stack. We need to implement AHSDelegationTokenSecretManager, AHSDelegationTokenIdentifier, AHSDelegationTokenSelector and other analogs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1344) Separate ApplicationAttemptStartDataProto and ApplicationAttemptRegisteredDataProto
[ https://issues.apache.org/jira/browse/YARN-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1344. --- Resolution: Invalid The generic history storage has been rebuilt. It's no longer a valid issue. Separate ApplicationAttemptStartDataProto and ApplicationAttemptRegisteredDataProto --- Key: YARN-1344 URL: https://issues.apache.org/jira/browse/YARN-1344 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen Some info in ApplicationAttemptStartData can be separated out and put into ApplicationAttemptRegisteredData, to further reduce the probability of losing info. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-1346) Revisit the output type of the reader interface
[ https://issues.apache.org/jira/browse/YARN-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-1346. --- Resolution: Invalid The generic history storage layer has been rebuilt. It's no longer a valid problem. Revisit the output type of the reader interface --- Key: YARN-1346 URL: https://issues.apache.org/jira/browse/YARN-1346 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen In YARN-947, there's a discussion about changing the reader interface to return the report protobuf (e.g., ApplicationReport) directly instead of AHS internal objects (e.g., ApplicationHistoryData). We need to think more about it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2177) Timeline server web interfaces high-availability and scalability
[ https://issues.apache.org/jira/browse/YARN-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2177. --- Resolution: Duplicate Close it as the duplicate of YARN-2928. This topic will be covered by TS next gen. Timeline server web interfaces high-availability and scalability --- Key: YARN-2177 URL: https://issues.apache.org/jira/browse/YARN-2177 Project: Hadoop YARN Issue Type: Sub-task Reporter: Zhijie Shen Assignee: Zhijie Shen While we are going to leverage HBase to provide a highly available and scalable storage solution, we also need to take care of the high availability and scalability of the web interfaces, which are likely to handle a large volume of user requests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2520) Scalable and High Available Timeline Server
[ https://issues.apache.org/jira/browse/YARN-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2520. --- Resolution: Duplicate Close it as the duplicate of YARN-2928, as TS next gen will cover this topic. Scalable and High Available Timeline Server --- Key: YARN-2520 URL: https://issues.apache.org/jira/browse/YARN-2520 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 3.0.0, 2.5.0 Reporter: Zhijie Shen Assignee: Zhijie Shen Attachments: Federal Timeline Servers.jpg YARN-2032 will provide a scalable and reliable timeline store based on HBase. However, a single timeline server instance is not scalable enough to handle a large volume of user requests and becomes the single bottleneck. As the timeline server is stateless, it's not difficult to start multiple timeline server instances that write into the same HBase timeline store. We can use ZooKeeper to register all the timeline servers, as HA RMs do, and clients can randomly pick one server to publish timeline entities to, for load balancing. Moreover, since multiple timeline servers run together, they back each other up, solving the high availability problem as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
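The client side of the YARN-2520 proposal (pick a random registered server for load balancing, and fall back to the others because they back each other up) can be sketched as below. This is a hypothetical illustration: ZooKeeper registration is not modeled, the registered servers are simply passed in as a `List`, and `TimelineServerPicker` and its `Publisher` callback are made-up names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class TimelineServerPicker {
  // Hypothetical callback: tries to publish to one server and reports success.
  interface Publisher {
    boolean publish(String serverAddress);
  }

  private final List<String> servers;
  private final Random random;

  TimelineServerPicker(List<String> registeredServers, Random random) {
    this.servers = new ArrayList<>(registeredServers);
    this.random = random;
  }

  // Tries the servers in a random order: the random first choice spreads load,
  // and moving on after a failure lets the remaining servers act as backups.
  String publishToAny(Publisher publisher) {
    List<String> order = new ArrayList<>(servers);
    Collections.shuffle(order, random);
    for (String server : order) {
      if (publisher.publish(server)) {
        return server;
      }
    }
    throw new IllegalStateException("No timeline server accepted the entities");
  }
}
```

Shuffling a copy per call, rather than round-robin, needs no shared state between clients, which fits the "client can randomly pick one server" idea.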
[jira] [Resolved] (YARN-2521) Reliable TimelineClient
[ https://issues.apache.org/jira/browse/YARN-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2521. --- Resolution: Duplicate Close the ticket as the duplicate of YARN-2928. Reliable TimelineClient --- Key: YARN-2521 URL: https://issues.apache.org/jira/browse/YARN-2521 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Affects Versions: 3.0.0, 2.5.0 Reporter: Zhijie Shen Assignee: Zhijie Shen The timeline server may suffer outages. It would be beneficial if the timeline client could cache a timeline entity locally after the application passes it to the client and until the client successfully hands it over to the server. To prevent the entity from being lost, we may want to persist it to secondary storage, such as HDFS or LevelDB. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
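The caching idea in YARN-2521 can be sketched as a client-side queue from which an entity is removed only after the server accepts it, so a server outage delays entities instead of losing them. This is a hypothetical, in-memory-only sketch: `BufferingTimelineClient` is a made-up name, the `Predicate` stands in for the actual HTTP put, and persisting the queue to HDFS or LevelDB (as the issue suggests) is not shown.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical client-side buffer: an entity leaves the queue only after the server acknowledges it.
final class BufferingTimelineClient<E> {
  private final Deque<E> pending = new ArrayDeque<>();
  private final Predicate<E> sender; // returns true if the server accepted the entity

  BufferingTimelineClient(Predicate<E> sender) {
    this.sender = sender;
  }

  // Called by the application; never talks to the server directly.
  synchronized void putEntity(E entity) {
    pending.addLast(entity);
  }

  // Called periodically, e.g. by a background thread. Sends in FIFO order and
  // stops at the first failure so the original order is kept for the next attempt.
  synchronized int flush() {
    int sent = 0;
    while (!pending.isEmpty()) {
      E head = pending.peekFirst();
      if (!sender.test(head)) {
        break; // server unavailable: keep the entity for the next flush
      }
      pending.removeFirst();
      sent++;
    }
    return sent;
  }

  synchronized int pendingCount() {
    return pending.size();
  }
}
```

Stopping at the first failure trades some throughput for ordering; a design that skips failed entities would need per-entity retry bookkeeping.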
[jira] [Resolved] (YARN-3013) Findbugs warning about AbstractYarnScheduler.rmContext
[ https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-3013. --- Resolution: Duplicate Close it as the duplicate. Thanks for pointing it out. Findbugs warning about AbstractYarnScheduler.rmContext - Key: YARN-3013 URL: https://issues.apache.org/jira/browse/YARN-3013 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Rohith Attachments: 0001-YARN-3013.patch {code} Bug type IS2_INCONSISTENT_SYNC In class org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler Field org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext Synchronized 91% of the time {code} See https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC for more details -- This message was sent by Atlassian JIRA (v6.3.4#6332)
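FindBugs reports IS2_INCONSISTENT_SYNC when a field is accessed under the object's lock most of the time (here 91%) but not always. The generic sketch below is not the YARN-3013 patch: it shows one common fix, routing every access through synchronized accessors; declaring the field `volatile` is an alternative when only safe publication of the reference matters. `SchedulerSketch` and `SchedulerContextSketch` are hypothetical names.

```java
// Hypothetical stand-in for the scheduler's context object.
final class SchedulerContextSketch {
  final String name;

  SchedulerContextSketch(String name) {
    this.name = name;
  }
}

class SchedulerSketch {
  // Guarded by "this": every read and write below holds the same lock,
  // which is the consistency IS2_INCONSISTENT_SYNC checks for.
  private SchedulerContextSketch rmContext;

  synchronized void setRMContext(SchedulerContextSketch context) {
    this.rmContext = context;
  }

  synchronized SchedulerContextSketch getRMContext() {
    return rmContext;
  }

  // Call sites that used to read the field without the lock go through the accessor.
  String describe() {
    SchedulerContextSketch c = getRMContext();
    return c == null ? "uninitialized" : c.name;
  }
}
```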
[jira] [Created] (YARN-2991) TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on trunk
Zhijie Shen created YARN-2991: - Summary: TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on trunk Key: YARN-2991 URL: https://issues.apache.org/jira/browse/YARN-2991 Project: Hadoop YARN Issue Type: Test Reporter: Zhijie Shen {code} Error Message test timed out after 6 milliseconds Stacktrace java.lang.Exception: test timed out after 6 milliseconds at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1281) at java.lang.Thread.join(Thread.java:1355) at org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:150) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) at org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157) at org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131) at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1106) at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) at org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testDecomissionedNMsMetricsOnRMRestart(TestRMRestart.java:1873) {code} It happened twice this month: https://builds.apache.org/job/PreCommit-YARN-Build/6096/ https://builds.apache.org/job/PreCommit-YARN-Build/6182/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately
Zhijie Shen created YARN-2958: - Summary: RMStateStore seems to unnecessarily and wrongly store sequence number separately Key: YARN-2958 URL: https://issues.apache.org/jira/browse/YARN-2958 Project: Hadoop YARN Issue Type: Bug Components: resourcemanager Reporter: Zhijie Shen It seems that RMStateStore updates the last sequence number when storing or updating each individual DT, in order to recover the latest sequence number when the RM restarts. First, the current logic seems to be problematic:
{code}
public synchronized void updateRMDelegationTokenAndSequenceNumber(
    RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
    int latestSequenceNumber) {
  if (isFencedState()) {
    LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
    return;
  }
  try {
    updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, renewDate,
        latestSequenceNumber);
  } catch (Exception e) {
    notifyStoreOperationFailed(e);
  }
}
{code}
{code}
@Override
protected void updateStoredToken(RMDelegationTokenIdentifier id,
    long renewDate) {
  try {
    LOG.info("updating RMDelegation token with sequence number: "
        + id.getSequenceNumber());
    rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
        renewDate, id.getSequenceNumber());
  } catch (Exception e) {
    LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
        + id.getSequenceNumber());
    ExitUtil.terminate(1, e);
  }
}
{code}
According to the code above, even when a DT is renewed, the last sequence number is updated in the store, which is wrong. For example, consider the following sequence:
1. Get DT 1 (seq = 1)
2. Get DT 2 (seq = 2)
3. Renew DT 1 (seq = 1)
4. Restart RM
The last sequence number that is stored and then recovered is 1. As a result, the next DT created after the RM restarts will conflict with DT 2 on the sequence number. Second, the aforementioned bug doesn't actually happen, because the recovered last sequence number is then overwritten by the correct one.
{code}
public void recover(RMState rmState) throws Exception {
  LOG.info("recovering RMDelegationTokenSecretManager.");
  // recover RMDTMasterKeys
  for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
      .getMasterKeyState()) {
    addKey(dtKey);
  }
  // recover RMDelegationTokens
  Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
      rmState.getRMDTSecretManagerState().getTokenState();
  this.delegationTokenSequenceNumber =
      rmState.getRMDTSecretManagerState().getDTSequenceNumber();
  for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
      .entrySet()) {
    addPersistedDelegationToken(entry.getKey(), entry.getValue());
  }
}
{code}
The code above recovers delegationTokenSequenceNumber by reading the last sequence number from the store, which could be wrong. Fortunately, delegationTokenSequenceNumber is then corrected as each persisted token is added:
{code}
if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
  setDelegationTokenSeqNum(identifier.getSequenceNumber());
}
{code}
All the stored identifiers are iterated over, and delegationTokenSequenceNumber is set to the largest sequence number among them. Therefore, a new DT will always be assigned a sequence number larger than that of every recovered DT. To sum up, two negatives make a positive, but it's still good to fix the issue. Please let me know if I've missed something here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
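The YARN-2958 scenario can be replayed with a tiny model. This is a hypothetical sketch, not RM code: `DtSequenceSketch` stands in for the state store plus secret manager. It overwrites the stored last sequence number on every store or renew (as `updateStoredToken` does in the issue), and during recovery it applies the "raise to the largest recovered sequence number" check quoted in the issue.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the YARN-2958 scenario.
final class DtSequenceSketch {
  // State "persisted" by the store.
  final Map<Integer, Long> storedTokens = new LinkedHashMap<>(); // seq -> renew date
  int storedLastSeq = 0;

  // Storing and renewing both overwrite the stored last sequence number with the
  // token's own sequence number, mirroring updateRMDelegationTokenAndSequenceNumber.
  void storeOrRenew(int seq, long renewDate) {
    storedTokens.put(seq, renewDate);
    storedLastSeq = seq;
  }

  // Recovery: start from the stored last sequence number, then raise it to the
  // largest sequence number among the recovered tokens.
  int recover() {
    int seqNum = storedLastSeq;
    for (int tokenSeq : storedTokens.keySet()) {
      if (tokenSeq > seqNum) {
        seqNum = tokenSeq;
      }
    }
    return seqNum;
  }
}
```

Replaying steps 1 to 4 leaves the stored value at 1 (the bug), while recovery returns 2 (the correction), so a newly created DT is numbered after DT 2; this is the "two negatives make a positive" effect the issue describes.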
[jira] [Created] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6
Zhijie Shen created YARN-2879: - Summary: Compatibility validation between YARN 2.2/2.4 and 2.6 Key: YARN-2879 URL: https://issues.apache.org/jira/browse/YARN-2879 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen Recently, I did some simple backward compatibility experiments. Basically, I've taken the following 2 steps:
1. Deploy the application (MR and DistributedShell) that is compiled against the *old* YARN API (2.2/2.4) on a *new* YARN cluster (2.6). The application is submitted via the *new* Hadoop (2.6) client.
2. Deploy the application (MR and DistributedShell) that is compiled against the *old* YARN API (2.2/2.4) on a *new* YARN cluster (2.6). The application is submitted via the *old* Hadoop (2.2/2.4) client that comes with the app.
I've tried these 2 steps on both insecure and secure clusters. Here's a short summary:
|| || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
| Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK |
| Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK |
| Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK |
| Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK |
Note that I've tried running the NM with both the old and new shuffle handler versions. Overall, compatibility looks good. There are a few issues related to MR, but they don't seem to be YARN issues. I'll post the individual problems in follow-up comments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2838) Issues with TimeLineServer (Application History)
[ https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2838. --- Resolution: Not a Problem Closing the ticket; we'll work on separate Jiras. Issues with TimeLineServer (Application History) Key: YARN-2838 URL: https://issues.apache.org/jira/browse/YARN-2838 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Affects Versions: 2.6.0, 2.5.1 Reporter: Naganarasimha G R Assignee: Naganarasimha G R Attachments: IssuesInTimelineServer.pdf A few issues in the usage of the Timeline server for generic application history access -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2867) TimelineClient DT methods should check if the timeline service is enabled or not
Zhijie Shen created YARN-2867: - Summary: TimelineClient DT methods should check if the timeline service is enabled or not Key: YARN-2867 URL: https://issues.apache.org/jira/browse/YARN-2867 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Zhijie Shen DT-related methods don't check whether isEnabled == true. However, the internal state is only initialized when isEnabled == true, so an NPE happens if users call these methods when the timeline service is not configured as enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (YARN-2867) TimelineClient DT methods should check if the timeline service is enabled or not
[ https://issues.apache.org/jira/browse/YARN-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhijie Shen resolved YARN-2867. --- Resolution: Invalid Per discussion on [YARN-2375|https://issues.apache.org/jira/browse/YARN-2375?focusedCommentId=14213002&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14213002], closing this Jira as invalid. TimelineClient DT methods should check if the timeline service is enabled or not Key: YARN-2867 URL: https://issues.apache.org/jira/browse/YARN-2867 Project: Hadoop YARN Issue Type: Bug Affects Versions: 2.6.0 Reporter: Zhijie Shen DT-related methods don't check whether isEnabled == true. However, the internal state is only initialized when isEnabled == true, so an NPE happens if users call these methods when the timeline service is not configured as enabled. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.
Zhijie Shen created YARN-2861: - Summary: Timeline DT secret manager should not reuse the RM's configs. Key: YARN-2861 URL: https://issues.apache.org/jira/browse/YARN-2861 Project: Hadoop YARN Issue Type: Bug Reporter: Zhijie Shen Assignee: Zhijie Shen The timeline DT secret manager currently reads the configs of the RM DT secret manager, as the code below shows. We should create separate configs for the timeline DT only.
{code}
@Override
protected void serviceInit(Configuration conf) throws Exception {
  // All three settings are read from the RM's DT secret manager keys.
  long secretKeyInterval =
      conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY,
          YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT);
  long tokenMaxLifetime =
      conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY,
          YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT);
  long tokenRenewInterval =
      conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY,
          YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT);
  secretManager = new TimelineDelegationTokenSecretManager(secretKeyInterval,
      tokenMaxLifetime, tokenRenewInterval, 360);
  secretManager.startThreads();
  serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig());
  super.init(conf);
}
{code}
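One way to decouple the two is to give the timeline DT its own keys and defaults. The sketch below is illustrative only: the `yarn.timeline-service.delegation.*` key names and the 1-day/7-day/1-day defaults are assumptions, not taken from this report, and a plain `Map` stands in for Hadoop's `Configuration` so the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

/**
 * Sketch: read timeline-DT settings from timeline-specific keys instead of
 * the RM's delegation-token keys. Key names and defaults are assumptions.
 */
public final class TimelineDtConfig {
    // Hypothetical timeline-only keys.
    static final String KEY_UPDATE_INTERVAL =
        "yarn.timeline-service.delegation.key.update-interval";
    static final String TOKEN_MAX_LIFETIME =
        "yarn.timeline-service.delegation.token.max-lifetime";
    static final String TOKEN_RENEW_INTERVAL =
        "yarn.timeline-service.delegation.token.renew-interval";

    // Assumed defaults: 1 day, 7 days, 1 day (in milliseconds).
    static final long KEY_UPDATE_INTERVAL_DEFAULT = TimeUnit.DAYS.toMillis(1);
    static final long TOKEN_MAX_LIFETIME_DEFAULT = TimeUnit.DAYS.toMillis(7);
    static final long TOKEN_RENEW_INTERVAL_DEFAULT = TimeUnit.DAYS.toMillis(1);

    final long secretKeyInterval;
    final long tokenMaxLifetime;
    final long tokenRenewInterval;

    TimelineDtConfig(Map<String, String> conf) {
        secretKeyInterval = getLong(conf, KEY_UPDATE_INTERVAL, KEY_UPDATE_INTERVAL_DEFAULT);
        tokenMaxLifetime = getLong(conf, TOKEN_MAX_LIFETIME, TOKEN_MAX_LIFETIME_DEFAULT);
        tokenRenewInterval = getLong(conf, TOKEN_RENEW_INTERVAL, TOKEN_RENEW_INTERVAL_DEFAULT);
    }

    // Minimal stand-in for Configuration.getLong(key, defaultValue).
    private static long getLong(Map<String, String> conf, String key, long dflt) {
        String v = conf.get(key);
        return v == null ? dflt : Long.parseLong(v.trim());
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // An RM-side setting: it no longer influences the timeline DT.
        conf.put("yarn.resourcemanager.delegation.token.max-lifetime", "1000");
        conf.put(TOKEN_RENEW_INTERVAL, "3600000");
        TimelineDtConfig c = new TimelineDtConfig(conf);
        System.out.println(c.tokenMaxLifetime + " " + c.tokenRenewInterval);
    }
}
```

With separate keys, tuning the RM's token lifetime leaves the timeline DT on its own defaults; only the explicitly set timeline renew interval changes.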
[jira] [Created] (YARN-2854) The document about timeline service and generic service needs to be updated
Zhijie Shen created YARN-2854: - Summary: The document about timeline service and generic service needs to be updated Key: YARN-2854 URL: https://issues.apache.org/jira/browse/YARN-2854 Project: Hadoop YARN Issue Type: Bug Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Critical
[jira] [Created] (YARN-2837) Timeline server needs to recover the timeline DT when restarting
Zhijie Shen created YARN-2837: - Summary: Timeline server needs to recover the timeline DT when restarting Key: YARN-2837 URL: https://issues.apache.org/jira/browse/YARN-2837 Project: Hadoop YARN Issue Type: New Feature Components: timelineserver Reporter: Zhijie Shen Assignee: Zhijie Shen Priority: Blocker The timeline server needs to recover its state when restarting, as the RM, NM and JHS already do. So far, this state only includes the timeline DTs. Without recovery, the timeline DTs of existing YARN apps are no longer valid and cannot be renewed after the timeline server restarts.
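The recovery idea can be illustrated with a write-through store that a restarted manager reloads from. This is a minimal sketch under stated assumptions: `DtStateStore`, `MemoryDtStateStore` and `DtManager` are hypothetical names, the in-memory map only simulates durable storage by outliving the manager, and expiry checks are omitted. A real implementation would likely also need to persist the secret (master) keys used to sign tokens, which this sketch leaves out.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of DT recovery across timeline-server restarts. All names are
 * hypothetical; the in-memory store only simulates durable storage.
 */
public class TimelineDtRecoverySketch {

    /** Hypothetical persistent store for token state (seq -> renew date). */
    interface DtStateStore {
        void storeToken(int seq, long renewDate);
        Map<Integer, Long> loadTokens();
    }

    /** Survives a simulated "restart" because it outlives the manager. */
    static final class MemoryDtStateStore implements DtStateStore {
        private final Map<Integer, Long> tokens = new HashMap<>();
        public void storeToken(int seq, long renewDate) { tokens.put(seq, renewDate); }
        public Map<Integer, Long> loadTokens() { return new HashMap<>(tokens); }
    }

    /** Minimal token manager that writes through to the store and recovers from it. */
    static final class DtManager {
        private final DtStateStore store;
        private final Map<Integer, Long> active = new HashMap<>();
        private int nextSeq = 1;

        DtManager(DtStateStore store) {
            this.store = store;
            // Recovery: reload every token issued before the restart.
            for (Map.Entry<Integer, Long> e : store.loadTokens().entrySet()) {
                active.put(e.getKey(), e.getValue());
                nextSeq = Math.max(nextSeq, e.getKey() + 1);
            }
        }

        int issue(long now, long renewInterval) {
            int seq = nextSeq++;
            active.put(seq, now + renewInterval);
            store.storeToken(seq, now + renewInterval);
            return seq;
        }

        /** Returns false for tokens this manager does not know about. */
        boolean renew(int seq, long now, long renewInterval) {
            if (!active.containsKey(seq)) {
                return false;
            }
            active.put(seq, now + renewInterval);
            store.storeToken(seq, now + renewInterval);
            return true;
        }
    }

    public static void main(String[] args) {
        DtStateStore store = new MemoryDtStateStore();
        int seq = new DtManager(store).issue(0L, 1000L);
        // Simulated restart: a fresh manager over the same store still renews the token.
        System.out.println(new DtManager(store).renew(seq, 500L, 1000L));
        // Without recovery (empty store), the same token is unknown.
        System.out.println(new DtManager(new MemoryDtStateStore()).renew(seq, 500L, 1000L));
    }
}
```

The second call shows the failure mode described in the report: a manager with no recovered state cannot renew a token that existing apps still hold.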