[jira] [Created] (YARN-4013) Publisher V2 should write the unmanaged AM flag too

2015-08-03 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-4013:
-

 Summary: Publisher V2 should write the unmanaged AM flag too
 Key: YARN-4013
 URL: https://issues.apache.org/jira/browse/YARN-4013
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen


Upon rebasing the branch, I found we need to redo similar work for the V2 
publisher:

https://issues.apache.org/jira/browse/YARN-3543



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (YARN-3992) TestApplicationPriority.testApplicationPriorityAllocation fails intermittently

2015-07-29 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3992:
-

 Summary: TestApplicationPriority.testApplicationPriorityAllocation 
fails intermittently
 Key: YARN-3992
 URL: https://issues.apache.org/jira/browse/YARN-3992
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen


{code}
java.lang.AssertionError: expected:<7> but was:<5>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at 
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestApplicationPriority.testApplicationPriorityAllocation(TestApplicationPriority.java:182)
{code}





[jira] [Created] (YARN-3993) Change to use the AM flag in ContainerContext to determine the AM container

2015-07-29 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3993:
-

 Summary: Change to use the AM flag in ContainerContext to determine 
the AM container
 Key: YARN-3993
 URL: https://issues.apache.org/jira/browse/YARN-3993
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen


After YARN-3116, we will have a flag in ContainerContext to determine whether a 
container is the AM in an aux service. We need to change accordingly to make 
use of this flag instead of depending on the container ID.
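The difference between the two approaches can be sketched in a small self-contained example. The ContainerType enum and the endsWith check below are illustrative stand-ins, not the actual API added by YARN-3116:

```java
// Self-contained contrast of the two approaches. ContainerType here is an
// illustrative stand-in for the flag YARN-3116 adds to ContainerContext.
public class AmContainerCheck {
    enum ContainerType { APPLICATION_MASTER, TASK }

    // Old workaround: treat the app's first container as the AM. This is
    // neither a necessary nor a sufficient condition (for example, an
    // unmanaged AM has no such container).
    static boolean isAmByIdHeuristic(String containerId) {
        return containerId.endsWith("_000001");
    }

    // New approach: trust the explicit flag carried in the container context.
    static boolean isAm(ContainerType type) {
        return type == ContainerType.APPLICATION_MASTER;
    }
}
```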





[jira] [Created] (YARN-3984) Rethink event column key issue

2015-07-27 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3984:
-

 Summary: Rethink event column key issue
 Key: YARN-3984
 URL: https://issues.apache.org/jira/browse/YARN-3984
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
 Fix For: YARN-2928


Currently, the event column key is event_id?info_key?timestamp, which is not so 
friendly to fetching all the events of an entity and sorting them in 
chronological order. IMHO, timestamp?event_id?info_key may be a better key 
schema. I'm opening this JIRA to continue the discussion that was started in 
the comments on YARN-3908.
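The ordering argument can be sketched concretely with zero-padded string keys standing in for the actual byte-encoded HBase columns. The separator and padding below are illustrative, not the real schema:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of why a timestamp-first column key helps: HBase returns cells in
// lexicographic order, so if the zero-padded timestamp leads the key, a plain
// scan yields events chronologically. Separator and padding are illustrative.
public class EventKeyOrder {
    static String timestampFirst(long ts, String eventId, String infoKey) {
        // Zero-pad the timestamp so lexicographic order matches numeric order.
        return String.format("%020d!%s!%s", ts, eventId, infoKey);
    }

    static String eventIdFirst(long ts, String eventId, String infoKey) {
        return String.format("%s!%s!%020d", eventId, infoKey, ts);
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add(timestampFirst(200L, "CONTAINER_FINISHED", "exitStatus"));
        keys.add(timestampFirst(100L, "CONTAINER_STARTED", "node"));
        Collections.sort(keys);  // stand-in for HBase's cell ordering
        // The earlier event sorts first without any client-side re-sort.
        System.out.println(keys.get(0).contains("CONTAINER_STARTED"));  // prints "true"
    }
}
```

With event_id first, events of the same entity interleave by id and the reader must re-sort them; with the timestamp first, a single scan already returns them in chronological order.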





[jira] [Resolved] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-27 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3908.
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: YARN-2928

Committed the patch to branch YARN-2928. Thanks for the patch, Vrushali and 
Sangjin, as well as other folks for contributing your thoughts.

 Bugs in HBaseTimelineWriterImpl
 ---

 Key: YARN-3908
 URL: https://issues.apache.org/jira/browse/YARN-3908
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C
 Fix For: YARN-2928

 Attachments: YARN-3908-YARN-2928.001.patch, 
 YARN-3908-YARN-2928.002.patch, YARN-3908-YARN-2928.003.patch, 
 YARN-3908-YARN-2928.004.patch, YARN-3908-YARN-2928.004.patch, 
 YARN-3908-YARN-2928.005.patch


 1. In HBaseTimelineWriterImpl, the info column family contains the basic 
 fields of a timeline entity plus events. However, the entity#info map is not 
 stored at all.
 2. event#timestamp is also not persisted.





[jira] [Created] (YARN-3914) Entity created time should be part of the row key of entity table

2015-07-10 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3914:
-

 Summary: Entity created time should be part of the row key of 
entity table
 Key: YARN-3914
 URL: https://issues.apache.org/jira/browse/YARN-3914
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Entity created time should be part of the row key of the entity table, between 
entity type and entity ID. The reason to have it is to index the entities. 
Though we cannot index the entities on every kind of information, indexing them 
by created time is very necessary. Without it, every query for the latest 
entities of a given application and type will scan through all the entities 
that belong to them, for example, when we want to list the 100 most recently 
started containers in a YARN app.
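A minimal sketch of such a row key, assuming the common trick of storing the inverted created time so that a forward scan returns the newest entities first. The separator and padding are illustrative, not the actual schema:

```java
// Sketch of the proposed row key layout: type ! invertedCreatedTime ! id.
// Inverting the timestamp (Long.MAX_VALUE - ts) makes newer entities sort
// first under lexicographic byte order, so "latest 100 containers" becomes a
// bounded scan instead of a full scan. Separator and padding are illustrative.
public class EntityRowKey {
    static String rowKey(String entityType, long createdTime, String entityId) {
        long inverted = Long.MAX_VALUE - createdTime;  // newest -> smallest key
        return String.format("%s!%020d!%s", entityType, inverted, entityId);
    }

    public static void main(String[] args) {
        // A container created later sorts before an earlier one.
        String newer = rowKey("YARN_CONTAINER", 2000L, "container_2");
        String older = rowKey("YARN_CONTAINER", 1000L, "container_1");
        System.out.println(newer.compareTo(older) < 0);  // prints "true"
    }
}
```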





[jira] [Created] (YARN-3908) Bugs in HBaseTimelineWriterImpl

2015-07-09 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3908:
-

 Summary: Bugs in HBaseTimelineWriterImpl
 Key: YARN-3908
 URL: https://issues.apache.org/jira/browse/YARN-3908
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Vrushali C


1. In HBaseTimelineWriterImpl, the info column family contains the basic fields 
of a timeline entity plus events. However, the entity#info map is not stored at all.

2. event#timestamp is also not persisted.





[jira] [Created] (YARN-3880) Writing more RM side app-level metrics

2015-07-02 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3880:
-

 Summary: Writing more RM side app-level metrics
 Key: YARN-3880
 URL: https://issues.apache.org/jira/browse/YARN-3880
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


In YARN-3044, we implemented an analog of the metrics publisher for ATS v1. 
While it helps to write app/attempt/container lifecycle events, it doesn't 
write many of the app-level system metrics that the RM now has. Here are the 
metrics I found missing:

* runningContainers
* memorySeconds
* vcoreSeconds
* preemptedResourceMB
* preemptedResourceVCores
* numNonAMContainerPreempted
* numAMContainerPreempted

Please feel free to add more to the list if you find something not covered.





[jira] [Created] (YARN-3881) Writing RM cluster-level metrics

2015-07-02 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3881:
-

 Summary: Writing RM cluster-level metrics
 Key: YARN-3881
 URL: https://issues.apache.org/jira/browse/YARN-3881
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


RM has a bunch of metrics that we may want to write into the timeline backend. 
I attached the metrics.json that I crawled via 
{{http://localhost:8088/jmx?qry=Hadoop:*}}. IMHO, we need to pay attention to 
three groups of metrics:

1. QueueMetrics
2. JvmMetrics
3. ClusterMetrics

The problem is that unlike other metrics, which belong to a single application, 
these belong to the RM or are cluster-wide. Therefore, the current write path is 
not going to work for them because they don't have the associated user/flow/app 
context info. We need to rethink how to model cross-app metrics and the API to 
handle them.





[jira] [Resolved] (YARN-3116) [Collector wireup] We need an assured way to determine if a container is an AM container on NM

2015-06-18 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3116.
---
Resolution: Duplicate

Closing this JIRA as a duplicate of YARN-3828, where the contributor has 
already started working on the issue.

 [Collector wireup] We need an assured way to determine if a container is an 
 AM container on NM
 --

 Key: YARN-3116
 URL: https://issues.apache.org/jira/browse/YARN-3116
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 In YARN-3030, to start the per-app aggregator only for a started AM 
 container, we need to determine from the context in the NM whether the 
 container is an AM container (we can do it on the RM). This information is 
 missing, so we worked around it by considering the container with ID _01 as 
 the AM container. Unfortunately, this is neither a necessary nor a sufficient 
 condition. We need a way to determine whether a container is an AM container 
 on the NM. We can add a flag to the container object or create an API to make 
 the judgement. Perhaps the distributed AM information may also be useful 
 to YARN-2877.





[jira] [Created] (YARN-3822) Scalability validation of RM writing app/attempt/container lifecycle events

2015-06-17 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3822:
-

 Summary: Scalability validation of RM writing 
app/attempt/container lifecycle events
 Key: YARN-3822
 URL: https://issues.apache.org/jira/browse/YARN-3822
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, timelineserver
Reporter: Zhijie Shen
Assignee: Naganarasimha G R


We need to test how scalable the RM metrics publisher is.





[jira] [Created] (YARN-3761) Set delegation token service address at the server side

2015-06-02 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3761:
-

 Summary: Set delegation token service address at the server side
 Key: YARN-3761
 URL: https://issues.apache.org/jira/browse/YARN-3761
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: security
Reporter: Zhijie Shen


Nowadays, YARN components generate the delegation token without the service 
address set, and leave it to the client to set. With our Java client library, 
this is usually fine. However, for users of the REST API it's going to be a 
problem: the delegation token is returned as a URL string, and it's unfriendly 
to ask a thin client to deserialize that string, set the token service address, 
and serialize it again for further use. If we move the task of setting the 
service address to the server side, the client can get rid of this trouble.
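The proposed fix amounts to the server stamping its own RPC address into the token before serializing it into the REST response; in Hadoop itself this would go through org.apache.hadoop.security.token.Token#setService. The class below is a plain-Java sketch with no Hadoop dependency, and all names are illustrative:

```java
import java.net.InetSocketAddress;

// Plain-Java sketch (no Hadoop dependency) of server-side service-address
// stamping. In Hadoop this would call Token#setService before the token is
// encoded into the REST response; names here are illustrative.
public class TokenServiceStamp {
    // Hadoop encodes a token's service as "host:port"; we mimic that format.
    static String serviceString(InetSocketAddress rmAddress) {
        return rmAddress.getHostString() + ":" + rmAddress.getPort();
    }

    public static void main(String[] args) {
        // createUnresolved avoids a DNS lookup for this illustrative host.
        InetSocketAddress rm =
            InetSocketAddress.createUnresolved("rm.example.com", 8032);
        // The client now receives a token whose service field is already set.
        System.out.println(serviceString(rm));
    }
}
```

Once the service field is filled in server-side, a thin REST client can use the returned token string as-is instead of decoding and re-encoding it.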





[jira] [Created] (YARN-3751) TestAHSWebServices fails after YARN-3467

2015-05-31 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3751:
-

 Summary: TestAHSWebServices fails after YARN-3467
 Key: YARN-3751
 URL: https://issues.apache.org/jira/browse/YARN-3751
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen


YARN-3467 changed AppInfo and assumed that the used resource is not null. That's 
not true, as this information is not published to the timeline server.





[jira] [Created] (YARN-3746) NotFoundException(404) will cause java.lang.IllegalStateException: STREAM when accepting XML as the content

2015-05-29 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3746:
-

 Summary: NotFoundException(404) will cause 
java.lang.IllegalStateException: STREAM when accepting XML as the content
 Key: YARN-3746
 URL: https://issues.apache.org/jira/browse/YARN-3746
 Project: Hadoop YARN
  Issue Type: Bug
  Components: webapp
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Both the RM and ATS REST APIs are affected. The weird thing is that it only 
happens with 404, not with other error codes, and only with XML, not with JSON.
{code}
zshens-mbp:Deployment zshen$ curl -H "Accept: application/xml" -H 
"Content-Type: application/xml" 
http://localhost:8188/ws/v1/applicationhistory/apps/application_1432863609211_0001
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 STREAM</title>
</head>
<body><h2>HTTP ERROR 500</h2>
<p>Problem accessing 
/ws/v1/applicationhistory/apps/application_1432863609211_0001. Reason:
<pre>STREAM</pre></p><h3>Caused 
by:</h3><pre>java.lang.IllegalStateException: STREAM
at org.mortbay.jetty.Response.getWriter(Response.java:616)
at org.apache.hadoop.yarn.webapp.View.writer(View.java:141)
at org.apache.hadoop.yarn.webapp.view.TextView.writer(TextView.java:39)
at 
org.apache.hadoop.yarn.webapp.view.TextView.echoWithoutEscapeHtml(TextView.java:60)
at 
org.apache.hadoop.yarn.webapp.view.TextView.putWithoutEscapeHtml(TextView.java:80)
at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:81)
at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:197)
at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:145)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at 
com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
at 
com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
at 
com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
at 
com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
at 
com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
at 
com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
at 
com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:602)
at 
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:277)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:554)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1211)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
{code}

[jira] [Created] (YARN-3723) Need to clearly document primaryFilter and otherInfo value type

2015-05-27 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3723:
-

 Summary: Need to clearly document primaryFilter and otherInfo 
value type
 Key: YARN-3723
 URL: https://issues.apache.org/jira/browse/YARN-3723
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical








[jira] [Created] (YARN-3725) App submission via REST API is broken in secure mode because the Timeline DT service address is empty

2015-05-27 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3725:
-

 Summary: App submission via REST API is broken in secure mode 
because the Timeline DT service address is empty
 Key: YARN-3725
 URL: https://issues.apache.org/jira/browse/YARN-3725
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, timelineserver
Affects Versions: 2.7.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker


YARN-2971 changed TimelineClient to use the service address from the Timeline 
DT to renew the DT instead of the configured address. This breaks the procedure 
of submitting a YARN app via the REST API in secure mode.

The problem is that the service address is set by the client instead of the 
server in Java code. The REST API response is an encoded token String, so it's 
inconvenient to deserialize it, set the service address, and serialize it 
again.





[jira] [Created] (YARN-3701) Isolating the error of generating a single app report when getting all apps from generic history service

2015-05-21 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3701:
-

 Summary: Isolating the error of generating a single app report when 
getting all apps from generic history service
 Key: YARN-3701
 URL: https://issues.apache.org/jira/browse/YARN-3701
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker


Nowadays, if an error occurs while generating a single app report when getting 
the application list from the generic history service, the exception is thrown. 
Therefore, even if just 1 out of 100 apps has something wrong, the whole app 
list is screwed. The worst impact is that the default page (the app list) of 
the GHS web UI crashes, while the REST API /applicationhistory/apps also breaks.
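The isolation the summary calls for can be sketched as a per-app try/catch, so one broken entity no longer fails the whole listing. The types and names below are illustrative, not the actual GHS code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the fix: build each app report inside its own
// try/catch so a single bad entity is logged and skipped instead of
// breaking the entire app list. Types and names are illustrative.
public class ReportIsolation {
    static <T, R> List<R> convertEach(List<T> apps, Function<T, R> toReport) {
        List<R> reports = new ArrayList<>();
        for (T app : apps) {
            try {
                reports.add(toReport.apply(app));
            } catch (RuntimeException e) {
                // Log and skip the broken entity; the other 99 apps still
                // show up in the web UI and the REST response.
            }
        }
        return reports;
    }
}
```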





[jira] [Created] (YARN-3622) Enable application client to communicate with new timeline service

2015-05-11 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3622:
-

 Summary: Enable application client to communicate with new 
timeline service
 Key: YARN-3622
 URL: https://issues.apache.org/jira/browse/YARN-3622
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


A YARN application has a client and an AM. We have a story to make 
TimelineClient work inside the AM for v2, but not for the client. 
TimelineClient inside the app client needs to be taken care of too.





[jira] [Created] (YARN-3623) Having the config to indicate the timeline service version

2015-05-11 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3623:
-

 Summary: Having the config to indicate the timeline service version
 Key: YARN-3623
 URL: https://issues.apache.org/jira/browse/YARN-3623
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


So far the RM, MR AM, and DA AM have added/changed configs to enable writing 
timeline data to the v2 server. It would be good to have a YARN 
timeline-service.version config, like timeline-service.enable, to indicate the 
version of the timeline service running with the given YARN cluster. It helps 
users move more smoothly from v1 to v2, as they don't need to change the 
existing configs, only switch this one from v1 to v2. And each framework 
doesn't need its own v1/v2 config.
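A yarn-site.xml fragment can sketch the proposed shape. The property name and values below follow this proposal and may differ from what is eventually committed:

```xml
<!-- Hypothetical yarn-site.xml fragment sketching the proposed config.
     The version property and its values follow this JIRA's proposal and
     are not guaranteed to match the final implementation. -->
<property>
  <name>yarn.timeline-service.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.timeline-service.version</name>
  <value>2.0</value>
</property>
```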





[jira] [Created] (YARN-3588) Timeline entity uniqueness

2015-05-06 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3588:
-

 Summary: Timeline entity uniqueness
 Key: YARN-3588
 URL: https://issues.apache.org/jira/browse/YARN-3588
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


In YARN-3051, we have had some discussion about how to uniquely identify an 
entity. Sangjin and some other folks propose to uniquely identify an entity by 
(type, id) only within the scope of a single app. This is different from entity 
uniqueness in ATS v1, where (type, id) globally identifies an entity. This is 
going to affect the way a single entity is fetched, and raises a compatibility 
issue. Let's continue our discussion here to unblock YARN-3051.





[jira] [Resolved] (YARN-2289) ApplicationHistoryStore should be versioned

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2289.
---
Resolution: Won't Fix

We won't do improvements for GHS.

 ApplicationHistoryStore should be versioned
 ---

 Key: YARN-2289
 URL: https://issues.apache.org/jira/browse/YARN-2289
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: applications
Reporter: Junping Du
Assignee: Junping Du







[jira] [Resolved] (YARN-1688) Rethinking about POJO Classes

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1688.
---
Resolution: Won't Fix

 Rethinking about POJO Classes
 -

 Key: YARN-1688
 URL: https://issues.apache.org/jira/browse/YARN-1688
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 We need to think about how the POJO classes evolve. Should we back them up 
 with proto and other formats?





[jira] [Resolved] (YARN-1688) Rethinking about POJO Classes

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1688.
---
Resolution: Fixed

YARN-3539 will declare the timeline v1 APIs stable. We won't change the v1 POJO classes.

 Rethinking about POJO Classes
 -

 Key: YARN-1688
 URL: https://issues.apache.org/jira/browse/YARN-1688
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 We need to think about how the POJO classes evolve. Should we back them up 
 with proto and other formats?





[jira] [Resolved] (YARN-1638) Add an integration test validating post, storage and retrieval of entities+events

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1638.
---
Resolution: Fixed

We already have integration tests in some form, such as in TestDistributedShell.

 Add an integration test validating post, storage and retrieval of 
 entities+events
 ---

 Key: YARN-1638
 URL: https://issues.apache.org/jira/browse/YARN-1638
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli







[jira] [Resolved] (YARN-1530) [Umbrella] Store, manage and serve per-framework application-timeline data

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1530.
---
Resolution: Fixed

Timeline service v1 is almost done. Its functionality has been committed across 
multiple releases, mostly completed before 2.6. There are still a few 
outstanding issues, which are kept open for further discussion.

 [Umbrella] Store, manage and serve per-framework application-timeline data
 --

 Key: YARN-1530
 URL: https://issues.apache.org/jira/browse/YARN-1530
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
 Attachments: ATS-Write-Pipeline-Design-Proposal.pdf, 
 ATS-meet-up-8-28-2014-notes.pdf, application timeline design-20140108.pdf, 
 application timeline design-20140116.pdf, application timeline 
 design-20140130.pdf, application timeline design-20140210.pdf


 This is a sibling JIRA for YARN-321.
 Today, each application/framework has to store and serve per-framework 
 data all by itself, as YARN doesn't have a common solution. This JIRA attempts 
 to solve the storage, management, and serving of per-framework data from 
 various applications, both running and finished. The aim is to change YARN to 
 collect and store data in a generic manner, with plugin points for frameworks 
 to do their own thing w.r.t. interpretation and serving.





[jira] [Resolved] (YARN-2307) Capacity scheduler: a user with only ADMINISTER_QUEUE can also submit apps

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2307.
---
Resolution: Invalid

You probably missed setting {{yarn.acl.enable=true}} in yarn-site.xml. Closing 
it for now. Feel free to reopen if that's not your case.

 Capacity scheduler: a user with only ADMINISTER_QUEUE can also submit apps 
 --

 Key: YARN-2307
 URL: https://issues.apache.org/jira/browse/YARN-2307
 Project: Hadoop YARN
  Issue Type: Bug
  Components: capacityscheduler
Affects Versions: 2.3.0
 Environment: hadoop 2.3.0  centos6.5  jdk1.7
Reporter: tangjunjie
Priority: Minor

 Queue acls for user :  root
 Queue  Operations
 =
 root  
 default  
 china  ADMINISTER_QUEUE
 unfunded 
 User root only has ADMINISTER_QUEUE, but user root can still submit apps to 
 the china queue.





[jira] [Resolved] (YARN-2060) Add an admin module for the timeline server

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2060.
---
Resolution: Won't Fix

We won't add new features to ATS v1.

 Add an admin module for the timeline server
 ---

 Key: YARN-2060
 URL: https://issues.apache.org/jira/browse/YARN-2060
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Like the job history server, it's good to have an admin module for the 
 timeline server to allow the admin to manage the server on the fly.





[jira] [Resolved] (YARN-2626) Document of timeline server needs to be updated

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2626.
---
Resolution: Duplicate

YARN-3539 is updating it. Closing this one.

 Document of timeline server needs to be updated
 ---

 Key: YARN-2626
 URL: https://issues.apache.org/jira/browse/YARN-2626
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.6.0
Reporter: Zhijie Shen

 After YARN-2033, the document is no longer accurate.





[jira] [Resolved] (YARN-2286) RM HA and failover test cases failed occasionally

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2286.
---
Resolution: Cannot Reproduce

Closing this, as I cannot reproduce them locally now.

 RM HA and failover test cases failed occasionally
 

 Key: YARN-2286
 URL: https://issues.apache.org/jira/browse/YARN-2286
 Project: Hadoop YARN
  Issue Type: Test
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
  Labels: test, test-fail

 * TestApplicationClientProtocolOnHA.testCancelDelegationTokenOnHA
 ** See https://builds.apache.org/job/PreCommit-YARN-Build/4271//testReport/
 * TestRMFailover.testAutomaticFailover
 ** See https://builds.apache.org/job/PreCommit-YARN-Build/4277//testReport/





[jira] [Resolved] (YARN-2294) Update sample program and documentation for writing YARN applications

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2294.
---
   Resolution: Fixed
Fix Version/s: 2.6.0

 Update sample program and documentation for writing YARN applications
 -

 Key: YARN-2294
 URL: https://issues.apache.org/jira/browse/YARN-2294
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Li Lu
 Fix For: 2.6.0


 Many APIs for writing YARN applications have been stabilized. However, some 
 of them have also been changed since the last time the sample YARN programs, 
 like distributed shell, and the documentation were updated. There are ongoing 
 discussions on the user mailing list about updating the outdated Writing 
 YARN Applications documentation. Updating the sample programs like 
 distributed shell is also needed, since they are probably the very first 
 demonstration of YARN applications for newcomers. 





[jira] [Resolved] (YARN-2043) Rename internal names to being Timeline Service instead of application history

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2043.
---
Resolution: Won't Fix

We won't refactor ATS v1 anymore.

 Rename internal names to being Timeline Service instead of application history
 --

 Key: YARN-2043
 URL: https://issues.apache.org/jira/browse/YARN-2043
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Naganarasimha G R

 Like package and class names. In line with YARN-2033, YARN-1982 etc.





[jira] [Resolved] (YARN-2309) NPE during RM-Restart test scenario

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2309.
---
Resolution: Duplicate

 NPE during RM-Restart test scenario
 ---

 Key: YARN-2309
 URL: https://issues.apache.org/jira/browse/YARN-2309
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Priority: Minor

 During RM restart test scenarios, we met with the below exception. 
 A point to note: ZooKeeper was also not stable during this testing, and we 
 could see many ZooKeeper exceptions before getting this NPE.
 {code}
 2014-07-10 10:49:46,817 WARN org.apache.hadoop.service.AbstractService: When 
 stopping the service 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService : 
 java.lang.NullPointerException
 java.lang.NullPointerException
   at 
 org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.serviceStop(EmbeddedElectorService.java:108)
   at 
 org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
   at 
 org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
   at 
 org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceInit(AdminService.java:125)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:232)
   at 
 org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1039)
 {code}
 Zookeeper Exception
 {code}
 2014-07-10 10:49:46,816 INFO org.apache.hadoop.service.AbstractService: 
 Service org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService 
 failed in state INITED; cause: 
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss
 org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
 = ConnectionLoss
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
   at 
 org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.waitForZKConnectionEvent(ActiveStandbyElector.java:1046)
   at 
 org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef.access$400(ActiveStandbyElector.java:1017)
   at 
 org.apache.hadoop.ha.ActiveStandbyElector.getNewZooKeeper(ActiveStandbyElector.java:632)
   at 
 org.apache.hadoop.ha.ActiveStandbyElector.createConnection(ActiveStandbyElector.java:766)
 {code}





[jira] [Resolved] (YARN-321) Generic application history service

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-321.
--
Resolution: Fixed

Closing this umbrella jira with a few sub-tasks open. The generic history 
service has been implemented and rides on the timeline server. YARN-2271 is left 
open to track a possible performance issue when fetching all the applications 
stored in the timeline store.

 Generic application history service
 ---

 Key: YARN-321
 URL: https://issues.apache.org/jira/browse/YARN-321
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Luke Lu
 Attachments: AHS Diagram.pdf, ApplicationHistoryServiceHighLevel.pdf, 
 Generic Application History - Design-20131219.pdf, HistoryStorageDemo.java


 The MapReduce job history server currently needs to be deployed as a trusted 
 server in sync with the MapReduce runtime. Every new application would need a 
 similar application history server. Having to deploy O(T*V) trusted servers 
 (where T is the number of application types and V is the number of application 
 versions) is clearly not scalable.
 Job history storage handling itself is pretty generic: move the logs and 
 history data into a particular directory for later serving. Job history data 
 is already stored as JSON (or binary Avro). I propose that we create only one 
 trusted application history server, which can have a generic UI (displaying 
 JSON as a tree of strings) as well. Specific applications/versions can deploy 
 untrusted webapps (a la AMs) to query the application history server and 
 interpret the JSON for their specific UI and/or analytics.





[jira] [Resolved] (YARN-2021) Allow AM to set failed final status

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2021.
---
Resolution: Invalid

 Allow AM to set failed final status
 ---

 Key: YARN-2021
 URL: https://issues.apache.org/jira/browse/YARN-2021
 Project: Hadoop YARN
  Issue Type: Improvement
Reporter: Jakob Homan

 Background: SAMZA-117. It would be good if an AM were able to signal via its 
 final status that the job itself has failed, even if the AM itself has finished 
 up in a tidy fashion. It would be good if either (a) the AM could signal a 
 final status of failed and exit cleanly, or (b) we had another status, say 
 Application Failed, to indicate that the AM itself gave up.





[jira] [Resolved] (YARN-2239) Rename ClusterMetrics#getUnhealthyNMs() to getNumUnhealthyNMs()

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2239.
---
Resolution: Invalid

The change in ClusterMetricsInfo is incompatible. The name has been used since 
2.4; let's keep it. Feel free to reopen this if you have different thoughts.

 Rename ClusterMetrics#getUnhealthyNMs() to getNumUnhealthyNMs()
 ---

 Key: YARN-2239
 URL: https://issues.apache.org/jira/browse/YARN-2239
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: resourcemanager
Affects Versions: 2.4.0
Reporter: Kenji Kikushima
Assignee: Kenji Kikushima
Priority: Trivial
 Attachments: YARN-2239.patch


 In ClusterMetrics, the other get*NMs() methods have a Num prefix (e.g. 
 getNumLostNMs()/getNumRebootedNMs()).
 For naming consistency, we should rename getUnhealthyNMs() to 
 getNumUnhealthyNMs().





[jira] [Resolved] (YARN-2225) Turn the virtual memory check to be off by default

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2225.
---
Resolution: Invalid

Close the jira according to the comments so far. Feel free to reopen it if 
someone has other thoughts.

 Turn the virtual memory check to be off by default
 --

 Key: YARN-2225
 URL: https://issues.apache.org/jira/browse/YARN-2225
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Anubhav Dhoot
Assignee: Anubhav Dhoot
 Attachments: YARN-2225.patch


 The virtual memory check may not be the best way to isolate applications. 
 Virtual memory is not the constrained resource. It would be better if we 
 limited the swapping of the task using swappiness instead. This patch turns 
 DEFAULT_NM_VMEM_CHECK_ENABLED off by default and lets users turn it on if they 
 need to.





[jira] [Resolved] (YARN-2218) TestSubmitApplicationWithRMHA fails intermittently in trunk

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2218.
---
Resolution: Cannot Reproduce

Ran the test case locally, and it didn't fail. Close it now. Feel free to 
reopen it if it happens again.

 TestSubmitApplicationWithRMHA fails intermittently in trunk
 ---

 Key: YARN-2218
 URL: https://issues.apache.org/jira/browse/YARN-2218
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Ashwin Shankar

 org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
 testGetApplicationReportIdempotent(org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA)
   Time elapsed: 2.536 sec   FAILURE!
  java.lang.AssertionError: expected:<ACCEPTED> but was:<SUBMITTED>
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.failNotEquals(Assert.java:743)
   at org.junit.Assert.assertEquals(Assert.java:118)
   at org.junit.Assert.assertEquals(Assert.java:144)
   at 
 org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA.testGetApplicationReportIdempotent(TestSubmitApplicationWithRMHA.java:211)





[jira] [Resolved] (YARN-2101) Document the system filters of the timeline entity

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2101.
---
Resolution: Invalid

After changing to domain access control, this system filter is no longer 
necessary.

 Document the system filters of the timeline entity
 --

 Key: YARN-2101
 URL: https://issues.apache.org/jira/browse/YARN-2101
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 In YARN-1937, to support ACLs, we reserved a filter name for the timeline 
 server's own use, which should not be used by users. We need to document this 
 system filter explicitly to tell users not to use it.





[jira] [Resolved] (YARN-1935) Security for timeline server

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1935.
---
Resolution: Fixed

Closing the umbrella jira. The only remaining issue is to put generic history 
data in a non-default domain in the secure scenario. Since we are not developing 
new features for ATS v1, we can leave that jira (YARN-2622) open and see whether 
a supporting requirement for it emerges.

 Security for timeline server
 

 Key: YARN-1935
 URL: https://issues.apache.org/jira/browse/YARN-1935
 Project: Hadoop YARN
  Issue Type: New Feature
Reporter: Arun C Murthy
Assignee: Zhijie Shen
 Attachments: Timeline Security Diagram.pdf, 
 Timeline_Kerberos_DT_ACLs.2.patch, Timeline_Kerberos_DT_ACLs.patch


 Jira to track work to secure the ATS





[jira] [Resolved] (YARN-1794) Yarn CLI only shows running containers for Running Applications

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1794.
---
Resolution: Fixed

We won't implement new features for the generic history service now.

 Yarn CLI only shows running containers for Running Applications
 ---

 Key: YARN-1794
 URL: https://issues.apache.org/jira/browse/YARN-1794
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal







[jira] [Resolved] (YARN-1794) Yarn CLI only shows running containers for Running Applications

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1794.
---
Resolution: Won't Fix

 Yarn CLI only shows running containers for Running Applications
 ---

 Key: YARN-1794
 URL: https://issues.apache.org/jira/browse/YARN-1794
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Mayank Bansal
Assignee: Mayank Bansal







[jira] [Resolved] (YARN-1744) Renaming applicationhistoryservice module

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1744.
---
Resolution: Won't Fix

ATS v2 starts from a fresh new sub-module. We won't fix the current naming.

 Renaming applicationhistoryservice module
 -

 Key: YARN-1744
 URL: https://issues.apache.org/jira/browse/YARN-1744
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 When we started with the feature, the module only contained the source code of 
 the generic application history service; therefore, it was named 
 applicationhistoryservice. However, as time went on, we moved on to 
 per-framework historical data (see YARN-1530). The code base of this module has 
 already gone beyond the generic application history service and includes the 
 timeline service as well. It would be good to come up with a more accurate name 
 for the project asap, to prevent people from being confused by the module name 
 about what service it provides. We probably need to refactor the AHS-related 
 classes as well, for clarity.





[jira] [Resolved] (YARN-2522) AHSClient may be not necessary

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2522.
---
Resolution: Won't Fix

Won't do the refactoring work.

 AHSClient may be not necessary
 --

 Key: YARN-2522
 URL: https://issues.apache.org/jira/browse/YARN-2522
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Per discussion in 
 [YARN-2033|https://issues.apache.org/jira/browse/YARN-2033?focusedCommentId=14126073page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14126073],
  it may not be necessary to have a separate AHSClient. The methods can be 
 incorporated into TimelineClient. APPLICATION_HISTORY_ENABLED would then be 
 useless as well.





[jira] [Resolved] (YARN-1834) YarnClient will not be redirected to the history server when RM is down

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1834.
---
Resolution: Won't Fix

We won't improve generic history service now

 YarnClient will not be redirected to the history server when RM is down
 ---

 Key: YARN-1834
 URL: https://issues.apache.org/jira/browse/YARN-1834
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen

 When the RM is not available, the client will keep retrying against the RM, so 
 it won't reach the history server to get the app/attempt/container info. 
 Therefore, during an RM restart, such a request is blocked, even though it has 
 the opportunity to move on when the generic history service is enabled.





[jira] [Resolved] (YARN-1524) Make aggregated logs of completed containers available via REST API

2015-05-01 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1524.
---
Resolution: Won't Fix

We won't add new features for GHS now.

 Make aggregated logs of completed containers available via REST API
 ---

 Key: YARN-1524
 URL: https://issues.apache.org/jira/browse/YARN-1524
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Robert Kanter







[jira] [Created] (YARN-3563) Completed app shows -1 running containers on RM web UI

2015-04-29 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3563:
-

 Summary: Completed app shows -1 running containers on RM web UI
 Key: YARN-3563
 URL: https://issues.apache.org/jira/browse/YARN-3563
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen


See the attached screenshot. I saw this issue with trunk. Not sure if it exists 
in branch-2.7 too.





[jira] [Resolved] (YARN-3563) Completed app shows -1 running containers on RM web UI

2015-04-29 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3563.
---
Resolution: Duplicate

Didn't notice YARN-3563. Close this one as a duplicate

 Completed app shows -1 running containers on RM web UI
 --

 Key: YARN-3563
 URL: https://issues.apache.org/jira/browse/YARN-3563
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager, webapp
Reporter: Zhijie Shen
 Attachments: Screen Shot 2015-04-29 at 2.11.19 PM.png


 See the attached screenshot. I saw this issue with trunk. Not sure if it 
 exists in branch-2.7 too.





[jira] [Created] (YARN-3551) Consolidate data model change according the backend implementation

2015-04-27 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3551:
-

 Summary: Consolidate data model change according the backend 
implementation
 Key: YARN-3551
 URL: https://issues.apache.org/jira/browse/YARN-3551
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Based on the comments on 
[YARN-3134|https://issues.apache.org/jira/browse/YARN-3134?focusedCommentId=14512080page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512080]
 and 
[YARN-3411|https://issues.apache.org/jira/browse/YARN-3411?focusedCommentId=14512098page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512098],
 we need to change the data model to restrict the data type of 
info/config/metric section.

1. Info: the value could be any kind of object that can be 
serialized/deserialized by Jackson.

2. Config: the value will always be assumed to be a String.

3. Metric: single data points or time-series values have to be numbers, for the 
sake of aggregation.

Other than that, the info/start time/finish time of a metric do not seem 
necessary for storage. They should be removed.
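To make the proposal concrete, here is a minimal sketch with a hypothetical simplified entity class (`RestrictedEntity` and its field names are illustrative, not the actual TimelineEntity API):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simplified entity showing the proposed type restrictions:
// info values are arbitrary JSON-serializable objects, config values are
// always Strings, and metric values are Numbers keyed by timestamp.
public class RestrictedEntity {
    Map<String, Object> info = new HashMap<>();        // any serializable value
    Map<String, String> configs = new HashMap<>();     // always String
    Map<Long, Number> metricValues = new TreeMap<>();  // numbers only, aggregatable

    public static void main(String[] args) {
        RestrictedEntity e = new RestrictedEntity();
        e.info.put("diagnostics", java.util.Arrays.asList("a", "b"));
        e.configs.put("yarn.scheduler.class", "CapacityScheduler");
        e.metricValues.put(1000L, 42);    // single data point
        e.metricValues.put(2000L, 17.5); // time-series point
        System.out.println(e.metricValues.size() + " metric points");
    }
}
```

Using a TreeMap keeps the metric points time-ordered, which is what an aggregator would iterate over; restricting the values to Number is what makes that aggregation well-defined.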





[jira] [Resolved] (YARN-2032) Implement a scalable, available TimelineStore using HBase

2015-04-23 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2032.
---
Resolution: Won't Fix

It will be covered in YARN-2928

 Implement a scalable, available TimelineStore using HBase
 -

 Key: YARN-2032
 URL: https://issues.apache.org/jira/browse/YARN-2032
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Vinod Kumar Vavilapalli
Assignee: Li Lu
 Attachments: YARN-2032-091114.patch, YARN-2032-branch-2-1.patch, 
 YARN-2032-branch2-2.patch


 As discussed on YARN-1530, we should pursue implementing a scalable, 
 available Timeline store using HBase.
 One goal is to reuse most of the code from the levelDB Based store - 
 YARN-1635.





[jira] [Created] (YARN-3541) Add version info on timeline service / generic history web UI and REST API

2015-04-23 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3541:
-

 Summary: Add version info on timeline service / generic history 
web UI and REST API
 Key: YARN-3541
 URL: https://issues.apache.org/jira/browse/YARN-3541
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen








[jira] [Created] (YARN-3522) DistributedShell uses the wrong user to put timeline data

2015-04-21 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3522:
-

 Summary: DistributedShell uses the wrong user to put timeline data
 Key: YARN-3522
 URL: https://issues.apache.org/jira/browse/YARN-3522
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker


YARN-3287 breaks the timeline access control of distributed shell. In 
distributed shell AM:

{code}
if (conf.getBoolean(YarnConfiguration.TIMELINE_SERVICE_ENABLED,
  YarnConfiguration.DEFAULT_TIMELINE_SERVICE_ENABLED)) {
  // Creating the Timeline Client
  timelineClient = TimelineClient.createTimelineClient();
  timelineClient.init(conf);
  timelineClient.start();
} else {
  timelineClient = null;
  LOG.warn("Timeline service is not enabled");
}
{code}

{code}
  ugi.doAs(new PrivilegedExceptionAction<TimelinePutResponse>() {
@Override
public TimelinePutResponse run() throws Exception {
  return timelineClient.putEntities(entity);
}
  });
{code}

YARN-3287 changed the timeline client to get the right ugi at serviceInit, but 
the DS AM still doesn't use the submitter ugi to init the timeline client; 
instead, it uses that ugi for each put-entity call. This results in the wrong 
user on the put request.
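Stripped of the Hadoop types, the bug boils down to the client capturing the caller's identity at init time, so a per-call doAs cannot fix it afterwards. A self-contained illustration with hypothetical classes (`Client`, the user names, and `put()` are all invented for this sketch):

```java
public class InitTimeIdentity {
    static class Client {
        final String user;  // identity captured once, at init time

        Client(String currentUser) {
            this.user = currentUser;
        }

        // Every later call uses the init-time identity, regardless of
        // which "context" the caller wraps the call in afterwards.
        String put() {
            return "put as " + user;
        }
    }

    public static void main(String[] args) {
        // Buggy order: init as the AM's login user; later per-call
        // doAs-style wrapping cannot change the captured identity.
        Client buggy = new Client("amLoginUser");
        System.out.println(buggy.put());   // put as amLoginUser

        // Fixed order: init inside the submitter's context, so the
        // captured identity is the submitter's for every later call.
        Client fixed = new Client("submitterUser");
        System.out.println(fixed.put());   // put as submitterUser
    }
}
```

In the real fix, the analogous move is to create and init the TimelineClient inside the submitter ugi's doAs, rather than wrapping each putEntities call.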





[jira] [Created] (YARN-3509) CollectorNodemanagerProtocol's authorization doesn't work

2015-04-20 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3509:
-

 Summary: CollectorNodemanagerProtocol's authorization doesn't work
 Key: YARN-3509
 URL: https://issues.apache.org/jira/browse/YARN-3509
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager, security, timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen








[jira] [Created] (YARN-3471) Fix timeline client retry

2015-04-09 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3471:
-

 Summary: Fix timeline client retry
 Key: YARN-3471
 URL: https://issues.apache.org/jira/browse/YARN-3471
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


I found that the client retry has some problems:

1. The new put methods will retry on all exceptions, but they should only do so 
upon ConnectException.
2. We can reuse TimelineClientConnectionRetry to simplify the retry logic.
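The intended behavior can be sketched with a hypothetical helper (not the actual TimelineClientConnectionRetry API): retry only when the failure is a ConnectException, and rethrow anything else immediately.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

public class RetryOnConnect {
    // Retry the operation only on ConnectException; any other exception
    // propagates immediately (the bug was retrying on all exceptions).
    public static <T> T run(Callable<T> op, int maxRetries) throws Exception {
        ConnectException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return op.call();
            } catch (ConnectException e) {
                last = e;  // connection-level failure: worth retrying
            }
        }
        throw last;  // retries exhausted
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Fails twice with ConnectException, then succeeds.
        String result = run(() -> {
            if (++calls[0] < 3) {
                throw new ConnectException("refused");
            }
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

A non-connection exception (say, an authorization failure) would escape on the first attempt instead of being retried pointlessly, which is the point of the fix.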





[jira] [Created] (YARN-3461) Consolidate flow name/version/run defaults

2015-04-07 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3461:
-

 Summary: Consolidate flow name/version/run defaults
 Key: YARN-3461
 URL: https://issues.apache.org/jira/browse/YARN-3461
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen


In YARN-3391, the defaults for flow name/version/run were not resolved. Let's 
continue the discussion here and unblock YARN-3391 from moving forward.





[jira] [Resolved] (YARN-3430) RMAppAttempt headroom data is missing in RM Web UI

2015-04-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3430.
---
Resolution: Fixed

After pulling YARN-3273 into branch-2.7, committed this patch again to branch-2.7.

 RMAppAttempt headroom data is missing in RM Web UI
 --

 Key: YARN-3430
 URL: https://issues.apache.org/jira/browse/YARN-3430
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: resourcemanager, webapp, yarn
Reporter: Xuan Gong
Assignee: Xuan Gong
Priority: Blocker
 Fix For: 2.7.0

 Attachments: YARN-3430.1.patch








[jira] [Resolved] (YARN-3334) [Event Producers] NM TimelineClient container metrics posting to new timeline service.

2015-04-06 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3334.
---
   Resolution: Fixed
Fix Version/s: YARN-2928
 Hadoop Flags: Reviewed

Committed the patch to branch YARN-2928. Thanks for the patch, Junping! Thanks 
for review, Sangjin and Li!

 [Event Producers] NM TimelineClient container metrics posting to new timeline 
 service.
 --

 Key: YARN-3334
 URL: https://issues.apache.org/jira/browse/YARN-3334
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Affects Versions: YARN-2928
Reporter: Junping Du
Assignee: Junping Du
 Fix For: YARN-2928

 Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
 YARN-3334-v2.patch, YARN-3334-v3.patch, YARN-3334-v4.patch, 
 YARN-3334-v5.patch, YARN-3334-v6.patch, YARN-3334-v8.patch, YARN-3334.7.patch


 After YARN-3039, we have a service discovery mechanism to pass the 
 app-collector service address among collectors, NMs, and the RM. In this JIRA, 
 we will handle service address setting for TimelineClients in the NodeManager, 
 and put container metrics into the backend storage.





[jira] [Created] (YARN-3431) Sub resources of timeline entity needs to be passed to a separate endpoint.

2015-04-01 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3431:
-

 Summary: Sub resources of timeline entity needs to be passed to a 
separate endpoint.
 Key: YARN-3431
 URL: https://issues.apache.org/jira/browse/YARN-3431
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen








[jira] [Created] (YARN-3399) Default cluster ID for RM HA

2015-03-25 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3399:
-

 Summary: Default cluster ID for RM HA
 Key: YARN-3399
 URL: https://issues.apache.org/jira/browse/YARN-3399
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen


In YARN-3040, the timeline service will set a default cluster ID if users don't 
provide one. RM HA's current behavior is a bit different when users don't 
provide a cluster ID: an IllegalArgumentException is thrown instead. Let's 
continue the discussion here on whether RM HA needs a default cluster ID, and 
what a proper default would be.





[jira] [Created] (YARN-3393) Getting application(s) goes wrong when app finishes before starting the attempt

2015-03-23 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3393:
-

 Summary: Getting application(s) goes wrong when app finishes 
before starting the attempt
 Key: YARN-3393
 URL: https://issues.apache.org/jira/browse/YARN-3393
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical


When generating app report in ApplicationHistoryManagerOnTimelineStore, it 
checks if appAttempt == null.
{code}
ApplicationAttemptReport appAttempt = 
getApplicationAttempt(app.appReport.getCurrentApplicationAttemptId());
if (appAttempt != null) {
  app.appReport.setHost(appAttempt.getHost());
  app.appReport.setRpcPort(appAttempt.getRpcPort());
  app.appReport.setTrackingUrl(appAttempt.getTrackingUrl());
  app.appReport.setOriginalTrackingUrl(appAttempt.getOriginalTrackingUrl());
}
{code}

However, {{getApplicationAttempt}} doesn't return null but throws 
ApplicationAttemptNotFoundException:
{code}
if (entity == null) {
  throw new ApplicationAttemptNotFoundException(
      "The entity for application attempt " + appAttemptId +
      " doesn't exist in the timeline store");
} else {
  return convertToApplicationAttemptReport(entity);
}
{code}
The two pieces of code aren't coupled well.
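The mismatch can be illustrated with hypothetical simplified types (all names below are invented for this sketch): the caller null-checks, but the callee throws instead of returning null, so one possible fix is to catch the exception at the call site.

```java
import java.util.HashMap;
import java.util.Map;

public class ContractMismatch {
    static class NotFoundException extends RuntimeException {
        NotFoundException(String m) { super(m); }
    }

    static final Map<String, String> store = new HashMap<>();

    // Callee as written: never returns null, always throws when missing.
    static String getAttempt(String id) {
        String entity = store.get(id);
        if (entity == null) {
            throw new NotFoundException("attempt " + id + " not found");
        }
        return entity;
    }

    // One possible fix: catch the exception at the call site so the
    // "missing attempt" case degrades gracefully instead of failing.
    static String getAttemptOrNull(String id) {
        try {
            return getAttempt(id);
        } catch (NotFoundException e) {
            return null;  // app finished before any attempt was stored
        }
    }

    public static void main(String[] args) {
        store.put("attempt_1", "host:1234");
        System.out.println(getAttemptOrNull("attempt_1"));  // host:1234
        System.out.println(getAttemptOrNull("attempt_2"));  // null
    }
}
```

The alternative fix is to change the callee to return null; either way, the two sides must agree on one convention for the missing-attempt case.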





[jira] [Created] (YARN-3390) RMTimelineCollector should have the context info of each app

2015-03-23 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3390:
-

 Summary: RMTimelineCollector should have the context info of each 
app
 Key: YARN-3390
 URL: https://issues.apache.org/jira/browse/YARN-3390
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen


RMTimelineCollector should have the context info of each app whose entities 
have been put.





[jira] [Created] (YARN-3391) Clearly define flow ID/ flow run / flow version in API and storage

2015-03-23 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3391:
-

 Summary: Clearly define flow ID/ flow run / flow version in API 
and storage
 Key: YARN-3391
 URL: https://issues.apache.org/jira/browse/YARN-3391
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen


To continue the discussion in YARN-3040, let's figure out the best way to 
describe the flow.





[jira] [Resolved] (YARN-3377) TestTimelineServiceClientIntegration fails

2015-03-20 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3377.
---
   Resolution: Fixed
Fix Version/s: YARN-2928
 Hadoop Flags: Reviewed

+1 for the patch. Committed it to branch YARN-2928. Thanks, Sangjin!

 TestTimelineServiceClientIntegration fails
 --

 Key: YARN-3377
 URL: https://issues.apache.org/jira/browse/YARN-3377
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
Priority: Minor
 Fix For: YARN-2928

 Attachments: YARN-3377.001.patch


 TestTimelineServiceClientIntegration fails. It appears we are getting a 500 
 from the timeline collector. This appears to be mostly an issue with the test 
 itself.
 {noformat}
 ---
 Test set: 
 org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration
 ---
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 33.503 sec 
  FAILURE! - in 
 org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration
 testPutEntities(org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration)
   Time elapsed: 32.606 sec   ERROR!
 org.apache.hadoop.yarn.exceptions.YarnException: Failed to get the response 
 from the timeline server.
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:457)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putObjects(TimelineClientImpl.java:391)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:368)
   at 
 org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:342)
   at 
 org.apache.hadoop.yarn.server.timelineservice.TestTimelineServiceClientIntegration.testPutEntities(TestTimelineServiceClientIntegration.java:74)
 {noformat}
 The relevant piece from the server side:
 {noformat}
 Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.PackagesResourceConfig init
 INFO: Scanning for root resource and provider classes in the packages:
   org.apache.hadoop.yarn.server.timelineservice.collector
   org.apache.hadoop.yarn.webapp
   org.apache.hadoop.yarn.webapp
 Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig 
 logClasses
 INFO: Root resource classes found:
   class org.apache.hadoop.yarn.webapp.MyTestWebService
   class 
 org.apache.hadoop.yarn.server.timelineservice.collector.TimelineCollectorWebService
 Mar 19, 2015 10:48:30 AM com.sun.jersey.api.core.ScanningResourceConfig 
 logClasses
 INFO: Provider classes found:
   class org.apache.hadoop.yarn.webapp.YarnJacksonJaxbJsonProvider
   class org.apache.hadoop.yarn.webapp.GenericExceptionHandler
   class org.apache.hadoop.yarn.webapp.MyTestJAXBContextResolver
 Mar 19, 2015 10:48:30 AM 
 com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
 INFO: Initiating Jersey application, version 'Jersey: 1.9 09/02/2011 11:17 AM'
 Mar 19, 2015 10:48:31 AM 
 com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 
 resolve
 SEVERE: null
 java.lang.IllegalAccessException: Class 
 com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8 can 
 not access a member of class 
 org.apache.hadoop.yarn.webapp.MyTestWebService$MyInfo with modifiers public
   at sun.reflect.Reflection.ensureMemberAccess(Reflection.java:95)
   at java.lang.Class.newInstance0(Class.java:366)
   at java.lang.Class.newInstance(Class.java:325)
   at 
 com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator$8.resolve(WadlGeneratorJAXBGrammarGenerator.java:467)
   at 
 com.sun.jersey.server.wadl.WadlGenerator$ExternalGrammarDefinition.resolve(WadlGenerator.java:181)
   at 
 com.sun.jersey.server.wadl.ApplicationDescription.resolve(ApplicationDescription.java:81)
   at 
 com.sun.jersey.server.wadl.generators.WadlGeneratorJAXBGrammarGenerator.attachTypes(WadlGeneratorJAXBGrammarGenerator.java:518)
   at com.sun.jersey.server.wadl.WadlBuilder.generate(WadlBuilder.java:124)
   at 
 com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:104)
   at 
 com.sun.jersey.server.impl.wadl.WadlApplicationContextImpl.getApplication(WadlApplicationContextImpl.java:120)
   at 
 com.sun.jersey.server.impl.wadl.WadlMethodFactory$WadlOptionsMethodDispatcher.dispatch(WadlMethodFactory.java:98)
   at 
 com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
   at 
 

[jira] [Created] (YARN-3374) Aggregator's web server should randomly bind an available port

2015-03-19 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3374:
-

 Summary: Aggregator's web server should randomly bind an available 
port
 Key: YARN-3374
 URL: https://issues.apache.org/jira/browse/YARN-3374
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen


It's based on the configuration now. That approach won't work if we move to the
app-level aggregator container solution: one NM may start multiple such
aggregators, which cannot all bind to the same configured port.
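
A common way to get this behavior is to bind to port 0 and let the OS assign a
free ephemeral port. A minimal Java sketch of the idea (illustrative only, not
YARN code):

```java
import java.net.ServerSocket;

// Illustrative sketch (not YARN code): binding to port 0 asks the OS for a
// free ephemeral port, so each per-app aggregator can get its own port
// instead of competing for one fixed configured port.
public class EphemeralPortDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket socket = new ServerSocket(0)) {
            int port = socket.getLocalPort(); // the port the OS actually chose
            System.out.println(port > 0);     // prints true
        }
    }
}
```

The web server would then advertise the chosen port (e.g., back to the RM)
rather than reading it from configuration.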





[jira] [Resolved] (YARN-3039) [Aggregator wireup] Implement ATS app-aggregator service discovery

2015-03-17 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3039.
---
   Resolution: Fixed
Fix Version/s: YARN-2928
 Hadoop Flags: Reviewed

Committed the patch to branch YARN-2928. Thanks for the patch, Junping! Thanks 
for review, Sangjin!

 [Aggregator wireup] Implement ATS app-aggregator service discovery
 ---

 Key: YARN-3039
 URL: https://issues.apache.org/jira/browse/YARN-3039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Junping Du
 Fix For: YARN-2928

 Attachments: Service Binding for applicationaggregator of ATS 
 (draft).pdf, Service Discovery For Application Aggregator of ATS (v2).pdf, 
 YARN-3039-no-test.patch, YARN-3039-v2-incomplete.patch, 
 YARN-3039-v3-core-changes-only.patch, YARN-3039-v4.patch, YARN-3039-v5.patch, 
 YARN-3039-v6.patch, YARN-3039-v7.patch, YARN-3039-v8.patch, YARN-3039.9.patch


 Per design in YARN-2928, implement ATS writer service discovery. This is 
 essential for off-node clients to send writes to the right ATS writer. This 
 should also handle the case of AM failures.





[jira] [Created] (YARN-3338) Exclude jline dependency from YARN

2015-03-11 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3338:
-

 Summary: Exclude jline dependency from YARN
 Key: YARN-3338
 URL: https://issues.apache.org/jira/browse/YARN-3338
 Project: Hadoop YARN
  Issue Type: Bug
  Components: build
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker


It was fixed in YARN-2815, but is broken again by YARN-1514.





[jira] [Resolved] (YARN-3031) [Storage abstraction] Create backing storage write interface for ATS writers

2015-03-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3031.
---
Resolution: Duplicate

Since the patch there covers the writer interface code, let's resolve this one
as a duplicate of YARN-3264.

 [Storage abstraction] Create backing storage write interface for ATS writers
 

 Key: YARN-3031
 URL: https://issues.apache.org/jira/browse/YARN-3031
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
 Attachments: Sequence_diagram_write_interaction.2.png, 
 Sequence_diagram_write_interaction.png, YARN-3031.01.patch, 
 YARN-3031.02.patch, YARN-3031.03.patch


 Per design in YARN-2928, come up with the interface for the ATS writer to 
 write to various backing storages. The interface should be created to capture 
 the right level of abstractions so that it will enable all backing storage 
 implementations to implement it efficiently.





[jira] [Resolved] (YARN-3125) [Event producers] Change distributed shell to use new timeline service

2015-02-27 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3125.
---
   Resolution: Fixed
Fix Version/s: YARN-2928
 Hadoop Flags: Reviewed

Committed to branch YARN-2928. Thanks for the patch, Junping and Li!

 [Event producers] Change distributed shell to use new timeline service
 --

 Key: YARN-3125
 URL: https://issues.apache.org/jira/browse/YARN-3125
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Junping Du
 Fix For: YARN-2928

 Attachments: YARN-3125.patch, YARN-3125_UT-022615.patch, 
 YARN-3125_UT-022715.patch, YARN-3125v2.patch, YARN-3125v3.patch


 We can start by changing distributed shell to use the new timeline service
 once the framework is completed; that way we can quickly verify that the next
 gen is working end-to-end.





[jira] [Created] (YARN-3240) [Data Mode] Implement client API to put generic entities

2015-02-20 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3240:
-

 Summary: [Data Mode] Implement client API to put generic entities
 Key: YARN-3240
 URL: https://issues.apache.org/jira/browse/YARN-3240
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen








[jira] [Created] (YARN-3196) [Compatibility] Make TS next gen be compatible with the current TS

2015-02-13 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3196:
-

 Summary: [Compatibility] Make TS next gen be compatible with the 
current TS
 Key: YARN-3196
 URL: https://issues.apache.org/jira/browse/YARN-3196
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Filing a jira to make sure that we don't forget to stay compatible with the
current TS, so that we can smoothly move users to the new TS.





[jira] [Resolved] (YARN-3043) [Data Model] Create ATS configuration, metadata, etc. as part of entities

2015-02-12 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3043.
---
Resolution: Duplicate

Let's make the all-inclusive data model definition in YARN-3041.

 [Data Model] Create ATS configuration, metadata, etc. as part of entities
 -

 Key: YARN-3043
 URL: https://issues.apache.org/jira/browse/YARN-3043
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Varun Saxena

 Per design in YARN-2928, create APIs for configuration, metadata, etc. and 
 integrate them into entities.





[jira] [Resolved] (YARN-3042) [Data Model] Create ATS metrics API

2015-02-12 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3042.
---
Resolution: Duplicate

Let's make the all-inclusive data model definition in YARN-3041.

 [Data Model] Create ATS metrics API
 ---

 Key: YARN-3042
 URL: https://issues.apache.org/jira/browse/YARN-3042
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Siddharth Wagle

 Per design in YARN-2928, create the ATS metrics API and integrate it into the 
 entities.
 The concept may be based on the existing hadoop metrics, but we want to make 
 sure we have something that would satisfy all ATS use cases.
 It also needs to capture whether a metric should be aggregated.





[jira] [Created] (YARN-3150) Documenting the timeline service v2

2015-02-05 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3150:
-

 Summary: Documenting the timeline service v2
 Key: YARN-3150
 URL: https://issues.apache.org/jira/browse/YARN-3150
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Let's make sure we will have a document describing what's new in TS v2, the
APIs, the client libs, and so on. We should do better on documentation in v2
than we did in v1.





[jira] [Created] (YARN-3134) Exploiting the option of using Phoenix to access HBase backend

2015-02-03 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3134:
-

 Summary: Exploiting the option of using Phoenix to access HBase 
backend
 Key: YARN-3134
 URL: https://issues.apache.org/jira/browse/YARN-3134
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Quoting the introduction on the Phoenix web page:

{code}
Apache Phoenix is a relational database layer over HBase delivered as a 
client-embedded JDBC driver targeting low latency queries over HBase data. 
Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, 
and orchestrates the running of those scans to produce regular JDBC result 
sets. The table metadata is stored in an HBase table and versioned, such that 
snapshot queries over prior versions will automatically use the correct schema. 
Direct use of the HBase API, along with coprocessors and custom filters, 
results in performance on the order of milliseconds for small queries, or 
seconds for tens of millions of rows.
{code}

It may simplify how our implementation reads/writes data from/to HBase, and
make it easy to build indexes and compose complex queries.
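
As a rough illustration of the upside, reads could go through plain JDBC
instead of raw HBase scans. The connection URL, table, and columns below are
hypothetical, not the actual timeline schema; without a Phoenix driver on the
classpath the sketch falls through to the catch block:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hedged sketch only: what a timeline read via Phoenix JDBC might look like.
// "jdbc:phoenix:zk-host:2181", the TIMELINE_ENTITY table, and its columns are
// all assumptions for illustration, not the actual schema.
public class PhoenixReadSketch {
    public static void main(String[] args) {
        String url = "jdbc:phoenix:zk-host:2181"; // assumed ZooKeeper quorum
        String sql = "SELECT entity_id, created_time FROM TIMELINE_ENTITY "
                   + "WHERE app_id = ? LIMIT 10";
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, "application_1423000000000_0001");
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + " " + rs.getLong(2));
                }
            }
        } catch (SQLException e) {
            // Expected when no Phoenix driver/cluster is available.
            System.out.println("phoenix-unavailable");
        }
    }
}
```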





[jira] [Created] (YARN-3123) Make YARN CLI show a single completed container even if the app is running

2015-02-02 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3123:
-

 Summary: Make YARN CLI show a single completed container even if 
the app is running
 Key: YARN-3123
 URL: https://issues.apache.org/jira/browse/YARN-3123
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Zhijie Shen


Like YARN-2808, we can make the same improvement for the single-container command too.





[jira] [Created] (YARN-3125) Change distributed shell to use new timeline service

2015-02-02 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3125:
-

 Summary: Change distributed shell to use new timeline service
 Key: YARN-3125
 URL: https://issues.apache.org/jira/browse/YARN-3125
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


We can start by changing distributed shell to use the new timeline service once
the framework is completed; that way we can quickly verify that the next gen is
working end-to-end.





[jira] [Created] (YARN-3115) Work-preserving restarting of per-node aggregator

2015-01-29 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3115:
-

 Summary: Work-preserving restarting of per-node aggregator
 Key: YARN-3115
 URL: https://issues.apache.org/jira/browse/YARN-3115
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


YARN-3030 makes the per-node aggregator work as an aux service of the NM. It
contains the states of the per-app aggregators corresponding to the running AM
containers on this NM. When the NM is restarted in work-preserving mode, the
per-node aggregator's state needs to be carried over the restart too.





[jira] [Resolved] (YARN-3030) set up ATS writer with basic request serving structure and lifecycle

2015-01-27 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3030.
---
   Resolution: Fixed
Fix Version/s: YARN-2928

Committed the patch to branch YARN-2928. Thanks, Sangjin!

 set up ATS writer with basic request serving structure and lifecycle
 

 Key: YARN-3030
 URL: https://issues.apache.org/jira/browse/YARN-3030
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Sangjin Lee
 Fix For: YARN-2928

 Attachments: YARN-3030.001.patch, YARN-3030.002.patch, 
 YARN-3030.003.patch, YARN-3030.004.patch


 Per design in YARN-2928, create an ATS writer as a service, and implement the 
 basic service structure including the lifecycle management.
 Also, as part of this JIRA, we should come up with the ATS client API for 
 sending requests to this ATS writer.





[jira] [Resolved] (YARN-3062) timelineserver gives inconsistent data for otherinfo field based on the filter param

2015-01-15 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3062.
---
Resolution: Invalid

Thanks for your confirmation, [~pramachandran]! Close the Jira.

 timelineserver gives inconsistent data for otherinfo field based on the 
 filter param
 

 Key: YARN-3062
 URL: https://issues.apache.org/jira/browse/YARN-3062
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.4.0, 2.5.0, 2.6.0
Reporter: Prakash Ramachandran
 Attachments: withfilter.json, withoutfilter.json


 When the otherinfo field gets updated, in some cases the data returned for an
 entity depends on the filter usage.
 For example, in the attached files for
 - entity: vertex_1421164610335_0020_1_01,
 - entitytype: TEZ_VERTEX_ID,
 the otherinfo.numTasks field got updated from 1009 to 253
 - using 
 {code}http://machine:8188/ws/v1/timeline/TEZ_VERTEX_ID/vertex_1421164610335_0020_1_01/
  {code} gives the updated value: 253
 - using 
 {code}http://cn042-10:8188/ws/v1/timeline/TEZ_VERTEX_ID?limit=11primaryFilter=TEZ_DAG_ID%3Adag_1421164610335_0020_1{code}
  gives the old value: 1009
  
 For the otherinfo.status field, which also gets updated, both of them show the
 updated value.
 TEZ-1942 has more details.





[jira] [Created] (YARN-3063) Bootstrap TimelineServer Next Gen Module

2015-01-14 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-3063:
-

 Summary: Bootstrap TimelineServer Next Gen Module
 Key: YARN-3063
 URL: https://issues.apache.org/jira/browse/YARN-3063
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Based on the discussion on the umbrella Jira, we need to create a new 
sub-module for TS next gen.





[jira] [Resolved] (YARN-1364) Limit the number of outstanding tfile writers in FileSystemApplicationHistoryStore

2015-01-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1364.
---
Resolution: Won't Fix

The FS-based generic history store is no longer maintained.

 Limit the number of outstanding tfile writers in 
 FileSystemApplicationHistoryStore
 --

 Key: YARN-1364
 URL: https://issues.apache.org/jira/browse/YARN-1364
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 It seems to be expensive to maintain a large number of outstanding t-file
 writers; the RM is likely to run out of I/O resources. We probably want to
 limit the number of concurrent outstanding t-file writers and queue the
 writing requests.
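
 One possible shape for such a cap, sketched with a plain Semaphore
 (illustrative only, not the store's actual code): requests beyond the limit
 simply queue on acquire() instead of opening more writers.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: cap concurrent "t-file writes" at MAX_WRITERS; extra
// requests block in acquire(), i.e. they queue rather than open more writers.
public class BoundedWriterDemo {
    static final int MAX_WRITERS = 4;
    static final Semaphore slots = new Semaphore(MAX_WRITERS);
    static final AtomicInteger active = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    static void write() throws InterruptedException {
        slots.acquire();                        // queue here once cap reached
        try {
            int now = active.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            Thread.sleep(10);                   // stand-in for the real write
            active.decrementAndGet();
        } finally {
            slots.release();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        for (int i = 0; i < 32; i++) {
            pool.submit(() -> {
                try { write(); } catch (InterruptedException ignored) { }
            });
        }
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        System.out.println(peak.get() <= MAX_WRITERS); // prints true
    }
}
```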





[jira] [Resolved] (YARN-2262) Few fields displaying wrong values in Timeline server after RM restart

2015-01-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2262.
---
Resolution: Won't Fix

The FS-based generic history store is no longer maintained.

 Few fields displaying wrong values in Timeline server after RM restart
 --

 Key: YARN-2262
 URL: https://issues.apache.org/jira/browse/YARN-2262
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.4.0
Reporter: Nishan Shetty
Assignee: Naganarasimha G R
 Attachments: Capture.PNG, Capture1.PNG, 
 yarn-testos-historyserver-HOST-10-18-40-95.log, 
 yarn-testos-resourcemanager-HOST-10-18-40-84.log, 
 yarn-testos-resourcemanager-HOST-10-18-40-95.log


 Few fields displaying wrong values in Timeline server after RM restart
 State:null
 FinalStatus:  UNDEFINED
 Started:  8-Jul-2014 14:58:08
 Elapsed:  2562047397789hrs, 44mins, 47sec 





[jira] [Resolved] (YARN-2330) Jobs are not displaying in timeline server after RM restart

2015-01-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2330.
---
Resolution: Won't Fix

The FS-based generic history store is no longer maintained.

 Jobs are not displaying in timeline server after RM restart
 ---

 Key: YARN-2330
 URL: https://issues.apache.org/jira/browse/YARN-2330
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 2.4.1
 Environment: Nodemanagers 3 (3*8GB)
 Queues A = 70%
 Queues B = 30%
Reporter: Nishan Shetty
Assignee: Naganarasimha G R

 Submit jobs to queue A
 While a job is running, restart the RM
 Observe that those jobs are not displayed in the timeline server
 {code}
 2014-07-22 10:11:32,084 ERROR 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore:
  History information of application application_1406002968974_0003 is not 
 included into the result due to the exception
 java.io.IOException: Cannot seek to negative offset
   at org.apache.hadoop.hdfs.DFSInputStream.seek(DFSInputStream.java:1381)
   at 
 org.apache.hadoop.fs.FSDataInputStream.seek(FSDataInputStream.java:63)
   at org.apache.hadoop.io.file.tfile.BCFile$Reader.init(BCFile.java:624)
   at org.apache.hadoop.io.file.tfile.TFile$Reader.init(TFile.java:804)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore$HistoryFileReader.init(FileSystemApplicationHistoryStore.java:683)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getHistoryFileReader(FileSystemApplicationHistoryStore.java:661)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getApplication(FileSystemApplicationHistoryStore.java:146)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.getAllApplications(FileSystemApplicationHistoryStore.java:199)
   at 
 org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryManagerImpl.getAllApplications(ApplicationHistoryManagerImpl.java:103)
   at 
 org.apache.hadoop.yarn.server.webapp.AppsBlock.render(AppsBlock.java:75)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
   at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
   at 
 org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
   at 
 org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
   at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
   at 
 org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
   at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
   at org.apache.hadoop.yarn.webapp.Dispatcher.render(Dispatcher.java:197)
   at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:156)
   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
   at 
 com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
   at 
 com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
   at 
 com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
   at 
 com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
   at 
 com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
   at 
 com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
   at 
 com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
   at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at 
 org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1192)
   at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
   at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
   at 
 

[jira] [Resolved] (YARN-1835) History client service needs to be more robust

2015-01-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1835.
---
Resolution: Invalid

The ApplicationHistoryManager has a new implementation, which doesn't have the 
aforementioned issue.

 History client service needs to be more robust
 --

 Key: YARN-1835
 URL: https://issues.apache.org/jira/browse/YARN-1835
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 While doing the test, I've found the following issues so far:
 1. The history-file-not-found exception is exposed to the user directly; it
 would be better to catch it and translate it into ApplicationNotFound.
 2. An NPE can be exposed as well, since ApplicationHistoryManager doesn't do
 the necessary null checks.
 In addition, TestApplicationHistoryManagerImpl misses tests for most
 ApplicationHistoryManager methods.





[jira] [Resolved] (YARN-2412) Augment HistoryStorage Reader Interface to Support Filters When Getting Applications

2015-01-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2412.
---
Resolution: Invalid

The generic history storage layer has been rebuilt; the reader interface is not
useful in the new stack.

 Augment HistoryStorage Reader Interface to Support Filters When Getting 
 Applications
 

 Key: YARN-2412
 URL: https://issues.apache.org/jira/browse/YARN-2412
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Shinichi Yamashita

 https://issues.apache.org/jira/browse/YARN-925?focusedCommentId=13800402page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13800402





[jira] [Resolved] (YARN-1302) Add AHSDelegationTokenSecretManager for ApplicationHistoryProtocol

2015-01-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1302.
---
Resolution: Duplicate

 Add AHSDelegationTokenSecretManager for ApplicationHistoryProtocol
 --

 Key: YARN-1302
 URL: https://issues.apache.org/jira/browse/YARN-1302
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Like the ApplicationClientProtocol, ApplicationHistoryProtocol needs its own
 security stack. We need to implement AHSDelegationTokenSecretManager,
 AHSDelegationTokenIdentifier, AHSDelegationTokenSelector, and other analogs.





[jira] [Resolved] (YARN-1344) Separate ApplicationAttemptStartDataProto and ApplicationAttemptRegisteredDataProto

2015-01-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1344.
---
Resolution: Invalid

The generic history storage has been rebuilt. It's no longer a valid issue.

 Separate ApplicationAttemptStartDataProto and 
 ApplicationAttemptRegisteredDataProto
 ---

 Key: YARN-1344
 URL: https://issues.apache.org/jira/browse/YARN-1344
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 Some info in ApplicationAttemptStartData can be separated out and put into
 ApplicationAttemptRegisteredData, to further minimize the probability of info
 loss.





[jira] [Resolved] (YARN-1346) Revisit the output type of the reader interface

2015-01-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-1346.
---
Resolution: Invalid

The generic history storage layer has been rebuilt. It's no longer a valid problem.

 Revisit the output type of the reader interface
 ---

 Key: YARN-1346
 URL: https://issues.apache.org/jira/browse/YARN-1346
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 In YARN-947, there's a discussion about changing the reader interface to
 return the report protobuf (e.g., ApplicationReport) directly instead of AHS
 internal objects (e.g., ApplicationHistoryData). We need to think more about
 it.





[jira] [Resolved] (YARN-2177) Timeline server web interfaces high-availablity and scalability

2015-01-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2177.
---
Resolution: Duplicate

Close it as the duplicate of YARN-2928. This topic will be covered by TS next 
gen.

 Timeline server web interfaces high-availablity and scalability
 ---

 Key: YARN-2177
 URL: https://issues.apache.org/jira/browse/YARN-2177
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 While we are going to leverage HBase to provide a highly available and
 scalable storage solution, we also need to take care of the high availability
 and scalability of the web interfaces, which are likely to handle a big volume
 of user requests.





[jira] [Resolved] (YARN-2520) Scalable and High Available Timeline Server

2015-01-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2520.
---
Resolution: Duplicate

Close it as a duplicate of YARN-2928, as TS next gen will cover this topic.

 Scalable and High Available Timeline Server
 ---

 Key: YARN-2520
 URL: https://issues.apache.org/jira/browse/YARN-2520
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen
 Attachments: Federal Timeline Servers.jpg


 YARN-2032 will provide a scalable and reliable timeline store based on HBase.
 However, a single instance of the timeline server is not scalable enough to
 handle a large volume of user requests, becoming the single bottleneck.
 As the timeline server is a stateless machine, it's not difficult to start
 multiple timeline server instances that write into the same HBase timeline
 store. We can make use of ZooKeeper to register all the timeline servers, as
 HA RMs do, and a client can randomly pick one server to publish the timeline
 entities to for load balancing.
 Moreover, since multiple timeline servers are started together, they actually
 back up each other, solving the high-availability problem as well.
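
 The client-side half of that proposal is simple: once the server list is
 discovered (from ZooKeeper in the proposal; hard-coded below for
 illustration), pick one at random per publish.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Sketch of the client-side load balancing described above. The addresses are
// placeholders; in the proposal they would come from ZooKeeper registration.
public class RandomServerPicker {
    static String pick(List<String> servers) {
        return servers.get(ThreadLocalRandom.current().nextInt(servers.size()));
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("ts1:8188", "ts2:8188", "ts3:8188");
        String chosen = pick(servers);
        System.out.println(servers.contains(chosen)); // prints true
    }
}
```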





[jira] [Resolved] (YARN-2521) Reliable TimelineClient

2015-01-13 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2521.
---
Resolution: Duplicate

Close the ticket as the duplicate of YARN-2928

 Reliable TimelineClient
 ---

 Key: YARN-2521
 URL: https://issues.apache.org/jira/browse/YARN-2521
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Affects Versions: 3.0.0, 2.5.0
Reporter: Zhijie Shen
Assignee: Zhijie Shen

 The timeline server may suffer outages. It would be beneficial if the
 timeline client could cache the timeline entity locally after the application
 passes it to the client, and before the client successfully hands it over to
 the server.
 To prevent the entity from being lost, we may want to persist it into
 secondary storage, such as HDFS or LevelDB.
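
 A minimal write-ahead sketch of that idea (illustrative only; it uses a local
 temp directory where the real client might use HDFS or LevelDB): persist each
 entity before sending, and delete the local copy only after the server
 acknowledges.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Illustrative sketch: spool each entity to local storage before delivery so
// a server outage cannot lose it. send() always fails here to simulate an
// outage; a background replayer would retry the spooled files later.
public class WriteAheadDemo {
    static Path spool;

    static void put(String entityJson) throws IOException {
        Path entry = Files.createTempFile(spool, "entity-", ".json");
        Files.write(entry, entityJson.getBytes());  // durable before sending
        if (send(entityJson)) {
            Files.delete(entry);                    // ack'd: drop local copy
        }
    }

    static boolean send(String json) { return false; } // simulated outage

    public static void main(String[] args) throws IOException {
        spool = Files.createTempDirectory("timeline-spool");
        put("{\"entity\":\"app_1\"}");
        try (Stream<Path> entries = Files.list(spool)) {
            System.out.println(entries.count() == 1); // entity survived outage
        }
    }
}
```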





[jira] [Resolved] (YARN-3013) Findbugs warning aboutAbstractYarnScheduler.rmContext

2015-01-07 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-3013.
---
Resolution: Duplicate

Close it as a duplicate. Thanks for pointing it out.

 Findbugs warning aboutAbstractYarnScheduler.rmContext
 -

 Key: YARN-3013
 URL: https://issues.apache.org/jira/browse/YARN-3013
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Rohith
 Attachments: 0001-YARN-3013.patch


 {code}
 Bug type IS2_INCONSISTENT_SYNC (click for details) 
 In class 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler
 Field 
 org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler.rmContext
 Synchronized 91% of the time
 {code}
 See 
 https://builds.apache.org/job/PreCommit-YARN-Build/6266//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#IS2_INCONSISTENT_SYNC
  for more details





[jira] [Created] (YARN-2991) TestRMRestart.testDecomissionedNMsMetricsOnRMRestart intermittently fails on trunk

2014-12-23 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2991:
-

 Summary: TestRMRestart.testDecomissionedNMsMetricsOnRMRestart 
intermittently fails on trunk
 Key: YARN-2991
 URL: https://issues.apache.org/jira/browse/YARN-2991
 Project: Hadoop YARN
  Issue Type: Test
Reporter: Zhijie Shen


{code}
Error Message

test timed out after 6 milliseconds
Stacktrace

java.lang.Exception: test timed out after 6 milliseconds
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.serviceStop(AsyncDispatcher.java:150)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at 
org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at 
org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
at 
org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStop(ResourceManager.java:1106)
at 
org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at 
org.apache.hadoop.yarn.server.resourcemanager.TestRMRestart.testDecomissionedNMsMetricsOnRMRestart(TestRMRestart.java:1873)
{code}

It happened twice this month:
https://builds.apache.org/job/PreCommit-YARN-Build/6096/
https://builds.apache.org/job/PreCommit-YARN-Build/6182/





[jira] [Created] (YARN-2958) RMStateStore seems to unnecessarily and wrongly store sequence number separately

2014-12-12 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2958:
-

 Summary: RMStateStore seems to unnecessarily and wrongly store 
sequence number separately
 Key: YARN-2958
 URL: https://issues.apache.org/jira/browse/YARN-2958
 Project: Hadoop YARN
  Issue Type: Bug
  Components: resourcemanager
Reporter: Zhijie Shen


It seems that RMStateStore updates the last sequence number when storing or 
updating each individual DT, so that the latest sequence number can be 
recovered when the RM restarts.

First, the current logic seems to be problematic:
{code}
  public synchronized void updateRMDelegationTokenAndSequenceNumber(
  RMDelegationTokenIdentifier rmDTIdentifier, Long renewDate,
  int latestSequenceNumber) {
if(isFencedState()) {
  LOG.info("State store is in Fenced state. Can't update RM Delegation Token.");
  return;
}
try {
  updateRMDelegationTokenAndSequenceNumberInternal(rmDTIdentifier, 
renewDate,
  latestSequenceNumber);
} catch (Exception e) {
  notifyStoreOperationFailed(e);
}
  }
{code}
{code}
  @Override
  protected void updateStoredToken(RMDelegationTokenIdentifier id,
  long renewDate) {
try {
  LOG.info("updating RMDelegation token with sequence number: "
  + id.getSequenceNumber());
  rmContext.getStateStore().updateRMDelegationTokenAndSequenceNumber(id,
renewDate, id.getSequenceNumber());
} catch (Exception e) {
  LOG.error("Error in updating persisted RMDelegationToken with sequence number: "
+ id.getSequenceNumber());
  ExitUtil.terminate(1, e);
}
  }
{code}
According to the code above, even when renewing a DT, the last sequence number 
is updated in the store, which is wrong. For example, consider the following 
sequence:
1. Get DT 1 (seq = 1)
2. Get DT 2 (seq = 2)
3. Renew DT 1 (seq = 1)
4. Restart RM
The stored and then recovered last sequence number is 1, so the next DT 
created after the RM restarts will conflict with DT 2 on its sequence number.

Second, the aforementioned bug doesn't actually happen, because the recovered 
last sequence number gets overwritten by the correct one.
{code}
  public void recover(RMState rmState) throws Exception {

LOG.info("recovering RMDelegationTokenSecretManager.");
// recover RMDTMasterKeys
for (DelegationKey dtKey : rmState.getRMDTSecretManagerState()
  .getMasterKeyState()) {
  addKey(dtKey);
}

// recover RMDelegationTokens
Map<RMDelegationTokenIdentifier, Long> rmDelegationTokens =
rmState.getRMDTSecretManagerState().getTokenState();
this.delegationTokenSequenceNumber =
rmState.getRMDTSecretManagerState().getDTSequenceNumber();
for (Map.Entry<RMDelegationTokenIdentifier, Long> entry : rmDelegationTokens
  .entrySet()) {
  addPersistedDelegationToken(entry.getKey(), entry.getValue());
}
  }
{code}
The code above recovers delegationTokenSequenceNumber by reading the last 
sequence number in the store, which could be wrong. Fortunately, 
addPersistedDelegationToken updates it to the right number.
{code}
if (identifier.getSequenceNumber() > getDelegationTokenSeqNum()) {
  setDelegationTokenSeqNum(identifier.getSequenceNumber());
}
{code}
All the stored identifiers are gone through, and delegationTokenSequenceNumber 
is set to the largest sequence number among them. Therefore, a new DT is 
always assigned a sequence number larger than those of all the recovered DTs.

To sum up, two negatives make a positive here, but it's still good to fix the 
issue. Please let me know if I've missed something.
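The interplay described above can be sketched as a toy model. All class and method names here are made up for illustration; this is not the actual RMStateStore or RMDelegationTokenSecretManager code. It shows how blindly recording the sequence number of the last operation leaves a stale value in the store, and how the recovery scan over all persisted identifiers restores the correct counter.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the two-step behavior described above (hypothetical names).
public class SeqRecoveryDemo {

    // What the store ends up holding: the sequence number of whichever
    // store/update operation ran last. The bug: renewing an old DT
    // overwrites the counter with a stale value.
    static int storedLastSeq(int[] operations) {
        int last = 0;
        for (int seq : operations) {
            last = seq;
        }
        return last;
    }

    // Recovery pass: start from the (possibly stale) stored value, then
    // bump it while replaying every persisted identifier, mirroring the
    // "two negatives make a positive" correction in recover().
    static int recoverSeq(int storedSeq, List<Integer> persistedSeqs) {
        int seq = storedSeq;
        for (int id : persistedSeqs) {
            if (id > seq) {
                seq = id;
            }
        }
        return seq;
    }

    public static void main(String[] args) {
        // The sequence from the report: get DT 1, get DT 2, renew DT 1.
        int stored = storedLastSeq(new int[] {1, 2, 1});
        System.out.println("stored (stale): " + stored);   // 1, not 2
        System.out.println("recovered: "
            + recoverSeq(stored, Arrays.asList(1, 2)));    // corrected to 2
    }
}
```

Running this prints the stale stored value (1) followed by the corrected recovered value (2), matching the scenario in the report.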





[jira] [Created] (YARN-2879) Compatibility validation between YARN 2.2/2.4 and 2.6

2014-11-19 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2879:
-

 Summary: Compatibility validation between YARN 2.2/2.4 and 2.6
 Key: YARN-2879
 URL: https://issues.apache.org/jira/browse/YARN-2879
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


Recently, I did some simple backward-compatibility experiments. Basically, 
I took the following 2 steps:

1. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *new* Hadoop (2.6) client.

2. Deploy the application (MR and DistributedShell) that is compiled against 
*old* YARN API (2.2/2.4) on *new* YARN cluster (2.6). The application is 
submitted via *old* Hadoop (2.2/2.4) client that comes with the app.

I've tried these 2 steps on both insecure and secure clusters. Here's a short 
summary:

|| || || DS 2.2 || DS 2.4 || MR 2.2 + Shuffle 2.2 || MR 2.2 + Shuffle 2.6 || MR 2.4 + Shuffle 2.4 || MR 2.4 + Shuffle 2.6 ||
| Insecure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK |
| Insecure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK |
| Secure | New Client | OK | OK | Client Incompatible | Client Incompatible | OK | OK |
| Secure | Old Client | OK | OK | AM Incompatible | Client Incompatible | OK | OK |

Note that I've tried running the NM with both old and new shuffle handler versions.

In general, the compatibility looks good overall. There are a few issues 
related to MR, but they seem not to be YARN issues. I'll post the individual 
problems in the follow-up comments.







[jira] [Resolved] (YARN-2838) Issues with TimeLineServer (Application History)

2014-11-17 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2838.
---
Resolution: Not a Problem

Closing the ticket; the work will continue in separate JIRAs.

 Issues with TimeLineServer (Application History)
 

 Key: YARN-2838
 URL: https://issues.apache.org/jira/browse/YARN-2838
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Affects Versions: 2.6.0, 2.5.1
Reporter: Naganarasimha G R
Assignee: Naganarasimha G R
 Attachments: IssuesInTimelineServer.pdf


 Few issues in usage of Timeline server for generic application history access





[jira] [Created] (YARN-2867) TimelineClient DT methods should check if the timeline service is enabled or not

2014-11-14 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2867:
-

 Summary: TimelineClient DT methods should check if the timeline 
service is enabled or not
 Key: YARN-2867
 URL: https://issues.apache.org/jira/browse/YARN-2867
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Zhijie Shen


The DT-related methods don't check whether isEnabled == true, while the 
internal state is only initialized when isEnabled == true. An NPE happens if 
users call these methods when the timeline service is not enabled in the 
configuration.
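A minimal sketch of the guard this report asks for, using a hypothetical client class (the real TimelineClient API differs): check the enabled flag before touching state that is only initialized when the service is enabled, and fail with a clear error instead of an NPE.

```java
// Hypothetical client class sketching the guard described above
// (not the actual TimelineClient API).
public class GuardedTimelineClient {
    private final boolean isEnabled;
    private Object delegationTokenService; // only initialized when enabled

    GuardedTimelineClient(boolean enabled) {
        this.isEnabled = enabled;
        if (enabled) {
            delegationTokenService = new Object(); // stand-in for real init
        }
    }

    // Without the check, this would NPE on delegationTokenService when
    // the timeline service is disabled in the configuration.
    public String getDelegationToken(String renewer) {
        if (!isEnabled) {
            throw new IllegalStateException(
                "Timeline service is not enabled; cannot issue a delegation token");
        }
        return "token-for-" + renewer; // stand-in for the real DT call
    }
}
```

Callers then get a descriptive IllegalStateException instead of an opaque NullPointerException.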





[jira] [Resolved] (YARN-2867) TimelineClient DT methods should check if the timeline service is enabled or not

2014-11-14 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-2867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen resolved YARN-2867.
---
Resolution: Invalid

Per discussion on 
[YARN-2375|https://issues.apache.org/jira/browse/YARN-2375?focusedCommentId=14213002&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14213002],
 close this Jira as invalid 

 TimelineClient DT methods should check if the timeline service is enabled or 
 not
 

 Key: YARN-2867
 URL: https://issues.apache.org/jira/browse/YARN-2867
 Project: Hadoop YARN
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Zhijie Shen

 The DT-related methods don't check whether isEnabled == true, while the 
 internal state is only initialized when isEnabled == true. An NPE happens if 
 users call these methods when the timeline service is not enabled in the 
 configuration.





[jira] [Created] (YARN-2861) Timeline DT secret manager should not reuse the RM's configs.

2014-11-13 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2861:
-

 Summary: Timeline DT secret manager should not reuse the RM's 
configs.
 Key: YARN-2861
 URL: https://issues.apache.org/jira/browse/YARN-2861
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: Zhijie Shen
Assignee: Zhijie Shen


These are the configs for the RM DT secret manager. We should create separate 
ones for the timeline DT.
{code}
  @Override
  protected void serviceInit(Configuration conf) throws Exception {
long secretKeyInterval =
conf.getLong(YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_KEY,
YarnConfiguration.DELEGATION_KEY_UPDATE_INTERVAL_DEFAULT);
long tokenMaxLifetime =
conf.getLong(YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_KEY,
YarnConfiguration.DELEGATION_TOKEN_MAX_LIFETIME_DEFAULT);
long tokenRenewInterval =
conf.getLong(YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_KEY,
YarnConfiguration.DELEGATION_TOKEN_RENEW_INTERVAL_DEFAULT);
secretManager = new TimelineDelegationTokenSecretManager(secretKeyInterval,
tokenMaxLifetime, tokenRenewInterval,
360);
secretManager.startThreads();

serviceAddr = TimelineUtils.getTimelineTokenServiceAddress(getConfig());
super.init(conf);
  }
{code}
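One possible shape for such dedicated configs, sketched with java.util.Properties and hypothetical key names (the actual keys were settled in the patch for this JIRA, not here): read a timeline-specific key first and fall back to the shared RM default only when it is unset.

```java
import java.util.Properties;

// Sketch of a timeline-specific config lookup with fallback to the shared
// RM DT setting. Both key strings below are hypothetical, not the keys
// YARN actually adopted.
public class TimelineDtConfig {

    static long getLong(Properties conf, String key, long dflt) {
        String v = conf.getProperty(key);
        return v == null ? dflt : Long.parseLong(v);
    }

    // Prefer the dedicated timeline key; otherwise use the RM-wide value,
    // defaulting to 24 hours when neither is configured.
    static long tokenRenewInterval(Properties conf) {
        return getLong(conf,
            "yarn.timeline-service.delegation.token.renew-interval", // hypothetical
            getLong(conf,
                "yarn.resourcemanager.delegation.token.renew-interval", // hypothetical
                24 * 60 * 60 * 1000L));
    }
}
```

This keeps existing deployments working while letting operators tune the timeline DT lifecycle independently of the RM's.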





[jira] [Created] (YARN-2854) The document about timeline service and generic service needs to be updated

2014-11-11 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2854:
-

 Summary: The document about timeline service and generic service 
needs to be updated
 Key: YARN-2854
 URL: https://issues.apache.org/jira/browse/YARN-2854
 Project: Hadoop YARN
  Issue Type: Bug
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Critical








[jira] [Created] (YARN-2837) Timeline server needs to recover the timeline DT when restarting

2014-11-09 Thread Zhijie Shen (JIRA)
Zhijie Shen created YARN-2837:
-

 Summary: Timeline server needs to recover the timeline DT when 
restarting
 Key: YARN-2837
 URL: https://issues.apache.org/jira/browse/YARN-2837
 Project: Hadoop YARN
  Issue Type: New Feature
  Components: timelineserver
Reporter: Zhijie Shen
Assignee: Zhijie Shen
Priority: Blocker


Timeline server needs to recover its stateful information when restarting, as 
RM/NM/JHS do now. So far the stateful information only includes the timeline 
DT. Without recovery, the timeline DTs of existing YARN apps are no longer 
valid, and cannot be renewed after the timeline server is restarted.




